Collaborative Mapping - OMOPHub.com API

1. The “Silo Tax”

Every hospital that joins an OMOP network faces the same mapping wall. Their lab system uses “Cr_Serum” for creatinine. The one across town uses “CREAT_BLD.” The academic medical center uses “Creatinine_S_mg.” All three mean the same thing: LOINC 2160-0, Creatinine [Mass/volume] in Serum or Plasma. But each hospital maps it independently. Three data engineers, three weeks of work, three times the cost - for the same result.

Scale this across 500 hospitals and 800 unique lab codes each, and you’re looking at hundreds of thousands of hours of duplicated mapping labor per year. This is the Silo Tax: the cost of every institution solving the same vocabulary problem in isolation.

The OHDSI community has partial solutions. USAGI is an open-source mapping tool that helps data engineers find standard concept matches for local codes. Athena distributes the vocabularies themselves. The OMOP CDM has a source_to_concept_map table designed specifically to store local-to-standard mappings. But none of these provide a way to share completed mappings between institutions.

OMOPHub doesn’t solve the sharing problem either - it doesn’t have a collaborative mapping repository. But it accelerates the creation of mapping files by providing fast vocabulary search without requiring a local Athena installation. And because mapping files follow a standard format (source_to_concept_map), they can be shared between institutions manually - via GitHub, shared drives, or project-specific data exchanges.

This article shows the practical workflow: use OMOPHub to build mapping files fast, format them as source_to_concept_map records, and share them between sites participating in the same research network.
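To make the scale of the Silo Tax concrete, here is a back-of-the-envelope calculation. The hospital and code counts come from the text above; the 30 minutes per mapping is purely an illustrative assumption, not a measured figure.

```python
# Back-of-the-envelope Silo Tax estimate.
HOSPITALS = 500            # from the text
CODES_PER_HOSPITAL = 800   # from the text
MINUTES_PER_MAPPING = 30   # ASSUMPTION: illustrative only, not a measured value


def duplicated_hours(hospitals: int, codes_each: int, minutes_per_code: float) -> float:
    """Hours spent if every site maps every one of its codes independently."""
    return hospitals * codes_each * minutes_per_code / 60


total_mappings = HOSPITALS * CODES_PER_HOSPITAL
hours = duplicated_hours(HOSPITALS, CODES_PER_HOSPITAL, MINUTES_PER_MAPPING)
print(f"{total_mappings:,} local-code mappings, about {hours:,.0f} hours of work")
```

Under that assumption the total lands at 400,000 mappings and roughly 200,000 hours. Change MINUTES_PER_MAPPING to match your own team's experience; the point is the multiplication, not the exact number.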

Honest scoping: OMOPHub does not currently have a shared mapping repository, user-contributed mappings, or collaborative mapping API. The workflow described here uses OMOPHub for vocabulary search + standard file formats for sharing. A future collaborative mapping feature would be valuable - but this article works with what exists today.

2. The Core Concept: The source_to_concept_map as a Sharing Format

The OMOP CDM includes a table purpose-built for local-to-standard mappings:

source_to_concept_map:
  - source_code              (VARCHAR) - the local code, e.g., "Cr_Serum"
  - source_concept_id        (INTEGER) - 0 if no OMOP concept for the local code
  - source_vocabulary_id     (VARCHAR) - identifier for the local vocabulary, e.g., "HospitalA_Labs"
  - source_code_description  (VARCHAR) - human-readable name of the local code
  - target_concept_id        (INTEGER) - the standard OMOP concept ID, e.g., 3016723
  - target_vocabulary_id     (VARCHAR) - e.g., "LOINC"
  - valid_start_date         (DATE)
  - valid_end_date           (DATE)
  - invalid_reason           (VARCHAR)

This table is the natural format for shareable mapping files. If Hospital A builds a source_to_concept_map for their lab codes and Hospital B has similar local codes, Hospital B can import Hospital A’s mappings as a starting point - reviewing and adjusting as needed. OMOPHub’s role: Populating the target_concept_id and target_vocabulary_id columns. For each local code, search OMOPHub to find the standard concept match, then write the result into the mapping table format.
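As a minimal sketch of that format, one row can be modelled as a small Python dataclass with a basic sanity check. The class name SourceToConceptMapRow and the validate helper are hypothetical names chosen for this illustration; the field names follow the OMOP CDM table, which also carries a source_code_description column. The default start date and the description text are placeholders.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class SourceToConceptMapRow:
    """One local-to-standard mapping; field names follow the OMOP CDM table."""
    source_code: str
    source_vocabulary_id: str
    target_concept_id: int
    target_vocabulary_id: str
    source_code_description: str = ""
    source_concept_id: int = 0                  # 0: the local code has no OMOP concept
    valid_start_date: date = date(1970, 1, 1)   # placeholder default for this sketch
    valid_end_date: date = date(2099, 12, 31)
    invalid_reason: Optional[str] = None


def validate(row: SourceToConceptMapRow) -> list:
    """Return a list of problems; an empty list means the row looks shareable."""
    problems = []
    if not row.source_code.strip():
        problems.append("source_code is empty")
    if not row.source_vocabulary_id.strip():
        problems.append("source_vocabulary_id is empty")
    if row.target_concept_id == 0:
        problems.append("target_concept_id is 0 (unmapped)")
    if row.valid_start_date > row.valid_end_date:
        problems.append("valid_start_date is after valid_end_date")
    return problems


row = SourceToConceptMapRow(
    source_code="Cr_Serum",
    source_vocabulary_id="HospitalA_Labs",
    target_concept_id=3016723,                    # the creatinine example from the table
    target_vocabulary_id="LOINC",
    source_code_description="Serum creatinine",   # illustrative description
)
print(validate(row))
print(asdict(row)["target_concept_id"])
```

Running a check like this before a file leaves the building catches empty codes and unmapped rows early, which is cheaper than having the receiving site discover them.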

3. Use Case A: Building a Shareable Mapping File with OMOPHub

A multi-site sepsis research project needs all participating hospitals to map their local lab codes for inflammatory markers to standard LOINC concepts. Instead of each hospital starting from scratch, the coordinating center builds an initial mapping file using OMOPHub, then distributes it.

pip install omophub

Python

import omophub
from datetime import date

client = omophub.OMOPHub()

# Local codes from Hospital A's lab system (representative set for sepsis)
local_codes = [
    {"source_code": "CRP_HS", "display": "C-Reactive Protein, High Sensitivity", "vocab": "HospitalA_Labs"},
    {"source_code": "PCT_Level", "display": "Procalcitonin Level", "vocab": "HospitalA_Labs"},
    {"source_code": "Lactate_Art", "display": "Arterial Lactate", "vocab": "HospitalA_Labs"},
    {"source_code": "WBC_Auto", "display": "White Blood Cell Count, Automated", "vocab": "HospitalA_Labs"},
    {"source_code": "BldCx_Ana", "display": "Blood Culture Anaerobic", "vocab": "HospitalA_Labs"},
    {"source_code": "SepsisAlert", "display": "Sepsis Alert Triggered", "vocab": "HospitalA_Events"},
]

print("Building source_to_concept_map for sepsis project...\n")

mapping_records = []

for entry in local_codes:
    code = entry["source_code"]
    display = entry["display"]
    source_vocab = entry["vocab"]

    print(f"  {code}: '{display}'")

    target_id = 0  # Default: unmapped
    target_vocab = ""
    target_name = ""
    match_method = "unmapped"

    try:
        # Step 1: Search OMOPHub for the best LOINC/SNOMED match
        results = client.search.basic(
            display,
            vocabulary_ids=["LOINC", "SNOMED"],
            domain_ids=["Measurement", "Condition", "Observation"],
            page_size=3,
        )
        candidates = results.get("concepts", []) if results else []

        # Step 2: Semantic fallback if basic search misses
        if not candidates:
            semantic = client.search.semantic(display, vocabulary_ids=["LOINC", "SNOMED"], domain_ids=["Measurement", "Condition", "Observation"], page_size=3)
            candidates = (semantic.get("results", semantic.get("concepts", [])) if semantic else [])
            if candidates:
                match_method = "semantic"

        if candidates:
            best = candidates[0]
            target_id = best["concept_id"]
            target_vocab = best.get("vocabulary_id", "")
            target_name = best.get("concept_name", "")
            # Keep "semantic" if the fallback produced the match; otherwise it was the basic search
            if match_method != "semantic":
                match_method = "basic"
            print(f"    -> {target_name} ({target_vocab}, OMOP: {target_id}) [{match_method}]")
        else:
            print(f"    -> NO MATCH - needs manual review")

    except omophub.APIError as e:
        print(f"    -> API error: {e.message}")
        match_method = "error"

    # Step 3: Build source_to_concept_map record
    mapping_records.append({
        "source_code": code,
        "source_concept_id": 0,
        "source_vocabulary_id": source_vocab,
        "source_code_description": display,
        "target_concept_id": target_id,
        "target_vocabulary_id": target_vocab,
        "valid_start_date": str(date.today()),
        "valid_end_date": "2099-12-31",
        "invalid_reason": None,
        # Extra metadata for review (not standard OMOP columns, but useful for sharing)
        "_match_method": match_method,
        "_target_concept_name": target_name,
        "_reviewed": False,
        "_reviewer": None,
    })

# Summary
mapped = sum(1 for r in mapping_records if r["target_concept_id"] != 0)
unmapped = len(mapping_records) - mapped
print(f"\n--- Mapping File Summary ---")
print(f"  Total: {len(mapping_records)}  |  Mapped: {mapped}  |  Needs review: {unmapped}")

# In production: save as CSV for sharing
# pd.DataFrame(mapping_records).to_csv("sepsis_source_to_concept_map_hospitalA.csv", index=False)

print(f"\n--- source_to_concept_map Records ---")
for r in mapping_records:
    status = f"-> {r['_target_concept_name']} ({r['target_vocabulary_id']}: {r['target_concept_id']})" if r["target_concept_id"] != 0 else "-> UNMAPPED"
    print(f"  {r['source_code']:15s} {status}  [{r['_match_method']}]")

The Key Insight: This script produces a draft in source_to_concept_map format in minutes - a first pass that would take days to assemble by hand in USAGI. Each match is only the top search candidate, so a human still has to review it before it is used. Saved as a CSV, the draft can be shared with other hospitals in the network. Hospital B imports it, reviews the mappings (adjusting for their local code variants), and has a head start on their own mapping work.

4. Use Case B: Importing and Adapting a Shared Mapping File

Hospital B receives Hospital A’s mapping file. Their local codes are different (“CRP_Serum” instead of “CRP_HS”), but the target OMOP concepts are the same. They adapt the shared file and fill in the gaps.

Python

import omophub

client = omophub.OMOPHub()

# Hospital A's shared mapping file (loaded from CSV in production)
shared_mappings = {
    "C-Reactive Protein": {"target_concept_id": 3010156, "target_vocab": "LOINC"},
    "Procalcitonin": {"target_concept_id": 3046279, "target_vocab": "LOINC"},
    "Lactate": {"target_concept_id": 3047181, "target_vocab": "LOINC"},
    "White Blood Cell Count": {"target_concept_id": 3000905, "target_vocab": "LOINC"},
    "Blood Culture": {"target_concept_id": 3016407, "target_vocab": "LOINC"},
}

# Hospital B's local codes
hospital_b_codes = [
    {"source_code": "CRP_Serum", "display": "CRP Serum Level"},
    {"source_code": "PCT_Quant", "display": "Quantitative Procalcitonin"},
    {"source_code": "Lact_Venous", "display": "Venous Lactate"},
    {"source_code": "IL6_Level", "display": "Interleukin-6 Level"},  # Not in shared file
]

print("Adapting shared mapping file for Hospital B...\n")

for entry in hospital_b_codes:
    display = entry["display"]
    code = entry["source_code"]

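    # Step 1: Check if any shared mapping matches by keyword overlap
    # Caveat: this naive check misses abbreviations ("CRP Serum Level" shares no
    # word with "C-Reactive Protein", so it falls through to the search below)
    # and can over-match on common words such as "blood". Review every reuse.
    matched_shared = None
    for shared_key, shared_val in shared_mappings.items():
        if shared_key.lower() in display.lower() or any(
            word in display.lower() for word in shared_key.lower().split()
        ):
            matched_shared = (shared_key, shared_val)
            break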

    if matched_shared:
        key, val = matched_shared
        print(f"  {code}: Reused shared mapping '{key}' -> OMOP {val['target_concept_id']} ({val['target_vocab']})")
    else:
        # Step 2: Not in shared file - look up via OMOPHub
        print(f"  {code}: Not in shared file. Searching OMOPHub...")
        try:
            results = client.search.basic(
                display,
                vocabulary_ids=["LOINC"],
                domain_ids=["Measurement"],
                page_size=1,
            )
            candidates = results.get("concepts", []) if results else []
            if candidates:
                best = candidates[0]
                print(f"    -> NEW: {best.get('concept_name')} (OMOP: {best['concept_id']})")
                print(f"    -> Add to shared mapping file for other hospitals")
            else:
                print(f"    -> No match - needs manual review")
        except omophub.APIError as e:
            print(f"    -> API error: {e.message}")

The Key Insight: Hospital B’s mapping took minutes instead of days because Hospital A already did the vocabulary search work. The shared mapping file isn’t an API feature - it’s a CSV. But OMOPHub made it fast to build, and the source_to_concept_map format makes it portable. Hospital B’s new mappings (like IL-6) get added to the shared file for Hospital C.

5. The Sharing Workflow

The practical workflow for a multi-site research network:

Step 1: Coordinating center builds initial mapping file (Use Case A)

Extract all unique local codes from the first participating site
Look up each via OMOPHub
Human reviews and approves each mapping
Save as source_to_concept_map CSV

Step 2: Distribute to network participants

Share via GitHub repo, shared drive, or project data package
Include review metadata (_match_method, _reviewed, _reviewer)
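One way to package the file, sketched with only the standard library: write the standard columns first and the review metadata last, so a site that wants only the OMOP columns can leave out everything prefixed with an underscore. The function names, the column ordering, and the example dates are choices made for this sketch, not part of any OMOPHub or OHDSI tooling.

```python
import csv
import io

STANDARD_COLUMNS = [
    "source_code", "source_concept_id", "source_vocabulary_id",
    "source_code_description", "target_concept_id", "target_vocabulary_id",
    "valid_start_date", "valid_end_date", "invalid_reason",
]
REVIEW_COLUMNS = ["_match_method", "_target_concept_name", "_reviewed", "_reviewer"]


def write_mapping_csv(records, handle, include_review=True):
    """Write mapping records as CSV; missing or None values become empty cells."""
    columns = STANDARD_COLUMNS + (REVIEW_COLUMNS if include_review else [])
    writer = csv.DictWriter(handle, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow({c: "" if record.get(c) is None else record[c] for c in columns})


def read_mapping_csv(handle):
    """Read the CSV back, restoring integer concept IDs."""
    rows = []
    for row in csv.DictReader(handle):
        row["source_concept_id"] = int(row["source_concept_id"])
        row["target_concept_id"] = int(row["target_concept_id"])
        rows.append(row)
    return rows


example = [{
    "source_code": "CRP_HS", "source_concept_id": 0,
    "source_vocabulary_id": "HospitalA_Labs",
    "source_code_description": "C-Reactive Protein, High Sensitivity",
    "target_concept_id": 3010156, "target_vocabulary_id": "LOINC",
    "valid_start_date": "2025-01-01", "valid_end_date": "2099-12-31",
    "invalid_reason": None,
    "_match_method": "basic", "_reviewed": False, "_reviewer": None,
}]

buffer = io.StringIO()   # stands in for open("mappings.csv", "w", newline="")
write_mapping_csv(example, buffer)
buffer.seek(0)
round_trip = read_mapping_csv(buffer)
print(round_trip[0]["source_code"], round_trip[0]["target_concept_id"])
```

Note that everything except the two concept ID columns comes back as a string, including _reviewed. A receiving site that filters on review status should compare against the text "True" or convert the column explicitly.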

Step 3: Each site adapts the shared file (Use Case B)

Match their local codes against the shared mappings by display name similarity
Use OMOPHub to fill gaps (codes not in the shared file)
Contribute new mappings back to the shared file
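The display-name matching in Step 3 can be sketched with Python's standard difflib module. This is one possible approach under stated assumptions, not the method OMOPHub uses: the abbreviation table, the function names, and the 0.6 threshold are all hypothetical and would need tuning against real local codes.

```python
from difflib import SequenceMatcher

# ASSUMPTION: a small, site-maintained abbreviation table; entries are illustrative.
ABBREVIATIONS = {
    "crp": "c-reactive protein",
    "pct": "procalcitonin",
    "wbc": "white blood cell",
}


def normalize(text: str) -> str:
    """Lowercase the text and expand known abbreviations word by word."""
    words = text.lower().replace("_", " ").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def best_shared_match(display, shared_names, threshold=0.6):
    """Return (shared_name, score) for the closest shared name, or None below threshold."""
    target = normalize(display)
    scored = [
        (name, SequenceMatcher(None, target, normalize(name)).ratio())
        for name in shared_names
    ]
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, round(score, 2)) if score >= threshold else None


shared_names = ["C-Reactive Protein", "Procalcitonin", "Lactate",
                "White Blood Cell Count", "Blood Culture"]
for display in ["CRP Serum Level", "Venous Lactate", "Interleukin-6 Level"]:
    print(display, "->", best_shared_match(display, shared_names))
```

A lexical match is only a suggestion. "Venous Lactate" scores well against "Lactate", but whether a venous result should share a target concept with an arterial one is a clinical decision, which is why every reused mapping still goes through human review.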

Step 4: Iterate

The mapping file grows with each new site
Later sites have fewer gaps to fill
The coordinating center merges contributions and resolves conflicts
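The merge step can be sketched as a function that groups contributed records by local code and flags disagreements for a human to resolve. The function name and the keying on (source_vocabulary_id, source_code) are assumptions for this example; the concept IDs below are placeholders, not real OMOP concepts.

```python
from collections import defaultdict


def merge_mappings(*contributions):
    """Merge mapping records contributed by several sites.

    Records are keyed by (source_vocabulary_id, source_code). Keys with a single
    agreed target_concept_id go to `merged`; keys with more than one distinct
    target go to `conflicts` for manual resolution.
    """
    targets_by_key = defaultdict(dict)   # key -> {target_concept_id: first record seen}
    for records in contributions:
        for record in records:
            if record["target_concept_id"] == 0:
                continue                 # unmapped rows carry no vote
            key = (record["source_vocabulary_id"], record["source_code"])
            targets_by_key[key].setdefault(record["target_concept_id"], record)

    merged, conflicts = [], {}
    for key, by_target in targets_by_key.items():
        if len(by_target) == 1:
            merged.append(next(iter(by_target.values())))
        else:
            conflicts[key] = sorted(by_target)
    return merged, conflicts


# Placeholder concept IDs (1001-1004), for illustration only.
site_a = [
    {"source_vocabulary_id": "Network_Labs", "source_code": "LACT", "target_concept_id": 1001},
    {"source_vocabulary_id": "Network_Labs", "source_code": "CRP", "target_concept_id": 1002},
]
site_b = [
    {"source_vocabulary_id": "Network_Labs", "source_code": "LACT", "target_concept_id": 1003},
    {"source_vocabulary_id": "Network_Labs", "source_code": "CRP", "target_concept_id": 1002},
    {"source_vocabulary_id": "Network_Labs", "source_code": "IL6", "target_concept_id": 1004},
]

merged, conflicts = merge_mappings(site_a, site_b)
print([r["source_code"] for r in merged])
print(conflicts)
```

Here CRP and IL6 merge cleanly, while LACT is held back because the two sites disagree. Keeping conflicts out of the merged file until someone resolves them means the shared file only ever contains mappings nobody has disputed.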

This isn’t a fancy platform feature - it’s a disciplined workflow using standard formats and a fast vocabulary API. But it’s how the Silo Tax actually gets reduced in practice.

What a future collaborative mapping feature could add:

Centralized, API-accessible mapping repository (search other institutions’ mappings)
Confidence scoring based on number of institutions that agree on a mapping
Version-controlled mapping provenance
Automated conflict detection when institutions map the same local code differently
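To illustrate what agreement-based confidence scoring might look like, here is a purely hypothetical sketch. No such OMOPHub feature exists today; the function name, the input shape, and the scoring rule (share of votes for the most common target) are all assumptions, and the concept IDs are placeholders.

```python
from collections import Counter, defaultdict


def agreement_scores(site_pairs):
    """Score each term by how many sites agree on its most common target.

    site_pairs: {site_name: [(normalized_term, target_concept_id), ...]}
    Assumes each site maps a given term to a single target.
    """
    votes = defaultdict(Counter)
    for pairs in site_pairs.values():
        for term, target in set(pairs):   # de-duplicate repeated rows within a site
            votes[term][target] += 1

    scores = {}
    for term, counter in votes.items():
        target, agreeing = counter.most_common(1)[0]
        total = sum(counter.values())
        scores[term] = {
            "target_concept_id": target,
            "agreeing": agreeing,
            "votes": total,
            "confidence": agreeing / total,
        }
    return scores


# Placeholder concept IDs, for illustration only.
site_pairs = {
    "HospitalA": [("c-reactive protein", 1002), ("lactate", 1001)],
    "HospitalB": [("c-reactive protein", 1002), ("lactate", 1003)],
    "HospitalC": [("c-reactive protein", 1002), ("lactate", 1001)],
}
for term, score in agreement_scores(site_pairs).items():
    print(term, score)
```

In this toy data all three sites agree on the C-reactive protein target, while lactate gets two votes out of three. A low score does not mean the majority is right; it marks the term as one worth a closer look.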

These would be genuinely valuable - but they require infrastructure beyond what OMOPHub currently provides. The workflow above works today.

6. Conclusion: Share Mappings, Not Just Vocabularies

The Silo Tax persists because sharing local-to-standard mappings is harder than it should be. Standard vocabularies are shared (via Athena). Phenotype definitions are shared (via OHDSI PhenotypeLibrary). But the translation layer - the source_to_concept_map that each site builds laboriously - stays locked in institutional silos.

OMOPHub makes building that mapping file fast: search for a local code display name, get back the standard OMOP concept, write it to the mapping table. The source_to_concept_map format makes the file portable: CSV in, CSV out, same schema everywhere.

The missing piece isn’t technology - it’s workflow discipline. If your research network adopts the pattern of building, sharing, and iterating on mapping files, the Silo Tax drops with every site that joins. The first hospital does 100% of the mapping work. The second might do 30%. By the fifth, it’s mostly review and edge cases.

Start with one mapping file. Build it with OMOPHub. Share it with your network. Let the next site extend it. That’s how collaborative mapping works in practice - not with a magic platform, but with standard formats and a fast vocabulary API.
