1. The “Silo Tax”

Every hospital that joins an OMOP network faces the same mapping wall. Their lab system uses “Cr_Serum” for creatinine. The one across town uses “CREAT_BLD.” The academic medical center uses “Creatinine_S_mg.” All three mean the same thing: LOINC 2160-0, Creatinine [Mass/volume] in Serum or Plasma. But each hospital maps it independently. Three data engineers, three weeks of work, three times the cost - for the same result.
Scale this across 500 hospitals and 800 unique lab codes each, and you’re looking at hundreds of thousands of hours of duplicated mapping labor per year. This is the Silo Tax: the cost of every institution solving the same vocabulary problem in isolation.
The OHDSI community has partial solutions. USAGI is an open-source mapping tool that helps data engineers find standard concept matches for local codes. Athena distributes the vocabularies themselves. The OMOP CDM has a source_to_concept_map table designed specifically to store local-to-standard mappings. But none of these provide a way to share completed mappings between institutions.
OMOPHub doesn’t solve the sharing problem either - it doesn’t have a collaborative mapping repository. But it accelerates the creation of mapping files by providing fast vocabulary search without requiring a local Athena installation. And because mapping files follow a standard format (source_to_concept_map), they can be shared between institutions manually - via GitHub, shared drives, or project-specific data exchanges.
This article shows the practical workflow: use OMOPHub to build mapping files fast, format them as source_to_concept_map records, and share them between sites participating in the same research network.
Honest scoping: OMOPHub does not currently have a shared mapping repository, user-contributed mappings, or collaborative mapping API. The workflow described here uses OMOPHub for vocabulary search + standard file formats for sharing. A future collaborative mapping feature would be valuable - but this article works with what exists today.
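As a rough check on the scale claim in the opening paragraphs, the arithmetic can be written out. The 500 hospitals and 800 codes per hospital come from the scenario above; the 30 minutes of engineer time per code is purely an illustrative assumption, not a measured figure, and the result moves in proportion to it.

```python
# Back-of-envelope estimate of duplicated mapping effort.
HOSPITALS = 500            # from the scenario above
CODES_PER_HOSPITAL = 800   # from the scenario above
MINUTES_PER_CODE = 30      # ASSUMPTION: illustrative only, not a measured value


def duplicated_hours(hospitals: int, codes: int, minutes_per_code: float) -> float:
    """Total hours spent if every site maps every one of its codes independently."""
    return hospitals * codes * minutes_per_code / 60


total_mappings = HOSPITALS * CODES_PER_HOSPITAL
total_hours = duplicated_hours(HOSPITALS, CODES_PER_HOSPITAL, MINUTES_PER_CODE)
print(f"{total_mappings:,} code mappings, about {total_hours:,.0f} engineer-hours")
```

Under that assumption the total is 400,000 individual mappings and roughly 200,000 hours, which is where a figure in the hundreds of thousands comes from.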

2. The Core Concept: The source_to_concept_map as a Sharing Format

The OMOP CDM includes a table purpose-built for local-to-standard mappings:
source_to_concept_map:
  - source_code              (VARCHAR) - the local code, e.g., "Cr_Serum"
  - source_concept_id        (INTEGER) - 0 if no OMOP concept for the local code
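  - source_vocabulary_id     (VARCHAR) - identifier for the local vocabulary, e.g., "HospitalA_Labs"
  - source_code_description  (VARCHAR) - human-readable name of the local code, e.g., "Serum creatinine"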
  - target_concept_id        (INTEGER) - the standard OMOP concept ID, e.g., 3016723
  - target_vocabulary_id     (VARCHAR) - e.g., "LOINC"
  - valid_start_date         (DATE)
  - valid_end_date           (DATE)
  - invalid_reason           (VARCHAR)
This table is the natural format for shareable mapping files. If Hospital A builds a source_to_concept_map for their lab codes and Hospital B has similar local codes, Hospital B can import Hospital A’s mappings as a starting point - reviewing and adjusting as needed.
OMOPHub’s role: populating the target_concept_id and target_vocabulary_id columns. For each local code, search OMOPHub to find the standard concept match, then write the result into the mapping table format.
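To make the record shape concrete, here is a minimal sketch of one mapping row as a Python dataclass with a few sanity checks. The class name `SourceToConceptMapRow`, the `problems` method, and the default dates are hypothetical choices for illustration; the field names follow the table above, plus source_code_description, which the CDM table also defines. The descriptions in the example rows are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SourceToConceptMapRow:
    """One local-to-standard mapping. Hypothetical helper, not an OMOPHub type."""
    source_code: str
    source_vocabulary_id: str
    source_code_description: str
    target_concept_id: int                       # 0 means "not mapped yet"
    target_vocabulary_id: str
    source_concept_id: int = 0                   # 0: local code has no OMOP concept
    valid_start_date: date = date(1970, 1, 1)    # ASSUMPTION: placeholder default
    valid_end_date: date = date(2099, 12, 31)    # ASSUMPTION: placeholder default
    invalid_reason: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the row looks usable."""
        issues = []
        if not self.source_code.strip():
            issues.append("source_code is empty")
        if not self.source_vocabulary_id.strip():
            issues.append("source_vocabulary_id is empty")
        if self.target_concept_id != 0 and not self.target_vocabulary_id.strip():
            issues.append("mapped row is missing target_vocabulary_id")
        if self.valid_end_date < self.valid_start_date:
            issues.append("valid_end_date precedes valid_start_date")
        return issues


# Example rows; the descriptions are illustrative.
good = SourceToConceptMapRow("Cr_Serum", "HospitalA_Labs", "Serum creatinine", 3016723, "LOINC")
bad = SourceToConceptMapRow("CREAT_BLD", "", "Blood creatinine", 3016723, "")
print(good.problems())
print(bad.problems())
```

Running checks like these before a file leaves the building is cheap, and it spares the receiving site from debugging someone else's blank columns.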

3. Use Case A: Building a Shareable Mapping File with OMOPHub

A multi-site sepsis research project needs all participating hospitals to map their local lab codes for inflammatory markers to standard LOINC concepts. Instead of each hospital starting from scratch, the coordinating center builds an initial mapping file using OMOPHub, then distributes it.
pip install omophub
Python
import omophub
import json
from datetime import date

client = omophub.OMOPHub()

# Local codes from Hospital A's lab system (representative set for sepsis)
local_codes = [
    {"source_code": "CRP_HS", "display": "C-Reactive Protein, High Sensitivity", "vocab": "HospitalA_Labs"},
    {"source_code": "PCT_Level", "display": "Procalcitonin Level", "vocab": "HospitalA_Labs"},
    {"source_code": "Lactate_Art", "display": "Arterial Lactate", "vocab": "HospitalA_Labs"},
    {"source_code": "WBC_Auto", "display": "White Blood Cell Count, Automated", "vocab": "HospitalA_Labs"},
    {"source_code": "BldCx_Ana", "display": "Blood Culture Anaerobic", "vocab": "HospitalA_Labs"},
    {"source_code": "SepsisAlert", "display": "Sepsis Alert Triggered", "vocab": "HospitalA_Events"},
]

print("Building source_to_concept_map for sepsis project...\n")

mapping_records = []

for entry in local_codes:
    code = entry["source_code"]
    display = entry["display"]
    source_vocab = entry["vocab"]

    print(f"  {code}: '{display}'")

    target_id = 0  # Default: unmapped
    target_vocab = ""
    target_name = ""
    match_method = "unmapped"

    try:
        # Step 1: Search OMOPHub for the best LOINC/SNOMED match
        results = client.search.basic(
            display,
            vocabulary_ids=["LOINC", "SNOMED"],
            domain_ids=["Measurement", "Condition", "Observation"],
            page_size=3,
        )
        candidates = results.get("concepts", []) if results else []

        # Step 2: Semantic fallback if basic search misses
        if not candidates:
            semantic = client.search.semantic(display, vocabulary_ids=["LOINC", "SNOMED"], domain_ids=["Measurement", "Condition", "Observation"], page_size=3)
            candidates = (semantic.get("results", semantic.get("concepts", [])) if semantic else [])
            if candidates:
                match_method = "semantic"

        if candidates:
            best = candidates[0]
            target_id = best["concept_id"]
            target_vocab = best.get("vocabulary_id", "")
            target_name = best.get("concept_name", "")
            # Keep the "semantic" label set by the fallback; otherwise the hit came from basic search
            match_method = match_method if match_method == "semantic" else "basic"
            print(f"    -> {target_name} ({target_vocab}, OMOP: {target_id}) [{match_method}]")
        else:
            print(f"    -> NO MATCH - needs manual review")

    except omophub.APIError as e:
        print(f"    -> API error: {e.message}")
        match_method = "error"

    # Step 3: Build source_to_concept_map record
    mapping_records.append({
        "source_code": code,
        "source_concept_id": 0,
        "source_vocabulary_id": source_vocab,
        "source_code_description": display,
        "target_concept_id": target_id,
        "target_vocabulary_id": target_vocab,
        "valid_start_date": str(date.today()),
        "valid_end_date": "2099-12-31",
        "invalid_reason": None,
        # Extra metadata for review (not standard OMOP columns, but useful for sharing)
        "_match_method": match_method,
        "_target_concept_name": target_name,
        "_reviewed": False,
        "_reviewer": None,
    })

# Summary
mapped = sum(1 for r in mapping_records if r["target_concept_id"] != 0)
unmapped = len(mapping_records) - mapped
print(f"\n--- Mapping File Summary ---")
print(f"  Total: {len(mapping_records)}  |  Mapped: {mapped}  |  Needs review: {unmapped}")

# In production: save as CSV for sharing
# pd.DataFrame(mapping_records).to_csv("sepsis_source_to_concept_map_hospitalA.csv", index=False)

print(f"\n--- source_to_concept_map Records ---")
for r in mapping_records:
    status = f"-> {r['_target_concept_name']} ({r['target_vocabulary_id']}: {r['target_concept_id']})" if r["target_concept_id"] != 0 else "-> UNMAPPED"
    print(f"  {r['source_code']:15s} {status}  [{r['_match_method']}]")
The Key Insight: This script produces draft source_to_concept_map records in minutes - a first pass that would otherwise take days of manual lookup. The top search hit is a candidate, not a verified mapping: every record starts with _reviewed set to False and should be checked by a person before it is used. Saved as a CSV, the output can be shared with other hospitals in the network. Hospital B imports it, reviews the mappings (adjusting for their local code variants), and has a head start on their own mapping work.
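The script above only hints at the save step in a comment. A minimal sketch of it with the standard library follows, assuming records shaped like the ones the script builds. The function names `write_mapping_csv` and `load_ready` and the sample values are hypothetical. The idea is to keep two outputs: a shared file that carries the underscore-prefixed review columns, and a stricter file for loading that contains only the standard columns for rows that are mapped and reviewed.

```python
import csv
import io

STANDARD_COLUMNS = [
    "source_code", "source_concept_id", "source_vocabulary_id",
    "source_code_description", "target_concept_id", "target_vocabulary_id",
    "valid_start_date", "valid_end_date", "invalid_reason",
]
REVIEW_COLUMNS = ["_match_method", "_target_concept_name", "_reviewed", "_reviewer"]


def write_mapping_csv(records, handle, include_review=True):
    """Write mapping records as CSV; missing or None values become empty cells."""
    columns = STANDARD_COLUMNS + (REVIEW_COLUMNS if include_review else [])
    writer = csv.DictWriter(handle, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow({c: "" if record.get(c) is None else record[c] for c in columns})


def load_ready(records):
    """Keep only rows that are mapped and signed off by a reviewer."""
    return [r for r in records if r["target_concept_id"] != 0 and r.get("_reviewed")]


def sample(code, description, target_id, reviewed):
    """Build an illustrative record in the same shape as the script's output."""
    return {
        "source_code": code, "source_concept_id": 0,
        "source_vocabulary_id": "HospitalA_Labs",
        "source_code_description": description,
        "target_concept_id": target_id,
        "target_vocabulary_id": "LOINC" if target_id else "",
        "valid_start_date": "2024-01-01", "valid_end_date": "2099-12-31",
        "invalid_reason": None,
        "_match_method": "basic" if target_id else "unmapped",
        "_reviewed": reviewed, "_reviewer": "reviewer1" if reviewed else None,
    }


sample_records = [
    sample("PCT_Level", "Procalcitonin Level", 3046279, True),
    sample("SepsisAlert", "Sepsis Alert Triggered", 0, False),
]

shared_file = io.StringIO()   # stands in for open("shared.csv", "w", newline="")
write_mapping_csv(sample_records, shared_file)
etl_file = io.StringIO()
write_mapping_csv(load_ready(sample_records), etl_file, include_review=False)
print(shared_file.getvalue())
print(etl_file.getvalue())
```

When writing to a real file, open it with `newline=""` so the csv module controls line endings.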

4. Use Case B: Importing and Adapting a Shared Mapping File

Hospital B receives Hospital A’s mapping file. Their local codes are different (“CRP_Serum” instead of “CRP_HS”), but the target OMOP concepts are the same. They adapt the shared file and fill in the gaps.
Python
import omophub

client = omophub.OMOPHub()

# Hospital A's shared mapping file (loaded from CSV in production)
shared_mappings = {
    "C-Reactive Protein": {"target_concept_id": 3010156, "target_vocab": "LOINC"},
    "Procalcitonin": {"target_concept_id": 3046279, "target_vocab": "LOINC"},
    "Lactate": {"target_concept_id": 3047181, "target_vocab": "LOINC"},
    "White Blood Cell Count": {"target_concept_id": 3000905, "target_vocab": "LOINC"},
    "Blood Culture": {"target_concept_id": 3016407, "target_vocab": "LOINC"},
}

# Hospital B's local codes
hospital_b_codes = [
    {"source_code": "CRP_Serum", "display": "CRP Serum Level"},
    {"source_code": "PCT_Quant", "display": "Quantitative Procalcitonin"},
    {"source_code": "Lact_Venous", "display": "Venous Lactate"},
    {"source_code": "IL6_Level", "display": "Interleukin-6 Level"},  # Not in shared file
]

print("Adapting shared mapping file for Hospital B...\n")

for entry in hospital_b_codes:
    display = entry["display"]
    code = entry["source_code"]

    # Step 1: Reuse a shared mapping only when every word of its name appears
    # in the display text. Matching on any single word would, for example,
    # pair "Blood Glucose" with "Blood Culture".
    display_words = set(display.lower().replace("-", " ").split())
    matched_shared = None
    for shared_key, shared_val in shared_mappings.items():
        if set(shared_key.lower().replace("-", " ").split()) <= display_words:
            matched_shared = (shared_key, shared_val)
            break

    if matched_shared:
        key, val = matched_shared
        print(f"  {code}: Reused shared mapping '{key}' -> OMOP {val['target_concept_id']} ({val['target_vocab']})")
    else:
        # Step 2: Not in shared file - look up via OMOPHub
        print(f"  {code}: Not in shared file. Searching OMOPHub...")
        try:
            results = client.search.basic(
                display,
                vocabulary_ids=["LOINC"],
                domain_ids=["Measurement"],
                page_size=1,
            )
            candidates = results.get("concepts", []) if results else []
            if candidates:
                best = candidates[0]
                print(f"    -> NEW: {best.get('concept_name')} (OMOP: {best['concept_id']})")
                print(f"    -> Add to shared mapping file for other hospitals")
            else:
                print(f"    -> No match - needs manual review")
        except omophub.APIError as e:
            print(f"    -> API error: {e.message}")
The Key Insight: Hospital B’s mapping took minutes instead of days because Hospital A already did the vocabulary search work. Reused mappings still need a reviewer's eye, because a name match can hide a real difference, such as a venous versus an arterial lactate specimen. The shared mapping file isn’t an API feature - it’s a CSV. But OMOPHub made it fast to build, and the source_to_concept_map format makes it portable. Hospital B’s new mappings (like IL-6) get added to the shared file for Hospital C.
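The word-matching step in the script above is deliberately simple. One possible refinement, sketched here with the standard library's difflib, ranks shared entries by string similarity and returns a short candidate list for a person to confirm. The function names and the 0.5 threshold are assumptions to tune against your own codes, not recommended values.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and treat hyphens and underscores as spaces."""
    return " ".join(text.lower().replace("-", " ").replace("_", " ").split())


def rank_shared_candidates(display, shared_names, threshold=0.5, limit=3):
    """Return up to `limit` (score, name) pairs at or above `threshold`, best first."""
    query = normalize(display)
    scored = []
    for name in shared_names:
        score = SequenceMatcher(None, query, normalize(name)).ratio()
        if score >= threshold:
            scored.append((round(score, 2), name))
    return sorted(scored, reverse=True)[:limit]


shared_names = [
    "C-Reactive Protein", "Procalcitonin", "Lactate",
    "White Blood Cell Count", "Blood Culture",
]

for display in ["Quantitative Procalcitonin", "Venous Lactate", "CRP Serum Level"]:
    candidates = rank_shared_candidates(display, shared_names)
    print(f"{display}: {candidates if candidates else 'no candidate, search OMOPHub'}")
```

Character similarity has blind spots. An abbreviation such as "CRP" looks nothing like its expanded name, and a high score says nothing about specimen or method, so the output is a list of suggestions for review and not a decision.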

5. The Sharing Workflow

The practical workflow for a multi-site research network:
Step 1: Coordinating center builds initial mapping file (Use Case A)
  • Extract all unique local codes from the first participating site
  • Look up each via OMOPHub
  • Human reviews and approves each mapping
  • Save as source_to_concept_map CSV
Step 2: Distribute to network participants
  • Share via GitHub repo, shared drive, or project data package
  • Include review metadata (_match_method, _reviewed, _reviewer)
Step 3: Each site adapts the shared file (Use Case B)
  • Match their local codes against the shared mappings by display name similarity
  • Use OMOPHub to fill gaps (codes not in the shared file)
  • Contribute new mappings back to the shared file
Step 4: Iterate
  • The mapping file grows with each new site
  • Later sites have fewer gaps to fill
  • The coordinating center merges contributions and resolves conflicts
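The merge step in Step 4 can be sketched as follows. This is a minimal illustration, assuming each site returns records with a source_code_description and a target_concept_id, and that contributions are matched on a normalized description. That matching key is an assumption, since local codes differ between sites, and the function names are hypothetical. Entries where every site chose the same target are accepted; entries with more than one target are set aside for the coordinating center.

```python
from collections import defaultdict


def default_key(record):
    """ASSUMPTION: contributions are matched on a normalized description."""
    return " ".join(record["source_code_description"].lower().split())


def merge_contributions(contributions, key_fn=default_key):
    """Combine per-site mapping records into agreed mappings and conflicts.

    contributions: dict of site name -> list of mapping records.
    Returns (agreed, conflicts); unmapped rows (target 0) are skipped.
    """
    sites_by_target = defaultdict(lambda: defaultdict(set))
    for site, records in contributions.items():
        for record in records:
            if record["target_concept_id"] != 0:
                sites_by_target[key_fn(record)][record["target_concept_id"]].add(site)

    agreed, conflicts = {}, {}
    for key, targets in sites_by_target.items():
        summary = {target: sorted(sites) for target, sites in targets.items()}
        if len(summary) == 1:
            agreed[key] = summary
        else:
            conflicts[key] = summary  # needs a decision from the coordinating center
    return agreed, conflicts


# Placeholder ID standing in for a disagreement; not a real standard concept.
PLACEHOLDER_ID = 2000000001

contributions = {
    "HospitalA": [
        {"source_code_description": "Procalcitonin Level", "target_concept_id": 3046279},
        {"source_code_description": "Arterial Lactate", "target_concept_id": 3047181},
    ],
    "HospitalB": [
        {"source_code_description": "procalcitonin  level", "target_concept_id": 3046279},
        {"source_code_description": "Arterial Lactate", "target_concept_id": PLACEHOLDER_ID},
    ],
}

agreed, conflicts = merge_contributions(contributions)
print("Agreed:", agreed)
print("Conflicts:", conflicts)
```

Keeping the list of sites behind each target gives the reviewer resolving a conflict a record of who proposed what.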
This isn’t a fancy platform feature - it’s a disciplined workflow using standard formats and a fast vocabulary API. But it’s how the Silo Tax actually gets reduced in practice.
What a future collaborative mapping feature could add:
  • Centralized, API-accessible mapping repository (search other institutions’ mappings)
  • Confidence scoring based on number of institutions that agree on a mapping
  • Version-controlled mapping provenance
  • Automated conflict detection when institutions map the same local code differently
These would be genuinely valuable - but they require infrastructure beyond what OMOPHub currently provides. The workflow above works today.

6. Conclusion: Share Mappings, Not Just Vocabularies

The Silo Tax persists because sharing local-to-standard mappings is harder than it should be. Standard vocabularies are shared (via Athena). Phenotype definitions are shared (via OHDSI PhenotypeLibrary). But the translation layer - the source_to_concept_map that each site builds laboriously - stays locked in institutional silos.
OMOPHub makes building that mapping file fast: search for a local code display name, get back candidate standard OMOP concepts, and write the reviewed choice to the mapping table. The source_to_concept_map format makes the file portable: CSV in, CSV out, same schema everywhere.
The missing piece isn’t technology - it’s workflow discipline. If your research network adopts the pattern of building, sharing, and iterating on mapping files, the Silo Tax drops with every site that joins. As an illustration, the first hospital does 100% of the mapping work, the second might do around 30%, and by the fifth it’s mostly review and edge cases. The actual split depends on how much the sites' local codes overlap.
Start with one mapping file. Build it with OMOPHub. Share it with your network. Let the next site extend it. That’s how collaborative mapping works in practice - not with a magic platform, but with standard formats and a fast vocabulary API.