1. The “Maintenance Tax”

Every OMOP implementation has a dirty secret: the vocabulary tables. They’re the backbone of everything - every concept ID, every mapping, every hierarchy traversal depends on them. And they go stale. SNOMED CT editions ship at least twice a year. RxNorm updates monthly. ICD-10-CM gets annual revisions. LOINC publishes new releases twice a year. Each release can add concepts, deprecate others, change mappings, or restructure hierarchies.

If your local vocabulary tables are from six months ago, your ETL is mapping against an outdated reality. A new drug approved in March doesn’t exist in your January vocabulary. A SNOMED concept that was merged in the April release is still two separate concepts in your system. This is Version Drift - and the manual process of fixing it is the Maintenance Tax: download multi-gigabyte Athena files, load them into your database (hours to days), re-run ETL validation, and hope nothing broke. It’s a lost weekend every quarter, and most teams put it off because the operational cost is so high.

OMOPHub doesn’t eliminate the need for local vocabulary tables - your OMOP CDM requires them for performant SQL joins. But it provides a fast, always-current API for vocabulary lookups that complements your local installation in specific, high-value ways:
  • Version checking: Is your local SNOMED current, or has a new release dropped?
  • Concept validation: Is this concept ID still active, or was it deprecated?
  • Ad-hoc resolution: During development, look up a concept without querying your local database
  • Gap detection: Find concepts in the latest vocabulary that don’t exist in your local installation
Think of OMOPHub as the “weather check” for your vocabulary - you still need a roof (local tables), but you want to know when a storm is coming (vocabulary updates that could affect your data).
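The gap-detection idea in the list above reduces to a set difference once you have concept IDs from both sides. A minimal sketch, assuming both ID collections are already in memory; the function name find_vocabulary_gaps and the sample IDs are illustrative and not part of the OMOPHub SDK:

```python
def find_vocabulary_gaps(remote_ids, local_ids):
    """Compare two collections of concept IDs.

    Returns (missing_locally, missing_remotely) as sorted lists.
    """
    remote, local = set(remote_ids), set(local_ids)
    missing_locally = sorted(remote - local)    # upstream concepts absent from local tables
    missing_remotely = sorted(local - remote)   # local concepts no longer found upstream
    return missing_locally, missing_remotely


# Hypothetical sample data: IDs seen via the API vs. IDs in the local concept table
remote_sample = [201826, 1503297, 3004501, 40000001]
local_sample = [201826, 1503297, 3004501, 39999999]

new_upstream, gone_upstream = find_vocabulary_gaps(remote_sample, local_sample)
print("Missing locally:", new_upstream)
print("Missing upstream:", gone_upstream)
```

The hard part in practice is obtaining the two ID lists cheaply, not the comparison itself.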

2. The Core Concept: When to Use OMOPHub vs. Local Vocabularies

This distinction matters. Getting it wrong leads to either operational fragility (over-reliance on the API) or stale data (ignoring updates).
Use local vocabulary tables when:
  • Running production ETL on millions of records (local SQL joins are orders of magnitude faster than API calls)
  • Executing OMOP CDM queries that join against concept, concept_relationship, or concept_ancestor tables
  • Reproducing research results (you need a fixed vocabulary version, not a live API that might change)
  • Any workflow where performance and reliability are critical
Use the OMOPHub API when:
  • Checking if your local vocabularies are current
  • Validating individual concepts during development or debugging
  • Resolving codes in low-volume, real-time applications (CDS alerts, FHIR integration)
  • Detecting deprecated or changed concepts in your existing OMOP data
  • Prototyping before you have a full local Athena installation
The combination: OMOPHub tells you when to update. Athena provides the files to update with. Your local database is where the update lands. This three-part workflow is vocabulary lifecycle management done right.
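To make the local-tables side concrete, here is a small sketch of the kind of join that belongs in your own database rather than behind an API. It uses an in-memory SQLite database as a stand-in for the OMOP concept and condition_occurrence tables; the reduced column set and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        standard_concept TEXT
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER
    );
    INSERT INTO concept VALUES
        (201826, 'Type 2 diabetes mellitus', 'SNOMED', 'S');
    INSERT INTO condition_occurrence VALUES
        (1, 100, 201826),
        (2, 101, 201826),
        (3, 102, 0);
""")

# One set-based query resolves every record against the vocabulary at once
rows = conn.execute("""
    SELECT c.concept_name, COUNT(*) AS n
    FROM condition_occurrence co
    JOIN concept c ON c.concept_id = co.condition_concept_id
    GROUP BY c.concept_name
    ORDER BY n DESC
""").fetchall()

for name, n in rows:
    print(f"{name}: {n} record(s)")
```

A single query like this covers the whole table; resolving the same records through an API would take a network round trip per lookup, which is why high-volume work stays local.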

3. Use Case A: Detecting When Your Local Vocabularies Are Stale

Before updating anything, you need to know if an update is needed. OMOPHub can help by letting you check the current state of vocabulary concepts against what you have locally.
The Scenario: Your ETL runs nightly. Once a week, before the ETL starts, a pre-check script queries OMOPHub to determine if any key vocabularies have changed since your last Athena download.
The Approach: Since the OMOPHub SDK doesn’t expose a dedicated vocabulary version endpoint, we use a practical proxy: check a set of well-known “sentinel” concepts from each vocabulary and compare their metadata (validity dates, standard status) against your local records. If a sentinel concept has changed, a vocabulary update is likely.
pip install omophub
Python
import omophub

client = omophub.OMOPHub()

# Sentinel concepts - well-known, stable concepts from each vocabulary
# If these change, the vocabulary has been updated
SENTINELS = {
    "SNOMED": {
        "concept_name": "Type 2 diabetes mellitus",
        "search_term": "Type 2 diabetes mellitus",
        "expected_domain": "Condition",
    },
    "RxNorm": {
        "concept_name": "Metformin",
        "search_term": "Metformin",
        "expected_domain": "Drug",
    },
    "LOINC": {
        "concept_name": "Glucose [Mass/volume] in Blood",
        "search_term": "2339-0",
        "expected_domain": "Measurement",
    },
    "ICD10CM": {
        "concept_name": "Type 2 diabetes mellitus without complications",
        "search_term": "E11.9",
        "expected_domain": "Condition",
    },
}

# Your local vocabulary state (in production, read from your local OMOP database)
local_state = {
    "SNOMED": {"last_concept_id": 201826, "last_checked": "2025-01-15"},
    "RxNorm": {"last_concept_id": 1503297, "last_checked": "2025-01-15"},
    "LOINC": {"last_concept_id": 3004501, "last_checked": "2025-01-15"},
    "ICD10CM": {"last_concept_id": 45576876, "last_checked": "2025-01-15"},
}


def check_vocabulary_freshness():
    """Check if local vocabularies might be stale by querying OMOPHub sentinels."""

    print("Vocabulary Freshness Check\n")
    alerts = []

    for vocab_id, sentinel in SENTINELS.items():
        print(f"  Checking {vocab_id}...")

        try:
            # Step 1: Search for the sentinel concept
            results = client.search.basic(
                sentinel["search_term"],
                vocabulary_ids=[vocab_id],
                page_size=1,
            )
            candidates = results.get("concepts", []) if results else []

            if not candidates:
                print(f"    -> Sentinel not found - possible vocabulary restructuring")
                alerts.append(f"{vocab_id}: sentinel concept not found")
                continue

            remote = candidates[0]
            remote_id = remote["concept_id"]
            remote_name = remote.get("concept_name", "Unknown")
            remote_valid_end = remote.get("valid_end_date", "N/A")
            remote_standard = remote.get("standard_concept", "N/A")

            local = local_state.get(vocab_id, {})
            local_id = local.get("last_concept_id")

            print(f"    Remote: {remote_name} (ID: {remote_id}, valid_end: {remote_valid_end}, standard: {remote_standard})")

            # Step 2: Check if the concept ID changed (rare but significant)
            alerts_before = len(alerts)
            warned = False
            if local_id and remote_id != local_id:
                print(f"    ALERT: Concept ID changed from {local_id} to {remote_id}")
                alerts.append(f"{vocab_id}: sentinel concept ID changed")

            # Step 3: Check the standard flag. ICD10CM is a source vocabulary in OMOP,
            # so its concepts are non-standard by design and are skipped here.
            if vocab_id != "ICD10CM" and remote_standard != "S":
                print(f"    ALERT: Sentinel is no longer standard (standard_concept='{remote_standard}')")
                alerts.append(f"{vocab_id}: sentinel concept no longer standard")

            # Step 4: Verify concept is still valid (ignore the "N/A" placeholder)
            if remote_valid_end and remote_valid_end not in ("N/A", "2099-12-31"):
                print(f"    WARNING: Concept has a non-default valid_end_date: {remote_valid_end}")
                warned = True

            # Report OK only if this vocabulary raised no alert or warning
            if len(alerts) == alerts_before and not warned:
                print("    OK: Sentinel matches expectations")

        except omophub.APIError as e:
            print(f"    API error: {e.message}")
            alerts.append(f"{vocab_id}: API error during check")

    print("\n--- Summary ---")
    if alerts:
        print(f"  {len(alerts)} alert(s) detected - consider updating from Athena:")
        for alert in alerts:
            print(f"    - {alert}")
        print("\n  To update: https://athena.ohdsi.org/vocabulary/list")
    else:
        print("  All vocabularies appear current. Next check recommended in 1 week.")

    return alerts


# Run the check
alerts = check_vocabulary_freshness()
The Key Insight: This doesn’t replace Athena downloads. It tells you when to do them. Instead of updating on a fixed schedule (quarterly, whether needed or not) or never (because it’s too painful), you check programmatically and update only when something has changed.
The sentinel approach is a lightweight proxy, and a coarse one: sentinels are picked for stability, so an unchanged sentinel does not prove the vocabulary is unchanged. For fuller version detection, you’d check multiple concepts per vocabulary or compare concept counts.
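The local_state dictionary in the script above is hard-coded; in practice you would read it from your database. A minimal sketch, assuming the standard OMOP vocabulary table (which carries a vocabulary_version column) and concept table, shown here with SQLite and made-up rows. The remote_counts dictionary is a placeholder for counts obtained from the remote side; no specific OMOPHub endpoint is implied.

```python
import sqlite3


def read_local_state(conn):
    """Return {vocabulary_id: {"version": ..., "concept_count": ...}} from local tables."""
    query = """
        SELECT v.vocabulary_id, v.vocabulary_version, COUNT(c.concept_id)
        FROM vocabulary v
        LEFT JOIN concept c ON c.vocabulary_id = v.vocabulary_id
        GROUP BY v.vocabulary_id, v.vocabulary_version
    """
    return {
        vocab_id: {"version": version, "concept_count": count}
        for vocab_id, version, count in conn.execute(query)
    }


def vocabularies_with_count_drift(local_state, remote_counts):
    """List vocabularies whose local concept count differs from the remote count."""
    return sorted(
        vocab_id
        for vocab_id, remote_count in remote_counts.items()
        if local_state.get(vocab_id, {}).get("concept_count") != remote_count
    )


# Demo with an in-memory database and illustrative rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vocabulary (vocabulary_id TEXT PRIMARY KEY, vocabulary_version TEXT);
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, vocabulary_id TEXT);
    INSERT INTO vocabulary VALUES ('SNOMED', 'example-version-a'), ('RxNorm', 'example-version-b');
    INSERT INTO concept VALUES (201826, 'SNOMED'), (316139, 'SNOMED'), (1503297, 'RxNorm');
""")

state = read_local_state(conn)
remote_counts = {"SNOMED": 3, "RxNorm": 1}  # hypothetical values
drifted = vocabularies_with_count_drift(state, remote_counts)
print(drifted)
```

A count mismatch is a stronger signal than a single sentinel, but it still only says that something changed, not what.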

4. Use Case B: Detecting Deprecated Concepts in Your Existing OMOP Data

This is where OMOPHub provides unique value that local vocabulary tables don’t: checking your existing data against the latest vocabulary to find concepts that have been deprecated, merged, or replaced since you last updated.
The Scenario: Your OMOP CDM has 50,000 unique condition concept IDs in the condition_occurrence table. Some of these may reference concepts that have been deprecated in the latest SNOMED release. You need to find them and map them to their current replacements.
Python
import omophub

client = omophub.OMOPHub()


def check_concepts_for_deprecation(concept_ids, batch_label=""):
    """
    Check a list of OMOP concept IDs against OMOPHub to find deprecated ones.
    Returns list of deprecated concepts with replacement suggestions.
    """
    print(f"\nChecking {len(concept_ids)} concepts for deprecation{f' ({batch_label})' if batch_label else ''}...\n")

    deprecated = []
    valid = 0
    errors = 0

    for cid in concept_ids:
        try:
            # Step 1: Fetch the concept from OMOPHub
            concept = client.concepts.get(cid)

            if not concept:
                print(f"  ID {cid}: NOT FOUND - may have been removed entirely")
                deprecated.append({"concept_id": cid, "status": "not_found", "replacement": None})
                continue

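            standard = concept.get("standard_concept")
            valid_end = concept.get("valid_end_date", "2099-12-31")
            name = concept.get("concept_name", "Unknown")

            # Step 2: Flag concepts that are non-standard or expired.
            # Source and classification concepts are non-standard by design, so this
            # check is meant for *_concept_id columns that should hold standard concepts.
            expired = bool(valid_end) and str(valid_end)[:10] != "2099-12-31"
            if standard != "S" or expired:
                print(f"  ID {cid}: DEPRECATED - '{name}' (standard_concept='{standard}', valid_end={valid_end})")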

                # Step 3: Try to find the replacement via "Maps to" relationship
                replacement = None
                try:
                    rels = client.concepts.relationships(cid)
                    rel_list = (
                        rels if isinstance(rels, list)
                        else rels.get("relationships", [])
                    ) if rels else []

                    maps_to = [
                        r for r in rel_list
                        if r.get("relationship_id") == "Maps to"
                        and r.get("concept_id") != cid  # Exclude self-maps
                    ]
                    if maps_to:
                        replacement = maps_to[0]
                        rep_name = replacement.get("concept_name", "Unknown")
                        rep_id = replacement.get("concept_id")
                        print(f"           -> Replacement: '{rep_name}' (ID: {rep_id})")
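                except omophub.APIError as e:
                    # Keep scanning, but don't hide the failed relationship lookup
                    print(f"           -> Could not fetch relationships: {e}")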

                deprecated.append({
                    "concept_id": cid,
                    "concept_name": name,
                    "status": "deprecated",
                    "replacement_id": replacement.get("concept_id") if replacement else None,
                    "replacement_name": replacement.get("concept_name") if replacement else None,
                })
            else:
                valid += 1

        except omophub.APIError:
            errors += 1

    print(f"\n--- Results ---")
    print(f"  Valid (standard): {valid}")
    print(f"  Deprecated/removed: {len(deprecated)}")
    print(f"  Errors: {errors}")

    if deprecated:
        print(f"\n  Deprecated concepts requiring remapping:")
        for d in deprecated:
            rep = f" -> {d['replacement_name']} (ID: {d['replacement_id']})" if d.get("replacement_id") else " -> no replacement found"
            print(f"    ID {d['concept_id']}: {d.get('concept_name', 'N/A')}{rep}")

    return deprecated


# Example: check a sample of concept IDs from your condition_occurrence table
# In production, query: SELECT DISTINCT condition_concept_id FROM condition_occurrence
sample_condition_ids = [201826, 4329847, 316139, 999999999]  # Last one is fake

deprecated_concepts = check_concepts_for_deprecation(sample_condition_ids, batch_label="condition_occurrence")
The Key Insight: This pattern is built on client.concepts.get(): iterate through your existing concept IDs, check each against OMOPHub, find the deprecated ones, and look up replacements via relationships. You can’t do this reliably with local vocabulary tables if those tables are themselves stale. OMOPHub, with its always-current data, becomes a validation layer on top of your local installation. Keep in mind that the check flags any non-standard concept, so run it on columns that should hold standard concepts.
For production use: Batch this across your OMOP tables: condition_occurrence.condition_concept_id, drug_exposure.drug_concept_id, measurement.measurement_concept_id, etc. The loop makes at least one API call per concept ID, so deduplicate IDs before scanning. Run it monthly. Generate a deprecation report. Use the replacement concept IDs to plan your next vocabulary update and ETL re-mapping.
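The batching and reporting steps described above might look like the following. This is a sketch under assumptions: it uses SQLite and a cut-down schema for illustration, TABLE_COLUMNS covers only the three tables named in the text, and write_deprecation_report expects the list-of-dicts shape returned by check_concepts_for_deprecation.

```python
import csv
import io
import sqlite3

# (table, column) pairs to scan; extend for other CDM tables as needed
TABLE_COLUMNS = [
    ("condition_occurrence", "condition_concept_id"),
    ("drug_exposure", "drug_concept_id"),
    ("measurement", "measurement_concept_id"),
]


def collect_distinct_concept_ids(conn):
    """Gather distinct non-zero, non-null concept IDs across the configured tables."""
    ids = set()
    for table, column in TABLE_COLUMNS:
        # Table and column names come from the fixed list above, not user input
        query = f"SELECT DISTINCT {column} FROM {table} WHERE {column} <> 0"
        ids.update(row[0] for row in conn.execute(query) if row[0] is not None)
    return sorted(ids)


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_deprecation_report(deprecated, out):
    """Write deprecated-concept records as CSV to a file-like object."""
    fields = ["concept_id", "concept_name", "status", "replacement_id", "replacement_name"]
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(deprecated)


# Demo with illustrative rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE condition_occurrence (condition_concept_id INTEGER);
    CREATE TABLE drug_exposure (drug_concept_id INTEGER);
    CREATE TABLE measurement (measurement_concept_id INTEGER);
    INSERT INTO condition_occurrence VALUES (201826), (201826), (316139), (0);
    INSERT INTO drug_exposure VALUES (1503297);
    INSERT INTO measurement VALUES (3004501), (NULL);
""")

all_ids = collect_distinct_concept_ids(conn)
batches = list(chunked(all_ids, 2))
print(batches)

buffer = io.StringIO()
write_deprecation_report(
    [{"concept_id": 999999999, "status": "not_found", "replacement": None}], buffer
)
print(buffer.getvalue())
```

Each batch can then be passed to check_concepts_for_deprecation with its own batch_label, which keeps individual runs short and makes it easy to resume after a failure.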

5. The Vocabulary Lifecycle Workflow

Putting it all together:
Weekly: Freshness check (Use Case A)
  • Run sentinel checks against OMOPHub
  • If alerts fire, schedule an Athena download
Monthly: Deprecation scan (Use Case B)
  • Extract distinct concept IDs from your OMOP CDM tables
  • Check each against OMOPHub for deprecation
  • Generate a report of deprecated concepts and their replacements
Quarterly (or when alerts fire): Full vocabulary update
  • Download latest vocabularies from Athena (athena.ohdsi.org)
  • Load into your local OMOP vocabulary tables
  • Re-run ETL validation against the updated vocabularies
  • Update your local state records for the next freshness check
OMOPHub’s role: the weekly and monthly checks (detection). Athena’s role: the quarterly update (the actual vocabulary files). Your database: also the quarterly update (where the files get loaded). Each tool does what it’s built for.
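The cadence above can be expressed as a small scheduling helper. A sketch only: the intervals (7, 30, and 90 days) are one reading of weekly, monthly, and quarterly, and the task names are made up for illustration.

```python
from datetime import date, timedelta

# Assumed intervals for the weekly / monthly / quarterly cadence
INTERVALS = {
    "freshness_check": timedelta(days=7),
    "deprecation_scan": timedelta(days=30),
    "full_update": timedelta(days=90),
}


def tasks_due(last_run, today, alerts_fired=False):
    """Return the tasks that should run today.

    last_run maps task name -> date of the last successful run (missing = never run).
    A full update is also due whenever the freshness check raised alerts.
    """
    due = []
    for task, interval in INTERVALS.items():
        previous = last_run.get(task)
        if previous is None or today - previous >= interval:
            due.append(task)
    if alerts_fired and "full_update" not in due:
        due.append("full_update")
    return due


last_run = {
    "freshness_check": date(2025, 1, 15),
    "deprecation_scan": date(2025, 1, 1),
    "full_update": date(2025, 1, 1),
}
print(tasks_due(last_run, today=date(2025, 1, 22)))
print(tasks_due(last_run, today=date(2025, 1, 22), alerts_fired=True))
```

Passing alerts_fired=True is how the result of the freshness check would pull the full update forward instead of waiting for the quarterly slot.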

6. Conclusion: From Reactive to Proactive

The heaviest part of the Maintenance Tax isn’t the vocabulary update itself - it’s not knowing when you need one. Teams either update too often (wasting operational time on unchanged vocabularies) or too rarely (accumulating version drift that compounds with every ETL run).

OMOPHub makes vocabulary lifecycle management proactive. Check freshness programmatically. Detect deprecated concepts before they corrupt your analyses. Know when to pull the trigger on an Athena download, and know which concepts need remapping when you do. Your local vocabulary tables remain the source of truth for your OMOP CDM. OMOPHub is the early warning system that keeps them honest.

Start with the deprecation scan. Pull the distinct concept IDs from your biggest OMOP table, run them through client.concepts.get(), and see what comes back deprecated. That report alone is worth the integration.