
1. From Data Points to Public Health

Your health system has 12,000 patients with a diabetes diagnosis. How many of them had an HbA1c test in the last 12 months? If you can’t answer that question in under a minute, you have a population health problem.

The challenge isn’t data - it’s aggregation. A “Diabetes” diagnosis in one clinic is coded as a specific ICD-10 subtype in billing, a SNOMED concept in the EHR, and possibly a local code in a legacy system. Counting “all diabetics” means resolving hundreds of concept codes across vocabularies into a single, comprehensive set. Then you do the same for HbA1c measurements and query across millions of records. Done manually, this takes weeks, and by the time you have your answer, the care gap has widened.

OMOPHub makes the vocabulary resolution fast. It’s a REST API for the OHDSI ATHENA vocabularies - SNOMED, ICD-10, LOINC, RxNorm, and 100+ others. Its hierarchy API lets you take a broad concept like “Type 2 Diabetes Mellitus” and retrieve its descendant codes in a single call - all the specific subtypes, complications, and variants. Its search API handles fuzzy and semantic matching for disparate local names. Its mappings API resolves across vocabularies.

The pattern is simple: define your clinical concept at a high level, expand it into a comprehensive concept set via OMOPHub, and plug that set into your OMOP CDM SQL queries. That turns population health questions into answerable queries - and care gaps into actionable lists.

2. The Core Concept: Rolling Up the Hierarchy

Population health analytics lives and dies on aggregation. You don’t want to count 500 individual respiratory infection codes separately - you want to count “Respiratory Infections” as a category, knowing it captures everything underneath.

The OMOP vocabulary hierarchy is built for this. Every specific concept has parent concepts linked by “Is a” relationships, forming a tree. “Streptococcal pneumonia” Is a “Bacterial pneumonia” Is a “Pneumonia” Is a “Respiratory tract infection.” If you query for the parent and include all its descendants, you catch everything.

OMOPHub’s hierarchy.descendants() does this traversal via API. Give it a concept ID and a depth, and it returns the children, grandchildren, and deeper descendants down to that depth. Choose the depth generously: a limit that is too shallow silently truncates the concept set. This is the “roll-up” - and it’s the foundation of every use case in this article. It works at any level of granularity:
  • Broad: “Cardiovascular disease” → hundreds of descendant conditions
  • Medium: “Type 2 Diabetes Mellitus” → dozens of specific subtypes
  • Narrow: “Atrial fibrillation” → a handful of specific variants
This multi-level capability means you can build dashboards that start at the disease class level, then drill down to specific conditions, then to individual patients - all driven by the same hierarchy API.
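To make the roll-up concrete, here is a minimal local sketch of the traversal that a descendants lookup performs. It does not call OMOPHub; it walks a toy set of “Is a” edges held in memory. The edge list extends the pneumonia chain from the text with two extra entries (“Viral pneumonia” and “Acute bronchitis”) added purely for illustration, and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Toy "Is a" edges as (child, parent) pairs. Real OMOP data uses concept IDs;
# names are used here for readability. The last entries of each branch are
# illustrative additions, not taken from a real vocabulary export.
IS_A_EDGES = [
    ("Streptococcal pneumonia", "Bacterial pneumonia"),
    ("Bacterial pneumonia", "Pneumonia"),
    ("Viral pneumonia", "Pneumonia"),
    ("Pneumonia", "Respiratory tract infection"),
    ("Acute bronchitis", "Respiratory tract infection"),
]

def build_children_index(edges):
    """Map each parent to the set of its direct children."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children

def descendants(parent, children, max_levels=None):
    """Breadth-first walk returning descendants up to max_levels deep."""
    found = set()
    queue = deque([(parent, 0)])
    while queue:
        node, level = queue.popleft()
        if max_levels is not None and level >= max_levels:
            continue  # depth limit reached: do not expand this node further
        for child in children.get(node, ()):
            if child not in found:
                found.add(child)
                queue.append((child, level + 1))
    return found

children = build_children_index(IS_A_EDGES)
all_rti = descendants("Respiratory tract infection", children)
direct_only = descendants("Respiratory tract infection", children, max_levels=1)
print(sorted(all_rti))
print(sorted(direct_only))
```

With no depth limit the walk reaches “Streptococcal pneumonia” three levels down; with max_levels=1 it stops at the direct children, which shows how a shallow limit narrows the resulting concept set.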

3. Use Case A: Identifying Gaps in Care (The “Diabetes Screening” Gap)

The most actionable application of population health: finding patients who should be getting care but aren’t.

The Scenario: Your health system wants to identify all patients with a Diabetes diagnosis who haven’t had an HbA1c test in the last 12 months. These patients are your outreach targets.

The Workflow: Use OMOPHub to build two concept sets (all Diabetes condition codes, all HbA1c measurement codes), then plug them into an OMOP CDM SQL query.
pip install omophub
Python
import omophub

client = omophub.OMOPHub()

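# OMOP concept IDs for our parent concepts
# 201826 = Type 2 diabetes mellitus (SNOMED, standard concept)
#   Note: this parent covers Type 2 only. To include Type 1 and other forms,
#   start from the broader "Diabetes mellitus" concept (look up its ID via OMOPHub search).
# 3004410 = Hemoglobin A1c (LOINC measurement - verify via OMOPHub search)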
dia_parent_id = 201826
hba1c_parent_id = 3004410

print("Building concept sets for care gap analysis...\n")

# --- Build Diabetes concept set ---
diabetes_ids = {dia_parent_id}
try:
    dia_desc = client.hierarchy.descendants(dia_parent_id, max_levels=5, relationship_types=["Is a"])
    desc_list = dia_desc if isinstance(dia_desc, list) else dia_desc.get("concepts", [])
    for d in desc_list:
        diabetes_ids.add(d["concept_id"])
    print(f"  Diabetes concept set: {len(diabetes_ids)} concepts (parent + descendants)")
except omophub.APIError as e:
    print(f"  Error expanding diabetes concepts: {e.message}")

# --- Build HbA1c concept set ---
hba1c_ids = {hba1c_parent_id}
try:
    hba1c_desc = client.hierarchy.descendants(hba1c_parent_id, max_levels=3, relationship_types=["Is a"])
    desc_list = hba1c_desc if isinstance(hba1c_desc, list) else hba1c_desc.get("concepts", [])
    for d in desc_list:
        hba1c_ids.add(d["concept_id"])
    print(f"  HbA1c concept set: {len(hba1c_ids)} concepts (parent + descendants)")
except omophub.APIError as e:
    print(f"  Error expanding HbA1c concepts: {e.message}")

# --- Generate the OMOP CDM SQL ---
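# Sort the IDs so the generated SQL is identical from run to run (sets have no guaranteed order)
dia_ids_str = ", ".join(str(i) for i in sorted(diabetes_ids))
hba1c_ids_str = ", ".join(str(i) for i in sorted(hba1c_ids))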

print("\n--- OMOP CDM SQL: Patients with Diabetes but No Recent HbA1c ---\n")
print(f"""SELECT DISTINCT p.person_id, p.gender_concept_id, p.year_of_birth
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
WHERE co.condition_concept_id IN ({dia_ids_str})
  AND NOT EXISTS (
    SELECT 1
    FROM measurement m
    WHERE m.person_id = p.person_id
      AND m.measurement_concept_id IN ({hba1c_ids_str})
      AND m.measurement_date >= DATEADD(month, -12, GETDATE())
  );""")

print("\n-- Note: Date syntax varies by database engine.")
print("-- SQL Server: DATEADD(month, -12, GETDATE())")
print("-- PostgreSQL: CURRENT_DATE - INTERVAL '12 months'")
print("-- MySQL: DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH)")
The Key Insight: This is OMOPHub doing exactly what it’s built for. The hierarchy expansion ensures your Diabetes concept set catches patients coded as “Type 2 DM with renal complications,” “Type 2 DM with peripheral angiopathy,” or any other specific subtype - not just the top-level code. Without this expansion, you’d systematically undercount your care gap population. The concept set feeds directly into standard OMOP CDM SQL, making the entire workflow reproducible and auditable.
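As a quick way to sanity-check the care-gap logic before pointing it at a real warehouse, the sketch below runs the same query shape against an in-memory SQLite database. Everything in it is an assumption for illustration: the tables contain only the columns the query needs, the four patients are invented, 9000001 is a made-up stand-in for a diabetes descendant ID, and the date arithmetic uses SQLite syntax with a fixed reference date so the result is repeatable.

```python
import sqlite3

# Concept sets as they might come out of the hierarchy expansion.
diabetes_ids = [201826, 9000001]  # 9000001 is a made-up descendant ID
hba1c_ids = [3004410]
as_of = "2024-06-30"              # fixed reference date for a repeatable result

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
CREATE TABLE measurement (person_id INTEGER, measurement_concept_id INTEGER, measurement_date TEXT);

INSERT INTO person VALUES (1, 8507, 1960), (2, 8532, 1955), (3, 8507, 1970), (4, 8532, 1980);

-- person 1: diabetes, recent HbA1c
-- person 2: diabetes (two condition rows), HbA1c older than 12 months
-- person 3: diabetes subtype only, no HbA1c at all
-- person 4: no diabetes
INSERT INTO condition_occurrence VALUES (1, 201826), (2, 201826), (2, 9000001), (3, 9000001), (4, 12345);
INSERT INTO measurement VALUES (1, 3004410, '2024-03-01'), (2, 3004410, '2022-01-15');
""")

def placeholders(values):
    """Return one '?' per value so IDs are bound as parameters, not pasted into SQL."""
    return ", ".join("?" for _ in values)

sql = f"""
SELECT DISTINCT p.person_id
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
WHERE co.condition_concept_id IN ({placeholders(diabetes_ids)})
  AND NOT EXISTS (
    SELECT 1
    FROM measurement m
    WHERE m.person_id = p.person_id
      AND m.measurement_concept_id IN ({placeholders(hba1c_ids)})
      AND m.measurement_date >= date(?, '-12 months')
  )
ORDER BY p.person_id
"""

# Parameter order must follow the order of the '?' markers in the SQL text.
params = [*diabetes_ids, *hba1c_ids, as_of]
gap_patients = [row[0] for row in conn.execute(sql, params)]
print(gap_patients)
```

Person 2 has two qualifying condition rows but appears once because of DISTINCT, and person 3 is found only because the descendant ID is part of the concept set. Binding the IDs as parameters is a design choice for this sketch; very large concept sets may exceed a driver’s parameter limit, in which case a temporary table of concept IDs is a common alternative.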

4. Use Case B: Disease Surveillance Across Heterogeneous Sources

Public health surveillance requires aggregating data from dozens of hospitals, each with different coding practices. The goal: detect trends in broad disease categories regardless of how the source data was coded.

The Scenario: You’re monitoring “Flu-like Illness” across 50 hospitals. Some report in ICD-10-CM, some in SNOMED, and some use local codes. You need a single surveillance count.

The Workflow:
  1. Build a surveillance concept set for “Respiratory tract infection” by expanding the hierarchy
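  2. For incoming standard-vocabulary codes (ICD-10, SNOMED), look them up in OMOPHub, resolve them to standard OMOP concepts (ICD-10 codes are source concepts that map to SNOMED via “Maps to”), and check if they’re in the set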
  3. For local/proprietary codes, flag for manual mapping (OMOPHub can’t resolve these)
Python
import omophub

client = omophub.OMOPHub()

# Incoming diagnosis events from various hospitals
incoming_events = [
    {"code": "J11.1", "vocab": "ICD10CM", "hospital": "Hospital A"},
    {"code": "J06.9", "vocab": "ICD10CM", "hospital": "Hospital B"},
    {"code": "FLU_SYMPT", "vocab": "LOCAL", "hospital": "Hospital C"},
    {"code": "233604007", "vocab": "SNOMED", "hospital": "Hospital D"},
]

# Step 1: Build surveillance concept set via hierarchy
# First, find the OMOP concept ID for "Respiratory tract infection"
print("Building surveillance concept set...\n")

search_result = client.search.basic(
    "Infection of respiratory tract",
    vocabulary_ids=["SNOMED"],
    domain_ids=["Condition"],
    page_size=1,
)
candidates = search_result.get("concepts", []) if search_result else []

if not candidates:
    print("Could not find parent concept for respiratory tract infection.")
else:
    parent = candidates[0]
    parent_id = parent["concept_id"]
    print(f"  Parent: {parent.get('concept_name')} (OMOP ID: {parent_id})")

    # Expand to all descendants
    surveillance_ids = {parent_id}
    try:
        descendants = client.hierarchy.descendants(parent_id, max_levels=5, relationship_types=["Is a"])
        desc_list = descendants if isinstance(descendants, list) else descendants.get("concepts", [])
        for d in desc_list:
            surveillance_ids.add(d["concept_id"])
        print(f"  Surveillance set: {len(surveillance_ids)} concepts\n")
    except omophub.APIError as e:
        print(f"  Error expanding hierarchy: {e.message}\n")

    # Step 2: Classify incoming events
    print("Classifying incoming diagnosis events:\n")
    flu_count = 0
    unresolved = 0

    for event in incoming_events:
        code = event["code"]
        vocab = event["vocab"]
        hospital = event["hospital"]

        if vocab == "LOCAL":
            # OMOPHub cannot resolve proprietary local codes
            print(f"  {hospital}: '{code}' ({vocab}) -> UNRESOLVED (local code, needs manual mapping)")
            unresolved += 1
            continue

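        # Look up the code in OMOPHub.
        # Caveat: ICD10CM concepts are non-standard in OMOP, so the concept_id returned
        # for an ICD10CM code must first be resolved to its standard SNOMED concept via
        # its "Maps to" relationship (e.g. with OMOPHub's mappings API) before the
        # membership test below can match. That step is omitted in this example.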
        try:
            results = client.search.basic(
                code,
                vocabulary_ids=[vocab],
                domain_ids=["Condition"],
                page_size=1,
            )
            matches = results.get("concepts", []) if results else []

            if matches:
                matched = matches[0]
                omop_id = matched["concept_id"]
                c_name = matched.get("concept_name", "Unknown")

                if omop_id in surveillance_ids:
                    print(f"  {hospital}: '{code}' ({vocab}) -> {c_name} -> FLAGGED as flu-like illness")
                    flu_count += 1
                else:
                    print(f"  {hospital}: '{code}' ({vocab}) -> {c_name} -> Not in surveillance set")
            else:
                print(f"  {hospital}: '{code}' ({vocab}) -> No match found")
                unresolved += 1

        except omophub.APIError as e:
            print(f"  {hospital}: '{code}' ({vocab}) -> API error: {e.message}")
            unresolved += 1

    print("\nSurveillance Summary:")
    print(f"  Flu-like illness events: {flu_count}")
    print(f"  Unresolved events: {unresolved} (need manual review)")
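The Key Insight: The hierarchy-based surveillance set is the foundation - it defines exactly which concepts count toward the surveillance total. Here that set is everything under “Respiratory tract infection,” a deliberately broad proxy for “flu-like illness”; choose a narrower parent concept if you need a tighter case definition. Incoming standard codes get resolved to OMOP concept IDs and checked against this set. One caveat: ICD-10-CM codes are source (non-standard) concepts in OMOP, so they must be mapped to their standard SNOMED equivalents through “Maps to” relationships before the membership check, or they will never match a SNOMED-based set. Local codes can’t be resolved automatically - that’s an honest limitation. In production, you’d maintain a local-to-standard mapping table built over time with human review. The result: a single surveillance count across heterogeneous sources, updated as fast as data arrives.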
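The local-to-standard mapping table mentioned above could be as simple as a lookup keyed by site and code. The sketch below is one possible shape, not a prescribed design: the table contents, the concept IDs (all in a made-up 91xxxxx range), “Hospital E”, “Hospital F”, and the function name are hypothetical, and the surveillance set is a stand-in for the one built by the hierarchy expansion.

```python
# Hypothetical human-reviewed mapping: (hospital, local code) -> standard OMOP concept ID.
# All concept IDs below are invented for illustration.
LOCAL_TO_STANDARD = {
    ("Hospital C", "FLU_SYMPT"): 9100001,
    ("Hospital F", "FX_ARM"): 9100099,
}

# Stand-in for the expanded surveillance concept set.
surveillance_ids = {9100001, 9100002}

def classify_local_events(events, mapping, concept_set):
    """Sort local-code events into flagged, not-in-set, and needs-review buckets."""
    buckets = {"flagged": [], "not_in_set": [], "needs_review": []}
    for event in events:
        key = (event["hospital"], event["code"])
        concept_id = mapping.get(key)
        if concept_id is None:
            buckets["needs_review"].append(key)  # no reviewed mapping yet
        elif concept_id in concept_set:
            buckets["flagged"].append(key)
        else:
            buckets["not_in_set"].append(key)
    return buckets

local_events = [
    {"hospital": "Hospital C", "code": "FLU_SYMPT"},
    {"hospital": "Hospital E", "code": "FLU_SYMPT"},  # same string, different site: unmapped
    {"hospital": "Hospital F", "code": "FX_ARM"},
]

buckets = classify_local_events(local_events, LOCAL_TO_STANDARD, surveillance_ids)
for name, keys in buckets.items():
    print(name, keys)
```

Keying on the (hospital, code) pair rather than the code alone reflects that a local code is only meaningful within the site that defined it; the same string at another hospital goes to the review queue instead of inheriting a mapping it was never reviewed for.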

5. Risk Stratification: Beyond Simple Counts

Two patients both have “Type 2 Diabetes.” One is well-controlled with no complications. The other has diabetic retinopathy, chronic kidney disease, and a prior MI. They need very different levels of care - but a simple count treats them identically. Risk stratification means identifying which patients carry comorbidities and complications that increase their risk. OMOPHub helps by building complication concept sets programmatically.
Python
import omophub

client = omophub.OMOPHub()

# A patient's condition concept IDs (from their OMOP record)
patient_conditions = {201826, 40484648, 4329847, 443727, 40480853}
# Note: verify all IDs via OMOPHub search before production use

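# Define complication categories to check
# Instead of hardcoding IDs, search for the parent concept and expand
# Note: categories can overlap (a diabetic kidney disease code may fall under more
# than one parent), so a single diagnosis can raise the score twice; de-duplicate
# if that matters for your risk model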
complication_categories = [
    "Diabetic retinopathy",
    "Diabetic nephropathy",
    "Chronic kidney disease",
    "Myocardial infarction",
]

print("Risk Stratification: Checking for diabetes complications\n")

risk_score = 0
identified_complications = []

for comp_name in complication_categories:
    try:
        results = client.search.basic(
            comp_name,
            vocabulary_ids=["SNOMED"],
            domain_ids=["Condition"],
            page_size=1,
        )
        candidates = results.get("concepts", []) if results else []

        if not candidates:
            print(f"  '{comp_name}': Not found in SNOMED")
            continue

        comp_concept = candidates[0]
        comp_id = comp_concept["concept_id"]

        # Expand to include subtypes
        comp_set = {comp_id}
        try:
            desc = client.hierarchy.descendants(comp_id, max_levels=3, relationship_types=["Is a"])
            desc_list = desc if isinstance(desc, list) else desc.get("concepts", [])
            for d in desc_list:
                comp_set.add(d["concept_id"])
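        except omophub.APIError as e:
            # Fall back to the parent concept alone, but say so: a silent fallback
            # would hide the fact that subtypes are not being checked
            print(f"  Warning: could not expand '{comp_name}' ({e.message}); checking parent only")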

        # Check if patient has any concept in this complication set
        overlap = patient_conditions & comp_set
        if overlap:
            risk_score += 1
            identified_complications.append(comp_concept.get("concept_name", comp_name))
            print(f"  PRESENT: {comp_concept.get('concept_name', comp_name)} ({len(comp_set)} concepts in set, {len(overlap)} matched)")
        else:
            print(f"  Absent: {comp_concept.get('concept_name', comp_name)} ({len(comp_set)} concepts checked)")

    except omophub.APIError as e:
        print(f"  '{comp_name}': API error - {e.message}")

print("\nRisk Summary:")
print(f"  Complications found: {len(identified_complications)} of {len(complication_categories)}")
print(f"  Complications: {', '.join(identified_complications) if identified_complications else 'None'}")
print(f"  Risk tier: {'HIGH' if risk_score >= 2 else 'MODERATE' if risk_score == 1 else 'LOW'}")
The Key Insight: The hierarchy expansion makes risk stratification robust. Instead of checking for a single concept ID (which would miss patients coded with a specific subtype), you check against an expanded set. A patient coded as “Stage 3 chronic kidney disease” gets caught by the CKD complication check because it’s a descendant of the parent CKD concept. This is the same hierarchy roll-up from Use Case A, applied to a different question - and it’s the pattern that makes OMOPHub genuinely useful for population health.
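When stratifying a whole cohort rather than one patient, it may help to build the complication sets once and reuse them, so that no API call is needed per patient. The sketch below assumes the sets have already been expanded and are held in memory; the concept IDs in the 92xxxxx range, the patient IDs, and the function name are all made up for illustration. The tier thresholds are the ones used in the example above (two or more complications is HIGH, one is MODERATE, none is LOW).

```python
# Pre-built complication concept sets (parent + descendants).
# All IDs in the 92xxxxx range are invented placeholders.
complication_sets = {
    "Diabetic retinopathy": {9200001, 9200002},
    "Chronic kidney disease": {9200010, 9200011, 9200012},
    "Myocardial infarction": {9200020},
}

def stratify(patient_conditions, complication_sets):
    """Return (tier, complications present) for one patient's condition concept IDs."""
    present = sorted(
        name
        for name, concept_ids in complication_sets.items()
        if patient_conditions & concept_ids  # non-empty intersection means present
    )
    count = len(present)
    tier = "HIGH" if count >= 2 else "MODERATE" if count == 1 else "LOW"
    return tier, present

# Hypothetical cohort: person_id -> set of condition concept IDs.
cohort = {
    101: {201826},
    102: {201826, 9200011},
    103: {201826, 9200002, 9200012, 9200020},
}

tiers = {person_id: stratify(conds, complication_sets) for person_id, conds in cohort.items()}
for person_id, (tier, present) in tiers.items():
    print(person_id, tier, present)
```

Counting categories rather than matched codes means a patient with three different CKD stage codes still contributes one complication, which keeps the score tied to distinct clinical problems.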

6. Conclusion: From Reactive to Proactive

Population health analytics comes down to answering simple questions at scale: Who has diabetes but isn’t being monitored? Is flu season hitting harder this year than last? Which patients are at highest risk for complications?

The vocabulary complexity is what makes these questions hard. OMOPHub handles the vocabulary layer - expanding concept hierarchies, resolving across vocabularies, looking up codes - so you can focus on the clinical logic and the actionable insights.

The pattern across all three use cases is the same: define a clinical concept, expand it into a comprehensive concept set via OMOPHub’s hierarchy API, and use that set to query your OMOP CDM database. Whether you’re finding care gaps, running surveillance, or stratifying risk, the vocabulary machinery is identical. That’s the power of standardization.

Start with Use Case A. Pick a quality measure that matters to your organization - HbA1c for diabetics, colonoscopy for age-eligible patients, statin therapy for cardiovascular risk. Build the concept sets, write the SQL, and see what falls out. The care gaps are there. OMOPHub helps you find them.