1. The “Stochastic Parrot” in the Exam Room

An LLM passes the USMLE. It drafts a patient summary that reads better than most residents’ notes. It explains a complex diagnosis in plain language that would take a doctor fifteen minutes to compose. Magic. Then you ask it for the ICD-10 code for “Type 2 diabetes mellitus with diabetic chronic kidney disease.” It returns E11.21. Confident. Authoritative. And wrong: the correct code is E11.22, and the decimal placement, edition specificity, and mapping to the correct OMOP concept are all things the LLM is guessing at from pattern matching, not looking up in a source of truth. In a clinical setting, “mostly right” is the same as “dangerously wrong.”

This is the hallucination problem for clinical AI. LLMs are reasoning engines - they understand clinical language, extract entities, and follow logical chains. But they are not vocabulary databases. They predict the next likely token; they don’t look up codes. When they output a SNOMED concept ID, an ICD-10 code, or an RxNorm identifier, they’re generating something that looks right based on training-data patterns, not verifying it against a live vocabulary source.

OMOPHub solves the verification half of this problem. It’s a vocabulary API: the source of truth for OMOP concept IDs, SNOMED codes, RxNorm identifiers, LOINC codes, and 100+ other vocabularies. By pairing an LLM (for reasoning and extraction) with OMOPHub (for vocabulary lookup and verification), you get a system where the LLM does what it’s good at and OMOPHub does what it’s good at. Neither one alone is sufficient. Together, they’re a clinical AI pipeline that’s both intelligent and grounded.

2. The Core Concept: The “OMOP-Loop”

Every clinical AI tool that touches standardized codes should follow this three-step pattern:

1. Extract: The LLM processes unstructured text (physician notes, patient questions, trial protocols) and identifies clinical entities: conditions, drugs, procedures, measurements.
2. Ground: Each extracted entity is sent to OMOPHub. The API returns the actual, verified OMOP concept ID. No guessing, no hallucination - just a vocabulary lookup. If the entity doesn’t match anything in OMOPHub, that’s a signal to flag it for review rather than fabricate a code.
3. Reason: The LLM uses the grounded, verified concept IDs to perform its final task: generating a query, drafting a coded summary, checking trial eligibility, or building a concept set.

The key insight: the LLM never generates codes directly. It extracts clinical meaning from text (which it’s excellent at), then hands off to OMOPHub for the standardization step (which a vocabulary API is built for). This separation of concerns is what makes the system safe.
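In code, the loop reduces to a small skeleton. The sketch below is illustrative: `extract` and `ground` are stand-in callables for the LLM call and the OMOPHub lookup shown in the use cases that follow.

```python
def omop_loop(text, extract, ground):
    """Extract -> Ground -> Reason skeleton.

    extract(text) -> list of clinical entity strings (the LLM's job)
    ground(term)  -> verified concept dict, or None if no vocabulary match
    """
    grounded, needs_review = {}, []
    for term in extract(text):         # 1. Extract: LLM finds entities as text
        concept = ground(term)         # 2. Ground: vocabulary lookup, never generation
        if concept is None:
            needs_review.append(term)  # flag for review rather than fabricate a code
        else:
            grounded[term] = concept
    # 3. Reason: downstream logic sees only verified concept IDs
    return grounded, needs_review
```

With stub callables in place of the LLM and the API, the control flow is easy to test; in the real pipeline, `ground` would wrap the OMOPHub search calls and return `None` on a miss.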

3. Use Case A: Building a Grounded Clinical Research Chatbot

A researcher asks: “Find me all patients with heart failure who are on ACE inhibitors.” A generic chatbot understands the intent but doesn’t know which OMOP concept IDs to query. A grounded chatbot does. The Workflow: The LLM extracts “heart failure” and “ACE inhibitor” → OMOPHub resolves each to a standard concept ID and expands its descendants → the application generates OMOP CDM SQL with the verified IDs.
pip install omophub
Python
import omophub
import json
from openai import OpenAI  # Or any LLM client

omop = omophub.OMOPHub()
llm = OpenAI()  # API key from environment

def ground_clinical_query(user_query):
    """Extract clinical entities via LLM, ground them via OMOPHub."""

    print(f"User query: {user_query}\n")

    # --- Step 1: EXTRACT (LLM) ---
    extraction_prompt = (
        f"Extract clinical conditions and drug classes from this query: "
        f"'{user_query}'. Return ONLY a JSON array of strings, no explanation."
    )
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": extraction_prompt}],
    )
    raw_output = response.choices[0].message.content.strip()
    # Strip markdown code fences if present
    if raw_output.startswith("```"):
        raw_output = raw_output.split("\n", 1)[1].rsplit("```", 1)[0]
    extracted_terms = json.loads(raw_output)
    print(f"  LLM extracted: {extracted_terms}\n")

    grounded = {}

    # --- Step 2: GROUND (OMOPHub) ---
    for term in extracted_terms:
        print(f"  Grounding: '{term}'")

        # Determine likely domain from context
        # (In production, have the LLM tag each entity with its domain)
        is_drug = any(kw in term.lower() for kw in ["inhibitor", "blocker", "agonist", "drug", "medication"])
        vocab_filter = ["RxNorm"] if is_drug else ["SNOMED"]
        domain_filter = ["Drug"] if is_drug else ["Condition"]

        try:
            # Search OMOPHub for the standard concept
            results = omop.search.basic(
                term,
                vocabulary_ids=vocab_filter,
                domain_ids=domain_filter,
                page_size=3,
            )
            candidates = results.get("concepts", []) if results else []

            # If basic search misses, try semantic
            if not candidates:
                semantic = omop.search.semantic(term, vocabulary_ids=vocab_filter, domain_ids=domain_filter, page_size=3)
                candidates = (semantic.get("results", semantic.get("concepts", [])) if semantic else [])

            if candidates:
                best = candidates[0]
                concept_id = best["concept_id"]
                print(f"    -> {best.get('concept_name')} (ID: {concept_id})")

                # Expand descendants for comprehensive concept set
                concept_set = {concept_id}
                try:
                    desc = omop.hierarchy.descendants(concept_id, max_levels=3, relationship_types=["Is a"])
                    desc_list = (
                        desc if isinstance(desc, list)
                        else desc.get("concepts", [])
                    ) if desc else []
                    for d in desc_list:
                        concept_set.add(d["concept_id"])
                except omophub.APIError:
                    pass  # Use just the parent if hierarchy fails

                grounded[term] = {
                    "concept_name": best.get("concept_name"),
                    "concept_id": concept_id,
                    "domain": best.get("domain_id"),
                    "expanded_count": len(concept_set),
                    "concept_ids": list(concept_set),
                }
                print(f"    -> Expanded to {len(concept_set)} concepts (including descendants)")
            else:
                print(f"    -> NOT FOUND - flagging for manual review")
                grounded[term] = {"concept_name": None, "concept_id": None, "status": "ungrounded"}

        except omophub.APIError as e:
            print(f"    -> API error: {e.message}")
            # Record the failure so the term isn't silently dropped downstream
            grounded[term] = {"concept_name": None, "concept_id": None, "status": "error"}

    return grounded


# --- Example ---
query = "Patients with heart failure on ACE inhibitors"
result = ground_clinical_query(query)

# Step 3: REASON - use grounded IDs to build a query
print("\n--- Grounded Concept Sets ---")
for term, data in result.items():
    if data.get("concept_id"):
        print(f"  '{term}': {data['concept_name']} -> {data['expanded_count']} concept IDs")
    else:
        print(f"  '{term}': UNGROUNDED - needs manual resolution")
The Key Insight: The LLM never touches a concept ID. It extracts “heart failure” and “ACE inhibitor” as text strings. OMOPHub resolves those strings to verified OMOP concept IDs. The hierarchy expansion ensures you catch patients coded with specific subtypes (e.g., “Congestive heart failure” or “Acute on chronic systolic heart failure”). This is the OMOP-Loop in action: Extract → Ground → Reason.
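Step 3 (Reason) then turns those grounded sets into a query. A minimal sketch of what the application layer might generate, assuming standard OMOP CDM table and column names (`build_cohort_sql` is an illustrative helper, not part of the SDK):

```python
def build_cohort_sql(condition_ids, drug_ids):
    """Generate OMOP CDM SQL from verified concept ID sets.

    Only OMOPHub-grounded IDs ever reach this function;
    the LLM never writes concept ID literals into the query.
    """
    cond = ", ".join(str(i) for i in sorted(condition_ids))
    drug = ", ".join(str(i) for i in sorted(drug_ids))
    return (
        "SELECT DISTINCT co.person_id\n"
        "FROM condition_occurrence co\n"
        "JOIN drug_exposure de ON de.person_id = co.person_id\n"
        f"WHERE co.condition_concept_id IN ({cond})\n"
        f"  AND de.drug_concept_id IN ({drug})"
    )
```

Because the IDs are interpolated from verified sets rather than LLM free text, the generated SQL can only ever reference concepts that exist in the vocabulary.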

4. Use Case B: Grounded Clinical Note Coding

An AI scribe generates a visit summary. For it to be useful for billing and research, the clinical entities need to be linked to verified codes - not LLM-hallucinated ones. The Workflow: LLM generates the summary → LLM extracts key clinical entities → OMOPHub grounds each entity to the correct vocabulary (ICD-10-CM for conditions, CPT-4 for procedures, RxNorm for drugs).
Python
import omophub
import json
from openai import OpenAI

omop = omophub.OMOPHub()
llm = OpenAI()

# AI-generated visit summary
visit_summary = """
Patient presents with persistent cough and shortness of breath for 2 weeks.
Physical exam reveals wheezing. Suspected acute bronchitis.
Performed a chest X-ray to rule out pneumonia.
Prescribed Albuterol inhaler.
"""

print("Grounding clinical note for coding...\n")

# Step 1: LLM extracts entities with domain tags
extraction_prompt = f"""Extract clinical entities from this visit note.
Return JSON array of objects with "term" and "type" (condition/procedure/drug):

{visit_summary}

Return ONLY the JSON array."""

response = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": extraction_prompt}],
)
raw = response.choices[0].message.content.strip()
if raw.startswith("```"):
    raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
entities = json.loads(raw)
print(f"  LLM extracted: {json.dumps(entities, indent=2)}\n")

# Step 2: Ground each entity via OMOPHub in the appropriate vocabulary
DOMAIN_CONFIG = {
    "condition": {"vocabs": ["SNOMED"], "domains": ["Condition"]},
    "procedure": {"vocabs": ["CPT4", "SNOMED"], "domains": ["Procedure"]},
    "drug": {"vocabs": ["RxNorm"], "domains": ["Drug"]},
}

suggested_codes = []

for entity in entities:
    term = entity.get("term", "")
    entity_type = entity.get("type", "condition").lower()
    config = DOMAIN_CONFIG.get(entity_type, DOMAIN_CONFIG["condition"])

    print(f"  Grounding [{entity_type}]: '{term}'")

    try:
        results = omop.search.basic(
            term,
            vocabulary_ids=config["vocabs"],
            domain_ids=config["domains"],
            page_size=3,
        )
        candidates = results.get("concepts", []) if results else []

        if candidates:
            best = candidates[0]
            print(f"    -> {best.get('concept_name')} ({best.get('vocabulary_id')}: {best.get('concept_code', 'N/A')}, OMOP: {best['concept_id']})")

            # For billing codes, also get the ICD-10-CM or CPT4 mapping
            if entity_type == "condition":
                mappings = omop.mappings.get(best["concept_id"], target_vocabulary="ICD10CM")
                map_list = (
                    mappings if isinstance(mappings, list)
                    else mappings.get("concepts", mappings.get("mappings", []))
                ) if mappings else []
                if map_list:
                    icd = map_list[0]
                    print(f"    -> ICD-10-CM: {icd.get('concept_name')} ({icd.get('concept_code')})")

            suggested_codes.append({
                "term": term,
                "type": entity_type,
                "concept_name": best.get("concept_name"),
                "concept_id": best["concept_id"],
                "vocabulary": best.get("vocabulary_id"),
                "code": best.get("concept_code"),
            })
        else:
            print(f"    -> No match found")

    except omophub.APIError as e:
        print(f"    -> API error: {e.message}")

print(f"\n--- Suggested Codes (All Verified via OMOPHub) ---")
for code in suggested_codes:
    print(f"  [{code['vocabulary']}] {code['concept_name']} (Code: {code['code']}, OMOP: {code['concept_id']})")
The Key Insight: The LLM extracts entities and tags them by type. OMOPHub resolves each one in the appropriate vocabulary. No code is ever generated by the LLM - every code comes from a verified vocabulary lookup. This is the difference between “AI-suggested coding” (risky) and “AI-extracted, vocabulary-verified coding” (safe).
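One refinement worth noting: both examples take `candidates[0]` blindly, which can mis-ground ambiguous terms. A hedged middle ground is to let the LLM choose among OMOPHub’s top candidates while constraining it to IDs the API actually returned, so it still cannot invent one. A sketch (both helpers are illustrative, not SDK methods):

```python
def disambiguation_prompt(term, candidates):
    """Build a constrained-choice prompt: the LLM may only pick a listed ID."""
    lines = [f"Which concept best matches '{term}'? "
             "Answer with ONE concept_id from this list:"]
    for c in candidates:
        lines.append(f"  {c['concept_id']}: {c['concept_name']} ({c['vocabulary_id']})")
    return "\n".join(lines)

def accept_choice(raw_answer, candidates):
    """Reject any ID the LLM returns that is not in the candidate list."""
    valid = {c["concept_id"] for c in candidates}
    try:
        chosen = int(raw_answer.strip())
    except ValueError:
        return None  # non-numeric answer: treat as ungrounded
    return chosen if chosen in valid else None
```

The LLM contributes judgment (which candidate fits the context), but the set of possible answers is still fixed by the vocabulary lookup - the same separation of concerns as the rest of the loop.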

5. The Hallucination Safety Net

This is the most important pattern in the entire article. Every time an LLM outputs a clinical code or concept ID, verify it against OMOPHub before using it.
Python
import omophub

omop = omophub.OMOPHub()

def verify_concept(concept_id, expected_domain=None):
    """
    Verify that an LLM-generated concept ID actually exists in OMOP.
    Returns the concept if valid, None if hallucinated.
    """
    try:
        concept = omop.concepts.get(concept_id)

        if not concept:
            return {"valid": False, "reason": "Concept ID does not exist"}

        # Check domain if expected
        if expected_domain and concept.get("domain_id") != expected_domain:
            return {
                "valid": False,
                "reason": f"Domain mismatch: expected {expected_domain}, got {concept.get('domain_id')}",
                "concept": concept,
            }

        # Check if it's a standard concept
        if concept.get("standard_concept") != "S":
            return {
                "valid": False,
                "reason": f"Non-standard concept (standard_concept='{concept.get('standard_concept')}'). Use the 'Maps to' standard equivalent.",
                "concept": concept,
            }

        return {
            "valid": True,
            "concept_name": concept.get("concept_name"),
            "concept_id": concept["concept_id"],
            "domain": concept.get("domain_id"),
            "vocabulary": concept.get("vocabulary_id"),
        }

    except omophub.APIError:
        return {"valid": False, "reason": "Concept ID not found in OMOP vocabulary"}


# --- Example: LLM claims these are valid concept IDs ---
llm_outputs = [
    {"claimed_id": 201826, "claimed_name": "Type 2 diabetes mellitus", "domain": "Condition"},
    {"claimed_id": 9999999, "claimed_name": "Fake condition", "domain": "Condition"},
    {"claimed_id": 4329847, "claimed_name": "Myocardial infarction", "domain": "Condition"},
]

print("Hallucination Safety Check:\n")
for output in llm_outputs:
    result = verify_concept(output["claimed_id"], expected_domain=output["domain"])
    status = "VERIFIED" if result["valid"] else "REJECTED"
    reason = f" - {result['reason']}" if not result["valid"] else ""
    print(f"  {status}: ID {output['claimed_id']} ({output['claimed_name']}){reason}")
The Key Insight: This short validation function catches hallucinated codes before they enter your system. concepts.get() is a real OMOPHub SDK method - it takes a concept ID and returns the concept if it exists. If the LLM hallucinated the ID, the lookup comes back empty. If the ID exists but is in the wrong domain (the LLM said it was a Condition but it’s actually a Drug), the domain check catches it. If it’s a non-standard concept, you get a warning to find the standard equivalent. This single validation step is worth more than any amount of prompt engineering for clinical safety.
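In a pipeline, that check becomes a gate on every batch of LLM output. A minimal sketch, assuming a `verify` callable with the same contract as `verify_concept` above (a dict with a `"valid"` flag); `gate_llm_codes` is an illustrative helper:

```python
def gate_llm_codes(llm_items, verify):
    """Partition LLM-claimed concept IDs into verified and rejected.

    llm_items: dicts with 'claimed_id' and optionally 'domain'
    verify: callable following the verify_concept contract above
    """
    passed, rejected = [], []
    for item in llm_items:
        result = verify(item["claimed_id"], item.get("domain"))
        bucket = passed if result["valid"] else rejected
        bucket.append({**item, "check": result})  # keep the audit trail
    return passed, rejected
```

Downstream code consumes only `passed`; everything in `rejected` goes to a human review queue instead of silently keeping a hallucinated code.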

6. Conclusion: Don’t Just Prompt - Ground

LLMs are extraordinary reasoning engines. They understand clinical language better than any rule-based system ever could. But they are not vocabulary databases, and they should never be trusted to generate clinical codes from memory. The OMOP-Loop - Extract, Ground, Reason - is the pattern that makes clinical AI safe:
  • The LLM extracts clinical meaning from text (what it’s built for)
  • OMOPHub verifies and standardizes every entity against the OMOP vocabularies (what it’s built for)
  • The LLM reasons over verified, grounded data to produce its final output
No hallucinated codes. No fabricated concept IDs. No “mostly right” in a setting where mostly right is dangerously wrong. Build the OMOP-Loop into your next clinical AI tool. Start with the hallucination safety net - concepts.get() on every LLM-generated concept ID. That single check is the difference between a demo and a deployable system.