1. The “Needle in a Haystack” Problem
Picture a Phase III diabetes trial with 200 empty slots and six months to fill them. The patients exist - their data is sitting right there in the OMOP CDM. The problem? The eligibility criteria say things like “Adults with HbA1c > 7.0% despite Metformin therapy” - and translating that into queries across the condition_occurrence, measurement, and drug_exposure tables, with the right concept IDs, the right value thresholds, and the right temporal logic, takes weeks of manual work per trial.
This is the bottleneck in clinical trial recruitment. Not a lack of patients - a lack of infrastructure to match patients to trials at scale.
Solving this requires two things working together: (1) an NLP or LLM system to parse the eligibility criteria text into structured components (conditions, drugs, measurements, thresholds, temporal constraints), and (2) a vocabulary API to resolve those components into standardized OMOP concept IDs that you can actually query against.
OMOPHub handles the second part. It gives you instant API access to the full OHDSI ATHENA vocabulary - SNOMED for conditions, RxNorm for drugs, LOINC for measurements - so you can resolve “Type 2 Diabetes Mellitus” to concept ID 201826, “Metformin” to its RxNorm ingredient, and “HbA1c” to its LOINC code, all without maintaining a local vocabulary database.
But OMOPHub’s real superpower for trial screening is concept set expansion. A trial criterion says “Type 2 Diabetes.” But your patients might be coded as “Type 2 DM with renal complications,” “Type 2 DM with peripheral angiopathy,” or a dozen other specific variants. Simple ID matching would miss them. OMOPHub’s hierarchy API lets you expand a single concept into all its descendants - catching every patient who should qualify, regardless of how specifically they were coded.
2. The Core Concept: From Criteria Text to OMOP Queries
Automating trial eligibility screening is a multi-step process. Here’s how the pieces fit together:
Step 1 - Parse the criteria (NLP/LLM). Take the eligibility text from the protocol and extract structured components. A criterion like “patients with Type 2 Diabetes Mellitus and HbA1c > 7.0%, currently on Metformin” decomposes into:
- Condition: “Type 2 Diabetes Mellitus” (inclusion)
- Measurement: “HbA1c” > 7.0% (inclusion)
- Drug: “Metformin” (inclusion, current exposure)
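Step 2 - Resolve the components to OMOP concepts (OMOPHub). Map each extracted term to a standardized concept ID:
- “Type 2 Diabetes Mellitus” → SNOMED concept ID 201826
- “Metformin” → RxNorm ingredient concept ID
- “HbA1c” → LOINC concept ID
Step 3 - Expand the concept sets (OMOPHub). Use hierarchy.descendants() to build complete concept sets. “Type 2 Diabetes Mellitus” has dozens of child concepts in SNOMED. You want to match patients coded with any of them.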
Step 4 - Query the OMOP database. Apply the resolved, expanded concept IDs against the patient database (condition_occurrence, drug_exposure, measurement tables) with the appropriate value/temporal filters.
OMOPHub owns Steps 2 and 3. That’s where it adds the most value - turning clinical terms into queryable concept sets without the overhead of a local vocabulary database.
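Step 4 can be sketched as a parameterized query over the three OMOP CDM tables. This is a minimal illustration, assuming an in-memory SQLite database that holds only the columns the query needs. The table and column names follow the OMOP CDM; the helper `find_candidates`, the sample rows, and every concept ID other than 201826 are placeholders made up for the example. Temporal logic (for instance, what counts as "current" exposure) is left out.

```python
import sqlite3


def placeholders(ids):
    """Return one '?' per ID for use inside an SQL IN (...) clause."""
    return ",".join("?" for _ in ids)


def find_candidates(conn, condition_ids, drug_ids, measurement_ids, min_value):
    """Return person_ids meeting all three inclusion criteria (no temporal logic)."""
    cond = sorted(condition_ids)
    drug = sorted(drug_ids)
    meas = sorted(measurement_ids)
    sql = f"""
        SELECT DISTINCT c.person_id
        FROM condition_occurrence c
        JOIN drug_exposure d ON d.person_id = c.person_id
        JOIN measurement m ON m.person_id = c.person_id
        WHERE c.condition_concept_id IN ({placeholders(cond)})
          AND d.drug_concept_id IN ({placeholders(drug)})
          AND m.measurement_concept_id IN ({placeholders(meas)})
          AND m.value_as_number > ?
        ORDER BY c.person_id
    """
    # Parameter order must match the order of the '?' marks in the SQL above.
    params = [*cond, *drug, *meas, min_value]
    return [row[0] for row in conn.execute(sql, params)]


# Toy database: 900001, 900100 and 900200 are placeholder concept IDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER);
    CREATE TABLE measurement (person_id INTEGER, measurement_concept_id INTEGER,
                              value_as_number REAL);
""")
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?)",
                 [(1, 900001), (2, 201826), (3, 201826)])
conn.executemany("INSERT INTO drug_exposure VALUES (?, ?)",
                 [(1, 900100), (2, 900100)])
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)",
                 [(1, 900200, 8.1), (2, 900200, 7.0), (3, 900200, 9.0)])

candidates = find_candidates(conn, {201826, 900001}, {900100}, {900200}, 7.0)
print(candidates)
```

Person 1 is coded with a descendant concept and still matches because the expanded set is passed in; person 2 fails the strict `> 7.0` threshold, and person 3 has no drug exposure. For very large concept sets, a temporary table joined against the fact tables is usually a better fit than a long `IN` list.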
3. Use Case A: Resolving Parsed Criteria to OMOP Concept Sets
Suppose your NLP step has already extracted the key entities from the trial protocol. Now you need OMOP concept IDs - and not just the top-level concept, but the full expanded set of descendants.
The Scenario: A trial requires patients with “Type 2 Diabetes Mellitus” and current “Metformin” use. You need concept sets for both.
Code Snippet: Resolving Criteria Entities and Expanding Concept Sets
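The resolve-then-expand flow can be sketched roughly as follows. To keep the example runnable without network access or credentials, it does not call the OMOPHub SDK: `VocabularyClient` is a hypothetical two-method interface, and `InMemoryVocabulary` is a toy stand-in with hand-written data. Only concept ID 201826 comes from the text above; all other IDs are placeholders. In practice, a thin adapter over the real client (including its hierarchy descendants call) would take the stand-in's place.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Concept:
    concept_id: int
    name: str
    vocabulary: str


class VocabularyClient(Protocol):
    """Hypothetical interface; NOT the OMOPHub SDK's actual signatures."""

    def search(self, term: str, vocabulary: str) -> list[Concept]: ...
    def descendants(self, concept_id: int) -> list[Concept]: ...


class InMemoryVocabulary:
    """Toy stand-in for a vocabulary service, backed by hand-written data."""

    def __init__(self, concepts, children):
        self._concepts = {c.concept_id: c for c in concepts}
        self._children = children  # parent concept ID -> direct child IDs

    def search(self, term, vocabulary):
        # Exact, case-insensitive name match; a real search would be ranked.
        wanted = term.lower()
        return [c for c in self._concepts.values()
                if c.vocabulary == vocabulary and c.name.lower() == wanted]

    def descendants(self, concept_id):
        # Walk the hierarchy iteratively, visiting each concept once.
        found, seen = [], set()
        stack = list(self._children.get(concept_id, []))
        while stack:
            cid = stack.pop()
            if cid in seen:
                continue
            seen.add(cid)
            found.append(self._concepts[cid])
            stack.extend(self._children.get(cid, []))
        return found


def build_concept_set(client: VocabularyClient, term: str, vocabulary: str) -> set[int]:
    """Resolve a term to its top match, then add every descendant concept ID."""
    matches = client.search(term, vocabulary)
    if not matches:
        raise LookupError(f"no {vocabulary} concept found for {term!r}")
    root = matches[0]
    ids = {root.concept_id}
    ids.update(c.concept_id for c in client.descendants(root.concept_id))
    return ids


vocab = InMemoryVocabulary(
    concepts=[
        Concept(201826, "Type 2 diabetes mellitus", "SNOMED"),
        Concept(900001, "Type 2 DM with renal complications", "SNOMED"),   # placeholder ID
        Concept(900002, "Type 2 DM with peripheral angiopathy", "SNOMED"), # placeholder ID
        Concept(900100, "Metformin", "RxNorm"),                            # placeholder ID
    ],
    children={201826: [900001, 900002]},
)

t2dm_set = build_concept_set(vocab, "Type 2 Diabetes Mellitus", "SNOMED")
metformin_set = build_concept_set(vocab, "Metformin", "RxNorm")
print(sorted(t2dm_set), sorted(metformin_set))
```

Taking the first search hit as the root is a simplification. A production pipeline would more likely surface the candidates for human review before the concept set is frozen for a trial.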
4. Use Case B: Patient Pre-screening Against Trial Criteria
Once you have expanded concept sets from Use Case A, screening a patient becomes straightforward set logic: does the patient’s OMOP profile overlap with the trial’s required concepts?
The Scenario: A patient is in the clinic. Their OMOP record has condition, drug, and measurement concept IDs. You need to check if they match the trial criteria.
Code Snippet: Pre-screening a Patient
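As a minimal sketch of that set logic, the example below checks a patient profile against inclusion and exclusion concept sets and returns the reasons for any mismatch. The `PatientProfile` and `TrialCriteria` classes and the `prescreen` function are illustrative names, not part of any SDK. Concept IDs 201826 and 4329847 are the ones this article mentions; the rest are placeholders. Each measurement is reduced to a single latest value, which a real screen would replace with dated records.

```python
from dataclasses import dataclass, field


@dataclass
class PatientProfile:
    condition_ids: set[int]
    drug_ids: set[int]
    measurements: dict[int, float]  # measurement concept ID -> latest value


@dataclass
class TrialCriteria:
    condition_set: set[int]
    drug_set: set[int]
    measurement_set: set[int]
    min_value: float  # exclusive lower bound, e.g. HbA1c > 7.0
    exclusion_conditions: set[int] = field(default_factory=set)


def prescreen(patient: PatientProfile, trial: TrialCriteria) -> tuple[bool, list[str]]:
    """Return (eligible, reasons); reasons is empty when the patient passes."""
    reasons = []
    if not patient.condition_ids & trial.condition_set:
        reasons.append("no qualifying condition")
    if not patient.drug_ids & trial.drug_set:
        reasons.append("no qualifying drug exposure")
    values = [v for cid, v in patient.measurements.items()
              if cid in trial.measurement_set]
    if not any(v > trial.min_value for v in values):
        reasons.append("measurement threshold not met")
    hits = patient.condition_ids & trial.exclusion_conditions
    if hits:
        reasons.append(f"exclusion condition present: {sorted(hits)}")
    return (not reasons, reasons)


trial = TrialCriteria(
    condition_set={201826, 900001},   # expanded Type 2 diabetes set (900001 is a placeholder)
    drug_set={900100},                # placeholder Metformin concept set
    measurement_set={900200},         # placeholder HbA1c concept set
    min_value=7.0,
    exclusion_conditions={4329847},   # Myocardial Infarction
)

patient_a = PatientProfile({900001}, {900100}, {900200: 8.2})
patient_b = PatientProfile({201826, 4329847}, {900100}, {900200: 6.5})

result_a = prescreen(patient_a, trial)
result_b = prescreen(patient_b, trial)
print(result_a)
print(result_b)
```

Collecting every failed check, rather than stopping at the first, gives a reviewer the full picture of why a patient was screened out.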
5. The “Explainable Matching” Layer
A simple “eligible / not eligible” isn’t enough for clinicians. They need to understand why - especially for exclusions. Was the patient excluded because of a permanent contraindication, or something that could change (like a medication that could be washed out)?
This is where pairing OMOPHub’s structured vocabulary data with an external LLM creates real clinical value.
Example Workflow:
- OMOPHub identifies that the patient has concept ID 4329847 (Myocardial Infarction), which is an exclusion criterion
- OMOPHub provides the structured metadata: concept name, domain, vocabulary
- You feed this to an LLM with the trial context to generate a clinician-facing explanation
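The workflow above can be sketched as a small prompt-building step. The example below is an illustration under stated assumptions: `ExclusionHit`, `build_explanation_prompt`, and `explain_exclusion` are hypothetical names, the trial summary is sample text, and the LLM is modelled as any callable that takes a prompt string and returns a string, so no particular provider's API is implied. The metadata fields mirror the ones listed above (concept name, domain, vocabulary).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExclusionHit:
    concept_id: int
    concept_name: str
    domain: str
    vocabulary: str


def build_explanation_prompt(hit: ExclusionHit, trial_summary: str) -> str:
    """Assemble a clinician-facing prompt from structured vocabulary metadata."""
    return "\n".join([
        "You are assisting a clinician reviewing clinical trial eligibility.",
        f"Trial: {trial_summary}",
        "The patient matched an exclusion criterion:",
        f"- Concept ID: {hit.concept_id}",
        f"- Concept name: {hit.concept_name}",
        f"- Domain: {hit.domain}",
        f"- Vocabulary: {hit.vocabulary}",
        "Explain in two or three sentences why this excludes the patient, and",
        "state whether the exclusion is likely permanent or could change.",
    ])


def explain_exclusion(hit: ExclusionHit, trial_summary: str,
                      llm: Callable[[str], str]) -> str:
    """Send the prompt to whichever LLM callable the caller supplies."""
    return llm(build_explanation_prompt(hit, trial_summary))


hit = ExclusionHit(4329847, "Myocardial infarction", "Condition", "SNOMED")
summary = "Phase III trial in adults with Type 2 Diabetes, HbA1c > 7.0% on Metformin"
prompt = build_explanation_prompt(hit, summary)


def stub_llm(text: str) -> str:
    # Stand-in for a real model call; it only reports the prompt size.
    return f"(stub) received a prompt of {len(text.splitlines())} lines"


print(explain_exclusion(hit, summary, stub_llm))
```

Keeping prompt construction separate from the model call makes the prompt easy to log and audit, which matters when the output is shown to clinicians.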