# FHIR → OMOP Standardization

> The complete workflow from FHIR-coded clinical data to standardized OMOP CDM tables - resolve CodeableConcepts, assign domains, and load CDM tables.

This guide walks through the complete path from FHIR-coded clinical data to populated OMOP CDM tables using OMOPHub. It covers vocabulary resolution, domain assignment, standard concept mapping, and CDM table placement - the four steps that every FHIR-to-OMOP transformation pipeline has to solve.

If you're looking for a resource-by-resource cookbook (Condition → `condition_occurrence`, Observation → `measurement`, MedicationStatement → `drug_exposure`, and so on), see the [FHIR Integration](/guides/integration/fhir-integration) guide. This page focuses on the vocabulary standardization layer that sits at the heart of any FHIR-to-OMOP pipeline.

## 1. Why Vocabulary Resolution Is the Hard Part

Converting FHIR resources to OMOP CDM tables is not primarily a schema transformation problem. The structural mapping - which FHIR fields go into which OMOP columns - is well-documented in the [HL7 FHIR-to-OMOP IG](https://build.fhir.org/ig/HL7/fhir-omop-ig/). The hard part is vocabulary standardization: taking the coded clinical concepts in your FHIR data and resolving them to the correct OMOP standard concepts.

This is hard because:

* FHIR `CodeableConcept` fields can contain codes from any vocabulary system - SNOMED CT, ICD-10-CM, LOINC, RxNorm, local hospital codes, or several simultaneously
* OMOP requires each clinical table's `*_concept_id` column to point at a **standard concept**, and that concept's domain determines which CDM table the record belongs to
* The same clinical idea can appear as different codes in different systems, and a single ICD-10 code can map to multiple SNOMED concepts
* Some FHIR codes are already standard OMOP concepts (most SNOMED codes), others need mapping via `Maps to` relationships (ICD-10-CM, NDC), and local hospital codes have no OMOP concept at all
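To make the first point concrete: before any concept lookup, each coding's `system` URI has to be recognized as an OMOP vocabulary, and anything unrecognized needs separate handling. The sketch below shows that split for a few well-known FHIR system URIs. It is illustrative only: OMOPHub does this identification server-side, `FHIR_SYSTEM_TO_OMOP_VOCAB` and `split_codings` are hypothetical names, the map is far from exhaustive, and `DM2` is a made-up local code.

```python theme={null}
# Hypothetical, non-exhaustive map from FHIR code system URIs to OMOP vocabulary_id values
FHIR_SYSTEM_TO_OMOP_VOCAB = {
    "http://snomed.info/sct": "SNOMED",
    "http://hl7.org/fhir/sid/icd-10-cm": "ICD10CM",
    "http://loinc.org": "LOINC",
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
    "http://hl7.org/fhir/sid/ndc": "NDC",
    "http://hl7.org/fhir/sid/cvx": "CVX",
}


def split_codings(codings):
    """Separate codings from known OMOP vocabularies from local/unknown ones."""
    known, unknown = [], []
    for coding in codings:
        vocab = FHIR_SYSTEM_TO_OMOP_VOCAB.get(coding.get("system", ""))
        if vocab and coding.get("code"):
            known.append((vocab, coding["code"]))
        else:
            unknown.append(coding)
    return known, unknown


codings = [
    {"system": "http://snomed.info/sct", "code": "44054006"},
    {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": "E11.9"},
    {"system": "http://hospital.local/codes", "code": "DM2"},
]
known, unknown = split_codings(codings)
print(known)    # [('SNOMED', '44054006'), ('ICD10CM', 'E11.9')]
print(unknown)  # [{'system': 'http://hospital.local/codes', 'code': 'DM2'}]
```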

OMOPHub handles this resolution layer so your pipeline doesn't have to maintain a local vocabulary database. See [Why OMOPHub vs Self-Hosting](/guides/production/why-omophub) for the comparison against hosting ATHENA yourself.

## 2. The Four-Step Flow

Every FHIR-to-OMOP vocabulary resolution follows the same pattern.

<Steps>
  <Step title="Extract the coded concept from the FHIR resource">
    A FHIR resource contains one or more `CodeableConcept` or `Coding` elements. For example, a FHIR `Condition` resource might contain:

    ```json theme={null}
    {
      "resourceType": "Condition",
      "code": {
        "coding": [
          {
            "system": "http://snomed.info/sct",
            "code": "44054006",
            "display": "Type 2 diabetes mellitus"
          },
          {
            "system": "http://hl7.org/fhir/sid/icd-10-cm",
            "code": "E11.9",
            "display": "Type 2 diabetes mellitus without complications"
          }
        ]
      }
    }
    ```

    This Condition has two codings for the same clinical idea: one in SNOMED CT, one in ICD-10-CM.
  </Step>

  <Step title="Resolve to the OMOP standard concept">
    Send the codings to OMOPHub's Concept Resolver. It handles vocabulary identification, concept lookup, `Maps to` traversal, and OHDSI vocabulary preference ranking in a single call:

    ```bash theme={null}
    curl -X POST https://api.omophub.com/v1/fhir/resolve/codeable-concept \
      -H "Authorization: Bearer oh_your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "coding": [
          { "system": "http://snomed.info/sct", "code": "44054006" },
          { "system": "http://hl7.org/fhir/sid/icd-10-cm", "code": "E11.9" }
        ],
        "resource_type": "Condition"
      }'
    ```

    The Resolver returns a `best_match` based on OHDSI vocabulary preference (SNOMED > RxNorm > LOINC > CVX > ICD-10 for conditions), plus `alternatives` and `unresolved` arrays. The important payload is nested under `best_match.resolution`:

    ```json theme={null}
    {
      "data": {
        "best_match": {
          "resolution": {
            "source_concept": {
              "concept_id": 201826,
              "concept_code": "44054006",
              "concept_name": "Type 2 diabetes mellitus",
              "vocabulary_id": "SNOMED",
              "standard_concept": "S"
            },
            "standard_concept": {
              "concept_id": 201826,
              "concept_name": "Type 2 diabetes mellitus",
              "vocabulary_id": "SNOMED",
              "domain_id": "Condition",
              "concept_class_id": "Clinical Finding",
              "standard_concept": "S"
            },
            "mapping_type": "direct",
            "target_table": "condition_occurrence",
            "domain_resource_alignment": "aligned"
          }
        },
        "alternatives": [
          {
            "resolution": {
              "source_concept": {
                "concept_id": 45576876,
                "vocabulary_id": "ICD10CM",
                "concept_code": "E11.9"
              },
              "standard_concept": {
                "concept_id": 201826,
                "concept_name": "Type 2 diabetes mellitus",
                "domain_id": "Condition"
              },
              "mapping_type": "mapped"
            }
          }
        ],
        "unresolved": []
      }
    }
    ```

    Key things to notice:

    * Both the SNOMED code (already standard) and the ICD-10-CM code (mapped via `Maps to`) resolve to the same standard concept: `201826`. For the SNOMED coding, the source concept and the standard concept are the same concept, so both carry ID `201826`
    * The SNOMED coding wins as `best_match` because SNOMED is the preferred vocabulary for the Condition domain in OHDSI conventions
    * `mapping_type` tells you what happened: `direct` (source was already standard), `mapped` (followed `Maps to`), `semantic_match` (fell back to text-based search), or `unmapped`
    * `target_table` tells you which CDM table the record belongs to - computed from the standard concept's domain, not from the FHIR resource type
    * `domain_resource_alignment` is `aligned` when the FHIR `resource_type` you declared matches the concept's OMOP domain - useful as a sanity signal for mis-coded data
  </Step>

  <Step title="Read the domain assignment">
    The `domain_id` in the standard concept determines which OMOP CDM table the record belongs to. This is critical, and it's vocabulary-driven, not FHIR-resource-driven.

    | domain\_id    | Target OMOP CDM table  |
    | ------------- | ---------------------- |
    | `Condition`   | `condition_occurrence` |
    | `Drug`        | `drug_exposure`        |
    | `Measurement` | `measurement`          |
    | `Observation` | `observation`          |
    | `Procedure`   | `procedure_occurrence` |
    | `Device`      | `device_exposure`      |
    | `Specimen`    | `specimen`             |
    | `Visit`       | `visit_occurrence`     |

    <Warning>
      The FHIR resource type does NOT always determine the OMOP CDM table. A FHIR `Observation` resource carrying a blood glucose measurement (LOINC) maps to the `measurement` table, not `observation`. A FHIR `Condition` resource carrying a lab-derived finding might map to `measurement`. Always use the vocabulary domain (from `standard_concept.domain_id`, or read `resolution.target_table` directly) for table assignment.
    </Warning>
  </Step>

  <Step title="Populate the CDM table row">
    With the standard concept resolved and the target table identified, populate the CDM row:

    ```sql theme={null}
    INSERT INTO condition_occurrence (
      person_id,
      condition_concept_id,         -- 201826 (from standard_concept.concept_id)
      condition_start_date,         -- from FHIR Condition.onsetDateTime
      condition_type_concept_id,    -- 32817 (EHR) or per your convention
      condition_source_value,       -- "44054006" (original source code)
      condition_source_concept_id   -- 201826 (from source_concept.concept_id)
    ) VALUES (
      :person_id,
      201826,
      :onset_date,
      32817,
      '44054006',
      201826
    );
    ```
  </Step>
</Steps>
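The domain-to-table logic in steps 3 and 4 can be sketched as a pure function over a `resolution` object like the ones in the Step 2 response. This is a minimal sketch, not part of the OMOPHub SDK: `build_cdm_row`, `DOMAIN_TO_TABLE`, and `TABLE_PREFIX` are hypothetical names, the helper covers only the six clinical-event tables whose columns share a common prefix, dates are left out, and the response field names are assumed to match the example above. It prefers the server-computed `target_table` and falls back to the standard concept's `domain_id`.

```python theme={null}
# Mirrors the domain -> table mapping in step 3 (clinical-event tables only)
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Measurement": "measurement",
    "Observation": "observation",
    "Procedure": "procedure_occurrence",
    "Device": "device_exposure",
}
# Column prefix of each table's *_concept_id / *_source_value columns
TABLE_PREFIX = {
    "condition_occurrence": "condition",
    "drug_exposure": "drug",
    "measurement": "measurement",
    "observation": "observation",
    "procedure_occurrence": "procedure",
    "device_exposure": "device",
}
EHR_TYPE_CONCEPT_ID = 32817  # "EHR", as in the SQL example


def build_cdm_row(resolution, person_id, source_value):
    """Return (table_name, row_dict) for one resolved coding (hypothetical helper)."""
    standard = resolution.get("standard_concept") or {}
    source = resolution.get("source_concept") or {}
    # Prefer the server-computed table; fall back to the standard concept's domain
    table = resolution.get("target_table") or DOMAIN_TO_TABLE.get(standard.get("domain_id"))
    if table not in TABLE_PREFIX:
        raise ValueError(f"no target table for resolution: {resolution!r}")
    prefix = TABLE_PREFIX[table]
    row = {
        "person_id": person_id,
        f"{prefix}_concept_id": standard.get("concept_id", 0),
        f"{prefix}_type_concept_id": EHR_TYPE_CONCEPT_ID,
        f"{prefix}_source_value": source_value,
        f"{prefix}_source_concept_id": source.get("concept_id", 0),
    }
    return table, row


# The ICD-10-CM alternative from the Step 2 response (it carries no target_table)
icd_resolution = {
    "source_concept": {"concept_id": 45576876, "vocabulary_id": "ICD10CM", "concept_code": "E11.9"},
    "standard_concept": {"concept_id": 201826, "concept_name": "Type 2 diabetes mellitus", "domain_id": "Condition"},
    "mapping_type": "mapped",
}
table, row = build_cdm_row(icd_resolution, person_id=1, source_value="E11.9")
print(table)  # condition_occurrence
print(row["condition_concept_id"], row["condition_source_concept_id"])  # 201826 45576876
```

Raising on a missing table is a deliberate choice here: an unmapped code has no domain, so it cannot be routed and should go to the review path instead of being written to an arbitrary table.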

## 3. Working with the Python SDK

The same four-step flow, in Python:

```python theme={null}
import omophub

client = omophub.OMOPHub()

# Step 1: Extract codings from your FHIR resource (you parse the JSON)
fhir_condition = {
    "coding": [
        {"system": "http://snomed.info/sct", "code": "44054006"},
        {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": "E11.9"},
    ],
}

# Step 2: Resolve
result = client.fhir.resolve_codeable_concept(
    coding=fhir_condition["coding"],
    resource_type="Condition",
)
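best = result.get("best_match")
if best is None:
    # No coding resolved to a standard concept; inspect result["unresolved"]
    raise RuntimeError(f"No standard concept found: {result.get('unresolved')}")
res = best["resolution"]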

# Step 3: Read domain and table assignment
print(f"Standard concept: {res['standard_concept']['concept_id']} ({res['standard_concept']['concept_name']})")
print(f"Domain:           {res['standard_concept']['domain_id']}")
print(f"Target table:     {res['target_table']}")
print(f"Mapping type:     {res['mapping_type']}")

# Step 4: Use the resolved values in your ETL
# insert into the appropriate CDM table with the standard + source concept IDs
```

<Tip>
  If your pipeline already parses FHIR with `fhir.resources` or `fhirpy`, you can pass those `Coding` / `CodeableConcept` objects directly to the resolver via duck typing - see [Type Interoperability](/sdks/python/fhir#type-interoperability) in the Python SDK reference. Neither library is a required dependency.
</Tip>

## 4. Batch Processing: The ETL Pattern

In a real ETL pipeline you're processing thousands of FHIR resources. Don't resolve one at a time - use the batch endpoint. Deduplicate first, then batch-resolve the unique codings in chunks of 100:

```python theme={null}
import json

import omophub

client = omophub.OMOPHub()

# Load a FHIR Bundle (e.g., from a Bulk FHIR export)
with open("fhir_bundle.json") as f:
    bundle = json.load(f)

entries = bundle.get("entry", [])


def add_codings(codeable_concept, out):
    """Collect (system, code) pairs, skipping codings that lack either field."""
    for c in codeable_concept.get("coding", []):
        if c.get("system") and c.get("code"):
            out.add((c["system"], c["code"]))


# Step 1: Extract all unique codings across all resources in the bundle
all_codings = set()
for entry in entries:
    resource = entry.get("resource", {})
    if resource.get("resourceType") == "Condition":
        add_codings(resource.get("code", {}), all_codings)
    elif resource.get("resourceType") == "MedicationRequest":
        add_codings(resource.get("medicationCodeableConcept", {}), all_codings)
    # ... handle other resource types

print(f"Total resources: {len(entries):,}")
print(f"Unique codings:  {len(all_codings):,}")

# Step 2: Batch resolve unique codings (100 per call)
BATCH_SIZE = 100
codings_list = [{"system": s, "code": c} for s, c in sorted(all_codings)]
cache = {}   # (system, code) -> resolution
failed = []  # (system, code) pairs for manual review

for i in range(0, len(codings_list), BATCH_SIZE):
    chunk = codings_list[i : i + BATCH_SIZE]
    result = client.fhir.resolve_batch(chunk)
    # Assumes one result per input coding, in input order, so the cache is keyed
    # by the same (system, code) pair your FHIR rows carry
    for coding, item in zip(chunk, result["results"]):
        key = (coding["system"], coding["code"])
        if "resolution" in item:
            cache[key] = item["resolution"]
        else:
            # Failed coding - log for manual review
            failed.append(key)
            print(f"  Failed {key}: {item['error']['code']} - {item['error']['message']}")

# Steps 3-4: Apply the cache to every row in your full dataset
# (pandas merge / SQL JOIN / dict lookup, depending on your pipeline)
```

<Tip>
  The deduplication step is critical. A FHIR Bulk Export with 500,000 `Condition` resources might contain only 2,000 unique diagnosis codes. Map the 2,000, then join against the full dataset. See [Batch & Performance](/guides/production/batch-performance) for the full pattern.
</Tip>
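The "join against the full dataset" step can be as simple as a dictionary lookup. A minimal sketch, assuming a cache keyed by the FHIR `(system, code)` pair that holds resolution objects shaped like the Step 2 response; `apply_cache` is a hypothetical helper and the cache here is hand-built rather than fetched from the API. Each Condition produces one row: the first coding with a cached standard concept wins, and anything else falls back to concept ID `0`.

```python theme={null}
def apply_cache(conditions, cache):
    """Return one (resource_id, condition_concept_id, source_value) row per Condition resource."""
    rows = []
    for resource in conditions:
        codings = resource.get("code", {}).get("coding", [])
        concept_id, source_value = 0, None  # 0 = OMOP convention for "no standard concept"
        for coding in codings:
            resolution = cache.get((coding.get("system"), coding.get("code"))) or {}
            standard = resolution.get("standard_concept") or {}
            if standard.get("concept_id"):
                # First coding with a cached standard concept wins
                concept_id, source_value = standard["concept_id"], coding.get("code")
                break
        if source_value is None and codings:
            source_value = codings[0].get("code")  # keep the original code for review
        rows.append((resource.get("id"), concept_id, source_value))
    return rows


# Hand-built cache entry standing in for a batch-resolved result
cache = {
    ("http://snomed.info/sct", "44054006"): {
        "standard_concept": {"concept_id": 201826, "domain_id": "Condition"},
        "mapping_type": "direct",
    },
}
conditions = [
    {"id": "c1", "code": {"coding": [{"system": "http://snomed.info/sct", "code": "44054006"}]}},
    {"id": "c2", "code": {"coding": [{"system": "http://snomed.info/sct", "code": "44054006"}]}},
    {"id": "c3", "code": {"coding": [{"system": "http://hospital.local/codes", "code": "DM2"}]}},
]
for row in apply_cache(conditions, cache):
    print(row)
# ('c1', 201826, '44054006')
# ('c2', 201826, '44054006')
# ('c3', 0, 'DM2')
```

"First cached coding wins" keeps the sketch short; a production pipeline would more likely apply the same vocabulary preference the Resolver uses for `best_match`.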

## 5. Using the FHIR R4 Terminology Service

If your pipeline already speaks FHIR (you're integrating with HAPI FHIR or EHRbase, or you're building a spec-conformant client), you can use OMOPHub's FHIR R4 terminology operations instead of the REST resolver:

**`$lookup`** - get concept details for a code:

```bash theme={null}
curl "https://fhir.omophub.com/fhir/r4/CodeSystem/\$lookup?\
system=http://snomed.info/sct&code=44054006" \
  -H "Authorization: Bearer oh_your_api_key"
```

**`$translate`** - map between vocabularies:

```bash theme={null}
curl "https://fhir.omophub.com/fhir/r4/ConceptMap/\$translate?\
system=http://hl7.org/fhir/sid/icd-10-cm&code=E11.9&\
target=http://snomed.info/sct" \
  -H "Authorization: Bearer oh_your_api_key"
```

**`$validate-code`** - check if a code exists:

```bash theme={null}
curl "https://fhir.omophub.com/fhir/r4/CodeSystem/\$validate-code?\
url=http://snomed.info/sct&code=44054006" \
  -H "Authorization: Bearer oh_your_api_key"
```

These return standard FHIR `Parameters` responses and can be consumed directly by FHIR-aware clients. See the [FHIR Terminology Service overview](/api-reference/fhir-terminology/overview) for the full operation reference.
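A FHIR `Parameters` resource carries results as a list of name/value pairs, with each value in a type-specific `value[x]` field (`valueBoolean`, `valueString`, `valueCode`, and so on). A minimal sketch of flattening the top-level pairs into a dict, assuming the standard R4 `Parameters` shape; `flatten_parameters` is a hypothetical helper, the sample response is hand-written rather than captured from OMOPHub, and nested `part` elements (used by `$lookup` for properties and by `$translate` for each `match`) are skipped.

```python theme={null}
def flatten_parameters(parameters):
    """Map each top-level parameter name to its value[x]; repeated names become lists."""
    out = {}
    for param in parameters.get("parameter", []):
        value = next((v for k, v in param.items() if k.startswith("value")), None)
        if value is None:
            continue  # e.g. parameters that only carry nested "part" elements
        name = param["name"]
        if name in out:
            existing = out[name]
            out[name] = existing + [value] if isinstance(existing, list) else [existing, value]
        else:
            out[name] = value
    return out


# Hand-written $validate-code style response
validate_response = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "result", "valueBoolean": True},
        {"name": "display", "valueString": "Type 2 diabetes mellitus"},
    ],
}
result = flatten_parameters(validate_response)
print(result["result"], result["display"])  # True Type 2 diabetes mellitus
```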

<Note>
  **Concept Resolver vs FHIR Terminology Service - when to use which:**

  Use the **Concept Resolver** (`/v1/fhir/resolve*`) when you want the complete OMOP mapping chain in one call - source concept, standard concept, domain, target CDM table, mapping type. This is purpose-built for ETL pipelines and returns OMOPHub's native JSON envelope.

  Use the **FHIR R4 Terminology Service** (`/fhir/r4/*`) when you're integrating with FHIR infrastructure (HAPI FHIR, EHRbase, Firely) that expects spec-conformant FHIR operations and `OperationOutcome` error responses.

  Both use the same underlying vocabulary data. The difference is response format and how much resolution logic the server hands you in a single call.
</Note>

## 6. Handling Edge Cases

### One-to-many mappings

A single ICD-10 code can map to multiple SNOMED standard concepts (e.g. a combination diagnosis that splits into separate Condition and Observation concepts). The single-coding resolver returns `alternative_standard_concepts` alongside the primary `standard_concept`. For ETL pipelines, inspect that array and decide whether to write one row per alternative or pick the highest-quality match.
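For the "one row per alternative" option, a minimal sketch, assuming each entry in `alternative_standard_concepts` has the same shape as `standard_concept` (that element shape is an assumption here). `standard_concepts_for` is a hypothetical helper that returns the primary concept followed by the alternatives, de-duplicated by `concept_id`, so each can become its own CDM row routed by its own domain. The concept IDs `1001` and `1002` are placeholders, not real OMOP concepts.

```python theme={null}
def standard_concepts_for(resolution):
    """Primary standard concept first, then any alternatives, de-duplicated by concept_id."""
    candidates = [resolution.get("standard_concept")]
    candidates += resolution.get("alternative_standard_concepts") or []
    seen, result = set(), []
    for concept in candidates:
        cid = (concept or {}).get("concept_id")
        if cid is not None and cid not in seen:
            seen.add(cid)
            result.append(concept)
    return result


# Placeholder IDs for illustration only
resolution = {
    "standard_concept": {"concept_id": 1001, "domain_id": "Condition"},
    "alternative_standard_concepts": [
        {"concept_id": 1002, "domain_id": "Observation"},
        {"concept_id": 1001, "domain_id": "Condition"},  # duplicate of the primary
    ],
}
for concept in standard_concepts_for(resolution):
    print(concept["concept_id"], concept["domain_id"])
# 1001 Condition
# 1002 Observation
```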

### Unmapped codes

If a code doesn't map to any standard concept, the Resolver returns `mapping_type: "unmapped"` with no `standard_concept`. Your pipeline should:

1. Store the source code in the `*_source_value` field
2. Set `*_concept_id` to `0` (OMOP convention for unmapped)
3. Log the unmapped code for manual review - don't silently drop records
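The three rules above, as a minimal sketch. `to_concept_ids` is a hypothetical helper that takes the CDM column prefix (e.g. `condition`) as an argument, treats `mapping_type: "unmapped"` or a missing `standard_concept` as unmapped, and uses Python's standard `logging` module as the review log; the local code `DM2` is made up.

```python theme={null}
import logging

logger = logging.getLogger("fhir_to_omop.unmapped")


def to_concept_ids(resolution, prefix, source_value):
    """Build the concept/source columns, applying the OMOP rules for unmapped codes."""
    standard = resolution.get("standard_concept")
    source = resolution.get("source_concept") or {}
    row = {
        f"{prefix}_source_value": source_value,  # rule 1: always keep the source code
        f"{prefix}_source_concept_id": source.get("concept_id", 0),
    }
    if resolution.get("mapping_type") == "unmapped" or not standard:
        row[f"{prefix}_concept_id"] = 0  # rule 2: 0 = no standard concept
        # rule 3: keep the record, but log it for manual review
        logger.warning("Unmapped code %r written with concept_id 0", source_value)
    else:
        row[f"{prefix}_concept_id"] = standard["concept_id"]
    return row


logging.basicConfig(level=logging.WARNING)
print(to_concept_ids({"mapping_type": "unmapped"}, "condition", "DM2"))
# {'condition_source_value': 'DM2', 'condition_source_concept_id': 0, 'condition_concept_id': 0}
```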

### Local / proprietary codes

Hospital-specific codes with custom FHIR system URIs (e.g. `http://hospital.local/codes`) won't be in the OMOP vocabulary tables. Two options:

* **Pass a `display` value with no `system`/`code`.** The Resolver falls back to semantic search over the display text, scoped to the `resource_type` domain, and returns `mapping_type: "semantic_match"` with a `similarity_score`.
* **Pre-map your local codes to standard vocabularies** as part of your site configuration, then send standard codes to the Resolver. See [Collaborative Mapping](/guides/use-cases/collaborative-mapping) for the shared-mapping-file pattern.
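Both options can live in one pre-processing step before codings are sent to the Resolver. A minimal sketch: `LOCAL_SYSTEMS` and `LOCAL_TO_STANDARD` stand in for your site configuration (their entries, including the local codes `DM2`, `X17`, and `X18`, are hypothetical), and `prepare_coding` is a hypothetical helper. Pre-mapped local codes are swapped for their standard coding; other local codes are sent as `display` only so the Resolver can fall back to semantic search; a local code with no display returns `None`, since there is nothing to resolve.

```python theme={null}
# Hypothetical site configuration: local system URIs and pre-mapped local codes
LOCAL_SYSTEMS = {"http://hospital.local/codes"}
LOCAL_TO_STANDARD = {
    ("http://hospital.local/codes", "DM2"): {"system": "http://snomed.info/sct", "code": "44054006"},
}


def prepare_coding(coding):
    """Return the payload to send to the Resolver for one FHIR Coding, or None."""
    key = (coding.get("system"), coding.get("code"))
    if key in LOCAL_TO_STANDARD:
        return dict(LOCAL_TO_STANDARD[key])  # pre-mapped local code -> standard coding
    if coding.get("system") in LOCAL_SYSTEMS:
        display = coding.get("display")
        # Display-only payload triggers the semantic fallback; no display, nothing to send
        return {"display": display} if display else None
    # Standard vocabulary: pass system/code through unchanged
    return {k: coding[k] for k in ("system", "code") if k in coding}


print(prepare_coding({"system": "http://hospital.local/codes", "code": "DM2"}))
# {'system': 'http://snomed.info/sct', 'code': '44054006'}
print(prepare_coding({"system": "http://hospital.local/codes", "code": "X17", "display": "Type 2 diabetes, diet controlled"}))
# {'display': 'Type 2 diabetes, diet controlled'}
```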

## 7. The Complete Pipeline Architecture

```
┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  FHIR Source │     │     OMOPHub      │     │     OMOP CDM     │
│              │     │                  │     │                  │
│ Condition    │     │ 1. Deduplicate   │     │ condition_       │
│ Observation  │     │    unique codes  │     │   occurrence     │
│ Medication   │────▶│                  │────▶│ measurement      │
│ Procedure    │     │ 2. Batch resolve │     │ drug_exposure    │
│ ...          │     │    via Concept   │     │ procedure_       │
│              │     │    Resolver      │     │   occurrence     │
│              │     │                  │     │ observation      │
│              │     │ 3. Cache results │     │ ...              │
│              │     │                  │     │                  │
│              │     │ 4. Apply to full │     │                  │
│              │     │    dataset       │     │                  │
└──────────────┘     └──────────────────┘     └──────────────────┘
```

The first ETL run makes the most Resolver calls. Each subsequent run makes fewer, because the mapping cache grows and only codes not seen in earlier runs need resolution. Once the cache covers most of your source codes, API calls stop being the bottleneck and the rest of your pipeline sets the pace.

## 8. What to Read Next

<CardGroup cols={2}>
  <Card title="FHIR Integration Cookbook" icon="fire" href="/guides/integration/fhir-integration">
    Resource-by-resource mappings: Condition, Observation, MedicationRequest, Procedure, and more, with the exact Coding extraction logic for each.
  </Card>

  <Card title="FHIR Terminology Service" icon="server" href="/api-reference/fhir-terminology/overview">
    Full operation reference for the FHIR R4 Terminology Service: `$lookup`, `$translate`, `$validate-code`, `$expand`, `$subsumes`, `$find-matches`, `$closure`, `$diff`.
  </Card>

  <Card title="Lean ETL Mapping Cache" icon="database" href="/guides/use-cases/lean-etl-mapping-cache">
    Build validated mapping caches during development and apply them locally at production speed.
  </Card>

  <Card title="Collaborative Mapping" icon="users" href="/guides/use-cases/collaborative-mapping">
    Share mappings across teams via `source_to_concept_map` files.
  </Card>

  <Card title="Batch & Performance" icon="gauge-high" href="/guides/production/batch-performance">
    Deduplication, batch endpoints, cache patterns for ETL at scale.
  </Card>

  <Card title="Known Limitations" icon="triangle-exclamation" href="/guides/production/known-limitations">
    What OMOPHub does not do. FHIR-specific caveats, vocabulary exclusions, and what's on the roadmap.
  </Card>
</CardGroup>
