# Why OMOPHub vs Self-Hosting

> An honest comparison of using OMOPHub versus downloading ATHENA and running your own OMOP vocabulary database, with guidance on when each choice fits.

Every OMOP team faces the same question: download ATHENA and run your own vocabulary database, or use an API? This page compares the two paths so you can decide what fits your situation.

## 1. The Self-Hosting Path

The traditional approach:

1. Go to [athena.ohdsi.org](https://athena.ohdsi.org) and request a vocabulary download
2. Wait for the download link (can take hours to days)
3. Download 3–5 GB of CSV files
4. Set up a PostgreSQL database with the OMOP vocabulary schema
5. Load the CSVs (typically 30–60 minutes depending on hardware)
6. Build indexes for acceptable query performance (another 30–60 minutes)
7. Write SQL queries or build a service layer on top
8. Repeat steps 1–6 every time OHDSI publishes a new release
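
As a rough illustration of steps 4 and 5, the sketch below generates one `psql` `\copy` command per vocabulary file. It assumes the OMOP vocabulary tables already exist in a schema named `vocab` (a hypothetical name) and that the bundle contains the usual file set. ATHENA files are tab-delimited despite the `.csv` extension, and the backspace quote character is a common community convention for treating quotes inside concept names as literal text.

```python
from pathlib import Path

# Vocabulary files found in a typical ATHENA bundle; the file stem matches
# the table name. Your bundle may contain more or fewer files.
VOCAB_TABLES = [
    "DOMAIN", "VOCABULARY", "CONCEPT_CLASS", "RELATIONSHIP",
    "CONCEPT", "CONCEPT_SYNONYM", "CONCEPT_RELATIONSHIP",
    "CONCEPT_ANCESTOR", "DRUG_STRENGTH",
]


def copy_commands(bundle_dir, schema="vocab"):
    """Return one psql \\copy command per vocabulary file.

    `schema` is a hypothetical schema name; the tables must already exist.
    """
    commands = []
    for table in VOCAB_TABLES:
        path = Path(bundle_dir) / f"{table}.csv"
        commands.append(
            f"\\copy {schema}.{table.lower()} FROM '{path.as_posix()}' "
            "WITH DELIMITER E'\\t' CSV HEADER QUOTE E'\\b'"
        )
    return commands


if __name__ == "__main__":
    # Print the commands so they can be pasted into psql or saved to a file.
    print("\n".join(copy_commands("athena")))
```

Loading before adding constraints and indexes (step 6) is usually faster, which is why the sketch does not worry about table order.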

This works. Many OHDSI sites run this way. But it has real costs.

## 2. Where Self-Hosting Gets Expensive

**Setup time is not zero.** A senior data engineer typically spends 1–2 days on initial setup, including schema creation, CSV loading, index tuning, and basic query testing. For teams new to OMOP, it can take a week.

**Maintenance is ongoing.** OHDSI publishes vocabulary releases through ATHENA roughly every 6 months. Each release means re-downloading, re-loading, re-indexing, and regression testing. Teams that skip updates end up with stale vocabularies: deprecated concepts, missing new codes, broken mappings. See [Vocabulary Lifecycle Management](/guides/use-cases/vocabulary-lifecycle-management) for the pattern to stay current.

**No search out of the box.** ATHENA CSVs give you tables, not a search engine. Building fuzzy search, autocomplete, or semantic similarity requires additional tooling - Elasticsearch, custom indexing, neural embedding models. Teams that never build this are stuck with exact-match and `LIKE`-pattern SQL queries.
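
To make the search gap concrete, here is a minimal sketch using only the Python standard library. An in-memory SQLite table stands in for the PostgreSQL `concept` table, the three rows are illustrative samples, and `difflib` stands in for a real search engine. A misspelled term gets nothing from exact-match SQL, while even a naive fuzzy matcher recovers the concept.

```python
import difflib
import sqlite3

# Illustrative sample rows; a real CONCEPT table holds millions of concepts.
ROWS = [
    (201826, "Type 2 diabetes mellitus"),
    (320128, "Essential hypertension"),
    (313217, "Atrial fibrillation"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)"
)
conn.executemany("INSERT INTO concept VALUES (?, ?)", ROWS)


def exact_match(term):
    """Plain SQL equality: any typo means no result."""
    row = conn.execute(
        "SELECT concept_id FROM concept WHERE concept_name = ?", (term,)
    ).fetchone()
    return row[0] if row else None


def fuzzy_match(term, cutoff=0.6):
    """Naive fuzzy lookup that scans every name; fine for a demo only."""
    names = {
        name.lower(): concept_id
        for concept_id, name in conn.execute(
            "SELECT concept_id, concept_name FROM concept"
        )
    }
    hits = difflib.get_close_matches(term.lower(), list(names), n=1, cutoff=cutoff)
    return names[hits[0]] if hits else None


if __name__ == "__main__":
    print(exact_match("Type 2 diabetis"))  # misspelled: exact match finds nothing
    print(fuzzy_match("Type 2 diabetis"))  # fuzzy match recovers the concept
```

The fuzzy version scans every row on each call, which is exactly the part that does not scale and that dedicated search tooling exists to replace.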

**No API without building one.** If your ETL scripts, FHIR server, LLM pipeline, or frontend application need vocabulary access, you have to build and maintain a REST API on top of your database. That's a web framework, auth, rate limiting, caching, monitoring, and deployment - for every team, from scratch.

**Cost scales with your team, not your problem.** Every new developer, every new project, every new environment needs access to the vocabulary database. That means either shared database access (operationally risky) or multiple copies (expensive and prone to version drift).
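
Version drift between copies is at least easy to detect. In the OMOP vocabulary tables, the `vocabulary` row with `vocabulary_id = 'None'` carries the release version of the loaded bundle. The sketch below compares that value across environments. It assumes DB-API style connections, uses in-memory SQLite databases as stand-ins for real servers, and the version strings are placeholders, not real release names.

```python
import sqlite3

# The 'None' vocabulary row stores the version of the loaded release.
VERSION_SQL = (
    "SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'None'"
)


def vocabulary_versions(connections):
    """Map each environment name to the vocabulary version it has loaded."""
    versions = {}
    for env, conn in connections.items():
        cur = conn.cursor()
        cur.execute(VERSION_SQL)
        row = cur.fetchone()
        versions[env] = row[0] if row else None
    return versions


def has_drift(versions):
    """True when the environments do not all report the same version."""
    return len(set(versions.values())) > 1


def _demo_db(version):
    """Build an in-memory stand-in for one environment's vocabulary database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vocabulary (vocabulary_id TEXT, vocabulary_version TEXT)"
    )
    conn.execute("INSERT INTO vocabulary VALUES ('None', ?)", (version,))
    return conn


# Placeholder version strings for the demo.
demo_versions = vocabulary_versions(
    {"dev": _demo_db("release-B"), "prod": _demo_db("release-A")}
)

if __name__ == "__main__":
    print(demo_versions, "drift:", has_drift(demo_versions))
```

A check like this can run in CI or a scheduled job, but it only reports drift; someone still has to reload the lagging copies.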

## 3. What OMOPHub Gives You Instead

| Capability                                        | Self-hosted ATHENA                         | OMOPHub                                                                                                            |
| ------------------------------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------ |
| Setup time                                        | 1–2 days                                   | 5 minutes (get an API key)                                                                                         |
| Vocabulary updates                                | Manual re-download and re-load             | Automatic, synced with ATHENA releases                                                                             |
| Full-text search                                  | Build your own                             | Built-in                                                                                                           |
| Semantic search                                   | Build your own (need an embedding model)   | Built-in (neural embeddings)                                                                                       |
| Autocomplete                                      | Build your own                             | Built-in                                                                                                           |
| REST API                                          | Build your own                             | Built-in                                                                                                           |
| Python SDK                                        | Build your own                             | `pip install omophub`                                                                                              |
| R SDK                                             | Build your own                             | `install.packages("omophub")`                                                                                      |
| MCP Server for AI agents                          | Build your own                             | `npx -y @omophub/omophub-mcp`                                                                                      |
| FHIR Terminology Service                          | Build your own or deploy Echidna/Snowstorm | Built-in (`$lookup`, `$translate`, `$validate-code`, `$expand`, `$subsumes`, `$find-matches`, `$closure`, `$diff`) |
| FHIR Concept Resolver (Coding → OMOP + CDM table) | Not a standard OHDSI tool; build your own  | Built-in (`POST /v1/fhir/resolve`)                                                                                 |
| Batch operations                                  | SQL                                        | Built-in batch endpoints - see [Batch & Performance](/guides/production/batch-performance)                         |
| Phoebe recommendations                            | Requires separate setup                    | Built-in via `property=recommended` on `$lookup`                                                                   |
| Infrastructure cost                               | \$150–400/month (database + compute)       | Free tier available; paid tiers for higher volume                                                                  |
| Maintenance burden                                | Ongoing                                    | Zero                                                                                                               |
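
As an illustration of the `$lookup` rows in the table, the sketch below builds a standard FHIR `CodeSystem/$lookup` URL that requests the `recommended` property. The `system`, `code`, and `property` parameters come from the FHIR specification. The base URL is a placeholder, not OMOPHub's real endpoint, and the use of the SNOMED CT system URI is an assumption, so check the API reference for the actual base path, supported systems, and authentication.

```python
from urllib.parse import urlencode

# Placeholder only: replace with the FHIR base URL from the API reference.
FHIR_BASE = "https://fhir.example.invalid/r4"


def lookup_url(system, code, properties=()):
    """Build a FHIR CodeSystem/$lookup URL; `property` may repeat."""
    params = [("system", system), ("code", code)]
    params += [("property", name) for name in properties]
    return f"{FHIR_BASE}/CodeSystem/$lookup?{urlencode(params)}"


# SNOMED CT code 44054006 (type 2 diabetes mellitus), asking for the
# `recommended` property mentioned in the table above.
url = lookup_url("http://snomed.info/sct", "44054006", properties=["recommended"])

if __name__ == "__main__":
    print(url)
```

The sketch only constructs the URL. Sending the request and handling credentials are left out because they depend on details not covered on this page.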

## 4. When Self-Hosting Still Makes Sense

OMOPHub is not the right choice for every situation:

* **Air-gapped environments** where no external API calls are permitted. Even here, the [Lean ETL Mapping Cache](/guides/use-cases/lean-etl-mapping-cache) guide shows a hybrid approach: use OMOPHub during development, cache the results, and deploy locally.
* **Custom vocabulary extensions** where you've added proprietary concepts to your local OMOP vocabulary tables. OMOPHub serves standard ATHENA content only.
* **Extremely high volume** workloads that exceed API rate limits and need sub-millisecond local lookups. For most ETL workloads, the batch endpoints and caching strategies in [Batch & Performance](/guides/production/batch-performance) handle this comfortably.
* **Regulatory requirements** that explicitly prohibit sending vocabulary queries to an external service, even when no PHI is involved. See [Security & Data Handling](/guides/production/security-data-handling) for what actually flows through the API - spoiler: vocabulary codes, not patient data.

## 5. The Hybrid Approach

Many teams use both: OMOPHub for development, exploration, and ETL building, and a local vocabulary cache for production execution. The [Lean ETL Mapping Cache](/guides/use-cases/lean-etl-mapping-cache) guide walks through this pattern in detail.

This gives you the best of both worlds: fast iteration with OMOPHub's search and mapping capabilities during development, and zero external dependencies in production.
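
The shape of the hybrid pattern can be sketched in a few lines. This is a generic illustration, not the implementation from the linked guide. `resolve` is a hypothetical callable that maps a source code to a standard concept ID during development (for example, a wrapper around an API call). The demo uses a plain dictionary in its place, and the code-to-concept values are illustrative.

```python
import json
import tempfile
from pathlib import Path


def build_cache(source_codes, resolve, cache_path):
    """Development time: resolve each distinct code once and persist it."""
    mapping = {code: resolve(code) for code in sorted(set(source_codes))}
    Path(cache_path).write_text(json.dumps(mapping, indent=2))
    return mapping


def load_cache(cache_path):
    """Production time: read the frozen mapping from disk."""
    return json.loads(Path(cache_path).read_text())


def map_code(cache, code):
    """Pure local lookup; None means unmapped, with no network fallback."""
    return cache.get(code)


# Demo: a dictionary stands in for the development-time resolver.
fake_resolver = {"E11.9": 201826, "I10": 320128}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "mapping_cache.json"
    build_cache(["E11.9", "I10", "E11.9"], fake_resolver.get, cache_file)
    cache = load_cache(cache_file)

if __name__ == "__main__":
    print(map_code(cache, "E11.9"), map_code(cache, "Z99"))
```

Keeping the production lookup free of any fallback call is the point of the design: an unmapped code surfaces as `None` for review instead of triggering an external request.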
