Overview
Semantic search uses LLM-generated embeddings to find OMOP concepts that are semantically similar to your query, even when exact keyword matches don’t exist. This is ideal for natural language queries and clinical text processing. Example queries:
- “heart attack” → finds “Myocardial infarction”
- “sugar diabetes” → finds “Type 2 diabetes mellitus”
- “high blood pressure” → finds “Essential hypertension”
- “belly pain” → finds “Abdominal pain”
When to Use Semantic Search
| Scenario | Use Semantic Search | Use Basic Search |
|---|---|---|
| Natural language queries | Yes | No |
| Patient-reported symptoms | Yes | No |
| Clinical shorthand/slang | Yes | No |
| Exact code lookup | No | Yes |
| Browsing vocabularies | No | Yes |
Query Parameters
Natural language search query (1-500 characters)
Page number (1-based)
Results per page (1-100)
Minimum similarity score (0.0-1.0). Higher values = stricter matching.
Recommended thresholds:
- 0.7 - Very high confidence matches only
- 0.5 - Balanced, high precision (default)
- 0.3 - More exploratory results
Filter to specific vocabularies (comma-separated)
Examples: SNOMED; SNOMED,ICD10CM; SNOMED,ICD10CM,RXNORM
Filter to specific domains (comma-separated)
Examples: Condition; Drug; Condition,Drug,Procedure
Filter by standard concept status
Values: S (Standard), C (Classification)
Vocabulary version (e.g., 2025v2). Uses default if not specified.
Response Fields
Data Object
Original search query
Array of matching concepts with similarity scores
Approximate number of matching concepts. This is a lower bound based on sampled results, not an exact count. Use has_next for reliable pagination.
Processing time in milliseconds
Result Object
OMOP concept_id
Standard concept name
Semantic similarity score (0.0-1.0). Higher = more similar.
The text that matched (concept name or synonym)
OMOP domain (e.g., Condition, Drug, Procedure)
Source vocabulary (e.g., SNOMED, ICD10CM, RxNorm)
Concept classification within the vocabulary
Standard concept flag: S (Standard), C (Classification), or null
Original code from the source vocabulary
Pagination Object (in meta)
Current page number
Number of results per page
Approximate total number of matching items (lower bound). Use has_next for reliable pagination.
Approximate total number of pages. Use has_next to determine if more pages exist.
Reliable indicator of whether more results are available. Use this for pagination loops.
Reliable indicator of whether previous pages exist.
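A pagination loop driven only by has_next might look like the sketch below. It is illustrative, not the official client: `fetch_page` is a hypothetical callable you supply (for example, a thin wrapper around your HTTP client), and the response shape (`data.results`, `meta.pagination.has_next`) is an assumption inferred from the field descriptions on this page.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all_results(
    fetch_page: Callable[[int], Dict[str, Any]],
    max_pages: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every result, advancing while has_next is true.

    fetch_page(page) is a hypothetical helper that returns one parsed
    response for the given 1-based page number. The keys used below
    ("data", "results", "meta", "pagination") are assumed, not confirmed.
    """
    page = 1
    while page <= max_pages:  # safety cap so a faulty server cannot loop forever
        response = fetch_page(page)
        yield from response["data"]["results"]
        # total_items and total_pages are approximate, so only has_next
        # decides whether to request another page.
        if not response["meta"]["pagination"]["has_next"]:
            break
        page += 1
```

The `max_pages` cap is a defensive choice for this sketch; adjust or remove it to suit your result sizes.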
Pagination Note: For performance reasons, total_items and total_pages are approximate values based on sampled results. Always use has_next to determine if more pages exist rather than comparing page to total_pages.
How It Works
- Query Embedding: Your query is converted to a 768-dimensional vector using neural embeddings
- Vector Search: The query vector is compared against pre-computed concept embeddings
- Ranking: Results are ranked by cosine similarity score
- Filtering: Optional filters (vocabulary, domain) are applied
- Deduplication: Results are deduplicated by concept_id (keeping highest score)
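The ranking, filtering, and deduplication steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the service's implementation: the entry keys (`concept_id`, `vector`, `vocabulary`, `domain`, `text`) are hypothetical, the query is assumed to be embedded already, and a real deployment would use a vector index rather than a linear scan.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_concepts(
    query_vec: Sequence[float],
    entries: List[Dict[str, Any]],
    threshold: float = 0.5,
    vocabularies: Optional[Set[str]] = None,
    domains: Optional[Set[str]] = None,
) -> List[Dict[str, Any]]:
    """Score, rank, filter, and deduplicate pre-computed embeddings."""
    # Vector search and ranking: score every entry, best first.
    scored = sorted(
        ((cosine_similarity(query_vec, e["vector"]), e) for e in entries),
        key=lambda pair: pair[0],
        reverse=True,
    )
    results: List[Dict[str, Any]] = []
    seen: Set[int] = set()
    for score, entry in scored:
        # Filtering: minimum score plus optional vocabulary/domain filters.
        if score < threshold:
            continue
        if vocabularies and entry["vocabulary"] not in vocabularies:
            continue
        if domains and entry["domain"] not in domains:
            continue
        # Deduplication: the list is sorted, so the first hit for a
        # concept_id is its highest-scoring name or synonym.
        if entry["concept_id"] in seen:
            continue
        seen.add(entry["concept_id"])
        results.append(
            {"concept_id": entry["concept_id"], "matched_text": entry["text"], "score": score}
        )
    return results
```

Because a concept can be indexed under both its name and its synonyms, deduplicating after sorting means each concept appears once, with the text that matched best.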
Similarity Score Interpretation
| Score Range | Interpretation |
|---|---|
| 0.9 - 1.0 | Excellent match, high confidence |
| 0.7 - 0.9 | Good match, likely relevant |
| 0.5 - 0.7 | Moderate match, review recommended |
| 0.3 - 0.5 | Weak match, may be tangentially related |
| < 0.3 | Poor match, likely not relevant |
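If you want to label results in client code, the table can be turned into a small helper. The function name `interpret_score` is hypothetical, and since the table's ranges share their endpoints, this sketch assumes a boundary value (for example 0.7) belongs to the higher band.

```python
def interpret_score(score: float) -> str:
    """Map a similarity score (0.0-1.0) to the interpretation in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    # Bands are checked from highest to lowest; boundary values fall
    # into the higher band (an assumption, the table does not say).
    if score >= 0.9:
        return "Excellent match, high confidence"
    if score >= 0.7:
        return "Good match, likely relevant"
    if score >= 0.5:
        return "Moderate match, review recommended"
    if score >= 0.3:
        return "Weak match, may be tangentially related"
    return "Poor match, likely not relevant"
```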
Performance
- Latency: ~15-50ms typical
- Throughput: ~100 requests/second
Use Cases
1. Natural Language Processing
Process patient-reported symptoms and clinical notes.
2. Clinical Decision Support
Map clinical observations to standard codes.
3. Code Mapping Assistance
Find mappings for non-standard terminology.
4. Paginating Through Results
Iterate through large result sets.
Related Endpoints
- Basic Concept Search - Keyword-based search
- Similar Concepts - Find concepts similar to a given concept (supports algorithm: "semantic" to use the same embedding model)
- Autocomplete - Type-ahead suggestions
Tip: The /search/similar endpoint also supports semantic search via the algorithm: "semantic" parameter. Use that endpoint when you need additional features like starting from a concept_id or getting detailed similarity explanations.