
Overview

Semantic search uses LLM-generated embeddings to find OMOP concepts that are semantically similar to your query, even when no exact keyword match exists. This makes it well suited to natural language queries and clinical text processing. Example queries:
  • “heart attack” → finds “Myocardial infarction”
  • “sugar diabetes” → finds “Type 2 diabetes mellitus”
  • “high blood pressure” → finds “Essential hypertension”
  • “belly pain” → finds “Abdominal pain”
| Scenario | Use Semantic Search | Use Basic Search |
| --- | --- | --- |
| Natural language queries | Yes | No |
| Patient-reported symptoms | Yes | No |
| Clinical shorthand/slang | Yes | No |
| Exact code lookup | No | Yes |
| Browsing vocabularies | No | Yes |

Query Parameters

query
string
required
Natural language search query (1-500 characters)
page
integer
default:"1"
Page number (1-based)
page_size
integer
default:"20"
Results per page (1-100)
threshold
number
default:"0.5"
Minimum similarity score (0.0-1.0). Higher values = stricter matching.
Recommended thresholds:
  • 0.7: Very high confidence matches only
  • 0.5: Balanced, high precision (default)
  • 0.3: More exploratory results
vocabulary_ids
string
Filter to specific vocabularies (comma-separated)
Examples: SNOMED; SNOMED,ICD10CM; SNOMED,ICD10CM,RXNORM
domain_ids
string
Filter to specific domains (comma-separated)
Examples: Condition; Drug; Condition,Drug,Procedure
standard_concept
string
Filter by standard concept status
Values: S (Standard), C (Classification)
vocab_release
string
Vocabulary version (e.g., 2025v2). Uses default if not specified.
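The parameters above can be assembled into a request URL as sketched below. This is a minimal illustration using only the Python standard library: the helper name build_semantic_search_url is hypothetical (it is not part of any official client), and the range checks simply mirror the limits documented above. How the server itself reports out-of-range values is not covered here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.omophub.com/v1/concepts/semantic-search"


def build_semantic_search_url(query, page=1, page_size=20, threshold=0.5,
                              vocabulary_ids=None, domain_ids=None,
                              standard_concept=None, vocab_release=None):
    """Hypothetical helper: validate parameters and build the GET URL."""
    # Client-side checks mirroring the documented limits.
    if not 1 <= len(query) <= 500:
        raise ValueError("query must be 1-500 characters")
    if page < 1:
        raise ValueError("page is 1-based")
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be 1-100")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")

    params = {"query": query, "page": page,
              "page_size": page_size, "threshold": threshold}
    # List-valued filters are sent as comma-separated strings.
    if vocabulary_ids:
        params["vocabulary_ids"] = ",".join(vocabulary_ids)
    if domain_ids:
        params["domain_ids"] = ",".join(domain_ids)
    # Optional filters are omitted entirely when not set.
    if standard_concept:
        params["standard_concept"] = standard_concept
    if vocab_release:
        params["vocab_release"] = vocab_release
    # quote_via=quote encodes spaces as %20, matching the curl example.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_semantic_search_url(
    "heart attack", threshold=0.7, vocabulary_ids=["SNOMED", "ICD10CM"]
)
print(url)
```

The resulting URL can be passed to any HTTP client together with the Authorization header.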
curl -X GET "https://api.omophub.com/v1/concepts/semantic-search?query=heart%20attack&page_size=20" \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "success": true,
  "data": {
    "query": "heart attack",
    "results": [
      {
        "concept_id": 4329847,
        "concept_name": "Myocardial infarction",
        "domain_id": "Condition",
        "vocabulary_id": "SNOMED",
        "concept_class_id": "Clinical Finding",
        "standard_concept": "S",
        "concept_code": "22298006",
        "similarity_score": 0.92,
        "matched_text": "Myocardial infarction"
      },
      {
        "concept_id": 434376,
        "concept_name": "Acute myocardial infarction",
        "domain_id": "Condition",
        "vocabulary_id": "SNOMED",
        "concept_class_id": "Clinical Finding",
        "standard_concept": "S",
        "concept_code": "57054005",
        "similarity_score": 0.89,
        "matched_text": "Acute myocardial infarction"
      },
      {
        "concept_id": 4108217,
        "concept_name": "Old myocardial infarction",
        "domain_id": "Condition",
        "vocabulary_id": "SNOMED",
        "concept_class_id": "Clinical Finding",
        "standard_concept": "S",
        "concept_code": "1755008",
        "similarity_score": 0.85,
        "matched_text": "Old myocardial infarction"
      }
    ],
    "total_results": 45,
    "latency_ms": 28
  },
  "meta": {
    "pagination": {
      "page": 1,
      "page_size": 20,
      "total_items": 45,
      "total_pages": 3,
      "has_next": true,
      "has_previous": false
    },
    "request_id": "req_sem_abc123",
    "vocab_release": "2025v1",
    "timestamp": "2025-01-15T10:30:00Z",
    "search": {
      "query": "heart attack",
      "total_results": 45,
      "filters_applied": {}
    }
  }
}

Response Fields

Data Object

query
string
Original search query
results
array
Array of matching concepts with similarity scores
total_results
integer
Approximate number of matching concepts. This is a lower bound based on sampled results, not an exact count. Use has_next for reliable pagination.
latency_ms
number
Processing time in milliseconds

Result Object

concept_id
integer
OMOP concept_id
concept_name
string
Standard concept name
similarity_score
number
Semantic similarity score (0.0-1.0). Higher = more similar.
matched_text
string
The text that matched (concept name or synonym)
domain_id
string
OMOP domain (e.g., Condition, Drug, Procedure)
vocabulary_id
string
Source vocabulary (e.g., SNOMED, ICD10CM, RxNorm)
concept_class_id
string
Concept classification within the vocabulary
standard_concept
string | null
Standard concept flag: S (Standard), C (Classification), or null
concept_code
string
Original code from the source vocabulary
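To work with results in typed form, the fields above can be mapped onto a small data class. The sketch below assumes the response has already been decoded from JSON into a dict; the names SemanticMatch and parse_results are illustrative, not part of an official client, and the sample payload is trimmed from the example response above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class SemanticMatch:
    """One entry of data.results, using the documented field names."""
    concept_id: int
    concept_name: str
    similarity_score: float
    matched_text: str
    domain_id: str
    vocabulary_id: str
    concept_class_id: str
    standard_concept: Optional[str]  # "S", "C", or None
    concept_code: str


def parse_results(response: dict) -> List[SemanticMatch]:
    """Convert a decoded JSON response into typed result objects.

    Unknown keys are ignored; missing keys become None.
    """
    names = [f.name for f in fields(SemanticMatch)]
    return [
        SemanticMatch(**{name: item.get(name) for name in names})
        for item in response["data"]["results"]
    ]


# Sample payload trimmed from the example response.
sample = {
    "data": {
        "results": [
            {"concept_id": 4329847, "concept_name": "Myocardial infarction",
             "domain_id": "Condition", "vocabulary_id": "SNOMED",
             "concept_class_id": "Clinical Finding", "standard_concept": "S",
             "concept_code": "22298006", "similarity_score": 0.92,
             "matched_text": "Myocardial infarction"},
            {"concept_id": 434376, "concept_name": "Acute myocardial infarction",
             "domain_id": "Condition", "vocabulary_id": "SNOMED",
             "concept_class_id": "Clinical Finding", "standard_concept": "S",
             "concept_code": "57054005", "similarity_score": 0.89,
             "matched_text": "Acute myocardial infarction"},
        ]
    }
}

matches = parse_results(sample)
for m in matches:
    print(m.concept_id, m.concept_name, m.similarity_score)
```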

Pagination Object (in meta)

page
integer
Current page number
page_size
integer
Number of results per page
total_items
integer
Approximate total number of matching items (lower bound). Use has_next for reliable pagination.
total_pages
integer
Approximate total number of pages. Use has_next to determine if more pages exist.
has_next
boolean
Reliable indicator of whether more results are available. Use this for pagination loops.
has_previous
boolean
Reliable indicator of whether previous pages exist.
Pagination Note: For performance reasons, total_items and total_pages are approximate values based on sampled results. Always use has_next to determine if more pages exist rather than comparing page to total_pages.

How It Works

  1. Query Embedding: Your query is converted to a 768-dimensional vector using neural embeddings
  2. Vector Search: The query vector is compared against pre-computed concept embeddings
  3. Ranking: Results are ranked by cosine similarity score
  4. Filtering: Optional filters (vocabulary, domain) are applied
  5. Deduplication: Results are deduplicated by concept_id (keeping highest score)
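
The ranking, filtering, and deduplication steps can be illustrated with a toy, in-memory version. This is only a sketch of the general technique, not the service's implementation: the 3-dimensional vectors, concept IDs, and function names are made up for illustration (the real embeddings are 768-dimensional and produced by a model that is not reproduced here), and the sketch applies the filter before scoring, which yields the same final list.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def semantic_rank(query_vec, index, threshold=0.5, domain_ids=None):
    """Rank index entries against query_vec.

    index holds (concept_id, domain_id, text, vector) tuples; one concept
    may appear several times (name plus synonyms).
    """
    best = {}  # concept_id -> (score, matched_text)
    for concept_id, domain_id, text, vec in index:
        # Optional domain filter.
        if domain_ids and domain_id not in domain_ids:
            continue
        score = cosine_similarity(query_vec, vec)
        if score < threshold:
            continue
        # Deduplicate by concept_id, keeping the highest score.
        if concept_id not in best or score > best[concept_id][0]:
            best[concept_id] = (score, text)
    # Highest similarity first.
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(cid, text, round(score, 3)) for cid, (score, text) in ranked]


# Toy index: IDs and vectors are invented for the example.
toy_index = [
    (101, "Condition", "Myocardial infarction", [0.9, 0.1, 0.0]),
    (101, "Condition", "Heart attack", [1.0, 0.0, 0.0]),  # synonym
    (102, "Condition", "Acute myocardial infarction", [0.8, 0.3, 0.0]),
    (103, "Drug", "Aspirin", [0.7, 0.0, 0.7]),
]

results = semantic_rank([1.0, 0.0, 0.0], toy_index, domain_ids={"Condition"})
for concept_id, matched_text, score in results:
    print(concept_id, matched_text, score)
```

In this toy run, concept 101 appears once even though two of its texts matched, and the text reported for it is the one with the higher score, which corresponds to the matched_text field in the response.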

Similarity Score Interpretation

| Score Range | Interpretation |
| --- | --- |
| 0.9 - 1.0 | Excellent match, high confidence |
| 0.7 - 0.9 | Good match, likely relevant |
| 0.5 - 0.7 | Moderate match, review recommended |
| 0.3 - 0.5 | Weak match, may be tangentially related |
| < 0.3 | Poor match, likely not relevant |
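
If you want to label results in your own application, the score bands above translate into a small lookup. The function name interpret_score is hypothetical, and because the documented ranges share their endpoints, this sketch assumes each band includes its lower bound (so 0.7 counts as a good match).

```python
def interpret_score(score: float) -> str:
    """Map a similarity score to the interpretation bands listed above.

    Assumption: a boundary value belongs to the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity scores range from 0.0 to 1.0")
    if score >= 0.9:
        return "Excellent match, high confidence"
    if score >= 0.7:
        return "Good match, likely relevant"
    if score >= 0.5:
        return "Moderate match, review recommended"
    if score >= 0.3:
        return "Weak match, may be tangentially related"
    return "Poor match, likely not relevant"


for s in (0.92, 0.85, 0.55, 0.31, 0.1):
    print(s, "->", interpret_score(s))
```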

Performance

  • Latency: ~15-50ms typical
  • Throughput: ~100 requests/second

Use Cases

1. Natural Language Processing

Process patient-reported symptoms and clinical notes:
# Patient says: "I've been having trouble breathing"
results = client.semantic_search(query="trouble breathing")
# Returns: Dyspnea, Shortness of breath, Respiratory distress

2. Clinical Decision Support

Map clinical observations to standard codes:
# Nurse notes: "pt appears confused and agitated"
results = client.semantic_search(
    query="confused and agitated",
    domain_ids="Condition"
)
# Returns: Delirium, Acute confusional state, Agitation

3. Code Mapping Assistance

Find mappings for non-standard terminology:
# Legacy code description: "DM2 uncontrolled"
results = client.semantic_search(
    query="DM2 uncontrolled",
    vocabulary_ids="SNOMED",
    standard_concept="S"
)
# Returns: Type 2 diabetes mellitus without complications

4. Paginating Through Results

Iterate through large result sets:
page = 1
while True:
    results = client.semantic_search(
        query="diabetes",
        page=page,
        page_size=50
    )

    # Process results
    for concept in results["data"]["results"]:
        print(concept["concept_name"])

    # Check if there are more pages
    if not results["meta"]["pagination"]["has_next"]:
        break

    page += 1
Tip: The /search/similar endpoint also supports semantic search via the algorithm: "semantic" parameter. Use that endpoint when you need additional features like starting from a concept_id or getting detailed similarity explanations.