# Semantic Search

## Overview

Semantic search uses LLM-generated embeddings to find OMOP concepts that are semantically similar to your query, even when no exact keyword match exists. This makes it well suited to natural language queries and clinical text processing.

**Example queries:**

* "heart attack" → finds "Myocardial infarction"
* "sugar diabetes" → finds "Type 2 diabetes mellitus"
* "high blood pressure" → finds "Essential hypertension"
* "belly pain" → finds "Abdominal pain"

## When to Use Semantic Search

| Scenario                  | Use Semantic Search | Use Basic Search |
| ------------------------- | ------------------- | ---------------- |
| Natural language queries  | Yes                 | No               |
| Patient-reported symptoms | Yes                 | No               |
| Clinical shorthand/slang  | Yes                 | No               |
| Exact code lookup         | No                  | Yes              |
| Browsing vocabularies     | No                  | Yes              |

## Query Parameters

* `query`: Natural language search query (1-500 characters).
* `page`: Page number (1-based).
* `page_size`: Results per page (1-100).
* `threshold`: Minimum similarity score (0.0-1.0). Higher values mean stricter matching.
  * **Recommended thresholds:**
    * `0.7`: very high-confidence matches only
    * `0.5`: balanced, high precision (default)
    * `0.3`: more exploratory results
* `vocabulary_ids`: Filter to specific vocabularies (comma-separated).
  * **Examples:** `SNOMED`, `SNOMED,ICD10CM`, `SNOMED,ICD10CM,RXNORM`
* `domain_ids`: Filter to specific domains (comma-separated).
  * **Examples:** `Condition`, `Drug`, `Condition,Drug,Procedure`
* `standard_concept`: Filter by standard concept status.
  * **Values:** `S` (Standard), `C` (Classification)
* Vocabulary version (e.g., `2025v2`). The default release is used if not specified.

## Example Requests

```bash cURL
curl -X GET "https://api.omophub.com/v1/search/semantic?query=heart%20attack&page_size=5" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python Python
import requests

response = requests.get(
    "https://api.omophub.com/v1/search/semantic",
    params={"query": "heart attack", "page_size": 5},
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
results = response.json()

# Print each match with its similarity score
for concept in results["data"]["results"]:
    print(f"{concept['similarity_score']:.2f} - {concept['concept_name']}")
```

```ts TypeScript
import { OMOPHub } from '@omophub/omophub-node';

const client = new OMOPHub();
const { data: results } = await client.search.semantic('heart attack', { pageSize: 5 });
```

```bash cURL (with filters)
curl -X GET "https://api.omophub.com/v1/search/semantic?query=chest%20pain&vocabulary_ids=SNOMED,ICD10CM&domain_ids=Condition&threshold=0.5" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python Python (with filters)
import requests

params = {
    "query": "chest pain",
    "vocabulary_ids": "SNOMED,ICD10CM",
    "domain_ids": "Condition",
    "threshold": 0.5,
    "standard_concept": "S"
}
response = requests.get(
    "https://api.omophub.com/v1/search/semantic",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params=params
)
filtered_results = response.json()
```

```python Python (pagination)
import requests

# Get page 2 of results
params = {
    "query": "diabetes",
    "page": 2,
    "page_size": 20
}
response = requests.get(
    "https://api.omophub.com/v1/search/semantic",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params=params
)
page_2_results = response.json()

# total_pages is approximate; use has_next to decide whether to fetch more
pagination = page_2_results["meta"]["pagination"]
print(f"Page {pagination['page']} of {pagination['total_pages']}")
```

## Example Response

```json
{
  "success": true,
  "data": {
    "query": "heart attack",
    "results": [
      {
        "concept_id": 4329847,
        "concept_name": "Myocardial infarction",
        "domain_id": "Condition",
        "vocabulary_id": "SNOMED",
        "concept_class_id": "Clinical Finding",
        "standard_concept": "S",
        "concept_code": "22298006",
        "similarity_score": 0.92,
        "matched_text": "Myocardial infarction"
      },
      {
        "concept_id": 434376,
        "concept_name": "Acute myocardial infarction",
        "domain_id": "Condition",
        "vocabulary_id": "SNOMED",
        "concept_class_id": "Clinical Finding",
        "standard_concept": "S",
        "concept_code": "57054005",
        "similarity_score": 0.89,
        "matched_text": "Acute myocardial infarction"
      },
      {
        "concept_id": 4108217,
        "concept_name": "Old myocardial infarction",
        "domain_id": "Condition",
        "vocabulary_id": "SNOMED",
        "concept_class_id": "Clinical Finding",
        "standard_concept": "S",
        "concept_code": "1755008",
        "similarity_score": 0.85,
        "matched_text": "Old myocardial infarction"
      }
    ],
    "total_results": 45,
    "latency_ms": 28
  },
  "meta": {
    "pagination": {
      "page": 1,
      "page_size": 20,
      "total_items": 45,
      "total_pages": 3,
      "has_next": true,
      "has_previous": false
    },
    "request_id": "req_sem_abc123",
    "vocab_release": "2025v1",
    "timestamp": "2025-01-15T10:30:00Z",
    "search": {
      "query": "heart attack",
      "total_results": 45,
      "filters_applied": {}
    }
  }
}
```

## Response Fields

### Data Object

* `query`: Original search query.
* `results`: Array of matching concepts with similarity scores.
* `total_results`: Approximate number of matching concepts. This is a lower bound based on sampled results, not an exact count. Use `has_next` for reliable pagination.
* `latency_ms`: Processing time in milliseconds.

### Result Object

* `concept_id`: OMOP concept ID.
* `concept_name`: Concept name.
* `similarity_score`: Semantic similarity score (0.0-1.0). Higher means more similar.
* `matched_text`: The text that matched (concept name or synonym).
* `domain_id`: OMOP domain (e.g., Condition, Drug, Procedure).
* `vocabulary_id`: Source vocabulary (e.g., SNOMED, ICD10CM, RxNorm).
* `concept_class_id`: Concept classification within the vocabulary.
* `standard_concept`: Standard concept flag: `S` (Standard), `C` (Classification), or `null`.
* `concept_code`: Original code from the source vocabulary.

### Pagination Object (in `meta`)

* `page`: Current page number.
* `page_size`: Number of results per page.
* `total_items`: Approximate total number of matching items (lower bound). Use `has_next` for reliable pagination.
* `total_pages`: Approximate total number of pages. Use `has_next` to determine whether more pages exist.
* `has_next`: **Reliable indicator** of whether more results are available. Use this for pagination loops.
* `has_previous`: **Reliable indicator** of whether previous pages exist.

**Pagination Note:** For performance reasons, `total_items` and `total_pages` are approximations based on sampled results. Always use `has_next` to determine whether more pages exist rather than comparing `page` to `total_pages`.

## How It Works

1. **Query Embedding**: Your query is converted to a 768-dimensional vector using neural embeddings.
2. **Vector Search**: The query vector is compared against pre-computed concept embeddings.
3. **Ranking**: Results are ranked by cosine similarity score.
4. **Filtering**: Optional filters (vocabulary, domain) are applied.
5. **Deduplication**: Results are deduplicated by `concept_id`, keeping the highest score for each concept.

## Similarity Score Interpretation

| Score Range | Interpretation                          |
| ----------- | --------------------------------------- |
| 0.9 - 1.0   | Excellent match, high confidence        |
| 0.7 - 0.9   | Good match, likely relevant             |
| 0.5 - 0.7   | Moderate match, review recommended      |
| 0.3 - 0.5   | Weak match, may be tangentially related |
| \< 0.3      | Poor match, likely not relevant         |

## Performance

* **Latency**: \~15-50ms typical
* **Throughput**: \~100 requests/second

## Use Cases

### 1. Natural Language Processing

Process patient-reported symptoms and clinical notes:

```python
import requests

# Patient says: "I've been having trouble breathing"
results = requests.get(
    "https://api.omophub.com/v1/search/semantic",
    params={"query": "trouble breathing"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
).json()
# Returns: Dyspnea, Shortness of breath, Respiratory distress
```

### 2. Clinical Decision Support

Map clinical observations to standard codes:

```python
import requests

# Nurse notes: "pt appears confused and agitated"
results = requests.get(
    "https://api.omophub.com/v1/search/semantic",
    params={"query": "confused and agitated", "domain_ids": "Condition"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
).json()
# Returns: Delirium, Acute confusional state, Agitation
```

### 3. Code Mapping Assistance

Find mappings for non-standard terminology:

```python
import requests

# Legacy code description: "DM2 uncontrolled"
results = requests.get(
    "https://api.omophub.com/v1/search/semantic",
    params={
        "query": "DM2 uncontrolled",
        "vocabulary_ids": "SNOMED",
        "standard_concept": "S",
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"},
).json()
# Returns: Type 2 diabetes mellitus without complications
```

### 4. Paginating Through Results

Iterate through large result sets:

```python
import requests

page = 1
while True:
    response = requests.get(
        "https://api.omophub.com/v1/search/semantic",
        params={"query": "diabetes", "page": page, "page_size": 50},
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    response.raise_for_status()
    results = response.json()

    # Process the results on this page
    for concept in results["data"]["results"]:
        print(concept["concept_name"])

    # has_next is the reliable signal that more pages exist
    if not results["meta"]["pagination"]["has_next"]:
        break
    page += 1
```

## Related Endpoints

* [Basic Concept Search](/api-reference/search/basic-search) - Keyword-based search
* [Similar Concepts](/api-reference/search/search-similar) - Find concepts similar to a given concept (supports `algorithm: "semantic"` to use the same embedding model)
* [Autocomplete](/api-reference/search/search-autocomplete) - Type-ahead suggestions

**Tip:** The `/search/similar` endpoint also supports semantic search via the `algorithm: "semantic"` parameter. Use that endpoint when you need additional features, such as starting from a `concept_id` or getting detailed similarity explanations.
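The bands in the Similarity Score Interpretation table can also be applied on the client, for example to flag results that need manual review before they are used in a mapping. The following is a minimal sketch, not part of any OMOPHub SDK: `interpret_score` and `label_results` are hypothetical helper names, and the code assumes only the response shape shown in the example response (`data.results[]` entries with `concept_name` and `similarity_score`).

```python
from typing import Any

# Lower bounds of the bands in the "Similarity Score Interpretation" table,
# checked from highest to lowest. Assumption: a score exactly on a boundary
# (e.g. 0.9) falls into the higher band; the table does not specify this.
SCORE_BANDS = [
    (0.9, "Excellent match, high confidence"),
    (0.7, "Good match, likely relevant"),
    (0.5, "Moderate match, review recommended"),
    (0.3, "Weak match, may be tangentially related"),
]


def interpret_score(score: float) -> str:
    """Map a similarity_score to its interpretation label (hypothetical helper)."""
    for lower_bound, label in SCORE_BANDS:
        if score >= lower_bound:
            return label
    return "Poor match, likely not relevant"


def label_results(response: dict[str, Any]) -> list[tuple[str, float, str]]:
    """Pair each result in a parsed semantic search response with its label."""
    return [
        (
            concept["concept_name"],
            concept["similarity_score"],
            interpret_score(concept["similarity_score"]),
        )
        for concept in response["data"]["results"]
    ]


# Usage with two results copied from the example response above
sample = {
    "data": {
        "results": [
            {"concept_name": "Myocardial infarction", "similarity_score": 0.92},
            {"concept_name": "Old myocardial infarction", "similarity_score": 0.85},
        ]
    }
}
for name, score, label in label_results(sample):
    print(f"{score:.2f}  {name}: {label}")
```

To drop weak matches on the server instead of labeling them afterwards, pass a higher `threshold` query parameter.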