
# Semantic Search

## Overview

Semantic search uses LLM-generated embeddings to find OMOP concepts that are semantically similar to your query, even when no exact keyword match exists. It is well suited to natural language queries and clinical text processing.

**Example queries:**

* "heart attack" → finds "Myocardial infarction"
* "sugar diabetes" → finds "Type 2 diabetes mellitus"
* "high blood pressure" → finds "Essential hypertension"
* "belly pain" → finds "Abdominal pain"

## When to Use Semantic Search

| Scenario                  | Use Semantic Search | Use Basic Search |
| ------------------------- | ------------------- | ---------------- |
| Natural language queries  | Yes                 | No               |
| Patient-reported symptoms | Yes                 | No               |
| Clinical shorthand/slang  | Yes                 | No               |
| Exact code lookup         | No                  | Yes              |
| Browsing vocabularies     | No                  | Yes              |

## Query Parameters

<ParamField query="query" type="string" required>
  Natural language search query (1-500 characters)
</ParamField>

<ParamField query="page" type="integer" default="1">
  Page number (1-based)
</ParamField>

<ParamField query="page_size" type="integer" default="20">
  Results per page (1-100)
</ParamField>

<ParamField query="threshold" type="number" default="0.5">
  Minimum similarity score (0.0-1.0). Higher values = stricter matching.

  <br />

  **Recommended thresholds:**

  * `0.7` - Very high confidence matches only
  * `0.5` - Balanced, high precision (default)
  * `0.3` - More exploratory results
</ParamField>

<ParamField query="vocabulary_ids" type="string">
  Filter to specific vocabularies (comma-separated)

  <br />

  **Examples:** `SNOMED`, `SNOMED,ICD10CM`, `SNOMED,ICD10CM,RXNORM`
</ParamField>

<ParamField query="domain_ids" type="string">
  Filter to specific domains (comma-separated)

  <br />

  **Examples:** `Condition`, `Drug`, `Condition,Drug,Procedure`
</ParamField>

<ParamField query="standard_concept" type="string">
  Filter by standard concept status

  <br />

  **Values:** `S` (Standard), `C` (Classification)
</ParamField>

<ParamField query="vocab_release" type="string">
  Vocabulary version (e.g., `2025v2`). Uses default if not specified.
</ParamField>
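
The documented ranges can be checked client-side before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_search_params` is a hypothetical helper name, and the limits it enforces are only the ones listed for the parameters above.

```python
from typing import Optional, Sequence


def build_search_params(
    query: str,
    page: int = 1,
    page_size: int = 20,
    threshold: float = 0.5,
    vocabulary_ids: Optional[Sequence[str]] = None,
    domain_ids: Optional[Sequence[str]] = None,
    standard_concept: Optional[str] = None,
    vocab_release: Optional[str] = None,
) -> dict:
    """Validate arguments against the documented ranges and return a params dict."""
    if not 1 <= len(query) <= 500:
        raise ValueError("query must be 1-500 characters")
    if page < 1:
        raise ValueError("page is 1-based")
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if standard_concept not in (None, "S", "C"):
        raise ValueError("standard_concept must be 'S' or 'C'")

    params = {
        "query": query,
        "page": page,
        "page_size": page_size,
        "threshold": threshold,
    }
    # List-valued filters are sent as comma-separated strings.
    if vocabulary_ids:
        params["vocabulary_ids"] = ",".join(vocabulary_ids)
    if domain_ids:
        params["domain_ids"] = ",".join(domain_ids)
    # Optional scalars are omitted when unset so the API applies its defaults.
    if standard_concept:
        params["standard_concept"] = standard_concept
    if vocab_release:
        params["vocab_release"] = vocab_release
    return params
```

The returned dict can be passed directly as `params=` to `requests.get`, in the same way as the hand-written dicts in the request examples.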

<RequestExample>
  ```bash cURL theme={null}
  curl -X GET "https://api.omophub.com/v1/concepts/semantic-search?query=heart%20attack&page_size=5" \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```python Python theme={null}
  import requests

  response = requests.get(
      "https://api.omophub.com/v1/concepts/semantic-search",
      params={"query": "heart attack", "page_size": 5},
      headers={"Authorization": "Bearer YOUR_API_KEY"}
  )
  response.raise_for_status()  # raise on HTTP error responses instead of parsing an error body
  results = response.json()

  for concept in results["data"]["results"]:
      print(f"{concept['similarity_score']:.2f} - {concept['concept_name']}")
  ```

  ```javascript JavaScript (Node.js) theme={null}
  const params = new URLSearchParams({
    query: 'heart attack',
    page_size: '5',
  });

  const response = await fetch(
    `https://api.omophub.com/v1/concepts/semantic-search?${params}`,
    {
      headers: {
        Authorization: `Bearer ${process.env.OMOPHUB_API_KEY}`,
      },
    }
  );
  const results = await response.json();
  ```

  ```bash cURL (with filters) theme={null}
  curl -X GET "https://api.omophub.com/v1/concepts/semantic-search?query=chest%20pain&vocabulary_ids=SNOMED,ICD10CM&domain_ids=Condition&threshold=0.5" \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```python Python (with filters) theme={null}
  import requests

  params = {
      "query": "chest pain",
      "vocabulary_ids": "SNOMED,ICD10CM",
      "domain_ids": "Condition",
      "threshold": 0.5,
      "standard_concept": "S"
  }
  response = requests.get(
      "https://api.omophub.com/v1/concepts/semantic-search",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
      params=params
  )
  response.raise_for_status()  # raise on HTTP error responses
  filtered_results = response.json()
  ```

  ```python Python (pagination) theme={null}
  import requests

  # Get page 2 of results
  params = {
      "query": "diabetes",
      "page": 2,
      "page_size": 20
  }
  response = requests.get(
      "https://api.omophub.com/v1/concepts/semantic-search",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
      params=params
  )
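  response.raise_for_status()  # raise on HTTP error responses
  page_2_results = response.json()
  pagination = page_2_results["meta"]["pagination"]
  # total_pages is approximate; use has_next to decide whether to keep paging
  print(f"Page {pagination['page']} of ~{pagination['total_pages']}, has_next={pagination['has_next']}")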
  ```
</RequestExample>

<ResponseExample>
  ```json theme={null}
  {
    "success": true,
    "data": {
      "query": "heart attack",
      "results": [
        {
          "concept_id": 4329847,
          "concept_name": "Myocardial infarction",
          "domain_id": "Condition",
          "vocabulary_id": "SNOMED",
          "concept_class_id": "Clinical Finding",
          "standard_concept": "S",
          "concept_code": "22298006",
          "similarity_score": 0.92,
          "matched_text": "Myocardial infarction"
        },
        {
          "concept_id": 434376,
          "concept_name": "Acute myocardial infarction",
          "domain_id": "Condition",
          "vocabulary_id": "SNOMED",
          "concept_class_id": "Clinical Finding",
          "standard_concept": "S",
          "concept_code": "57054005",
          "similarity_score": 0.89,
          "matched_text": "Acute myocardial infarction"
        },
        {
          "concept_id": 4108217,
          "concept_name": "Old myocardial infarction",
          "domain_id": "Condition",
          "vocabulary_id": "SNOMED",
          "concept_class_id": "Clinical Finding",
          "standard_concept": "S",
          "concept_code": "1755008",
          "similarity_score": 0.85,
          "matched_text": "Old myocardial infarction"
        }
      ],
      "total_results": 45,
      "latency_ms": 28
    },
    "meta": {
      "pagination": {
        "page": 1,
        "page_size": 20,
        "total_items": 45,
        "total_pages": 3,
        "has_next": true,
        "has_previous": false
      },
      "request_id": "req_sem_abc123",
      "vocab_release": "2025v1",
      "timestamp": "2025-01-15T10:30:00Z",
      "search": {
        "query": "heart attack",
        "total_results": 45,
        "filters_applied": {}
      }
    }
  }
  ```
</ResponseExample>

## Response Fields

### Data Object

<ResponseField name="query" type="string">
  Original search query
</ResponseField>

<ResponseField name="results" type="array">
  Array of matching concepts with similarity scores
</ResponseField>

<ResponseField name="total_results" type="integer">
  Approximate number of matching concepts. This is a lower bound based on
  sampled results, not an exact count. Use `has_next` for reliable pagination.
</ResponseField>

<ResponseField name="latency_ms" type="number">
  Processing time in milliseconds
</ResponseField>

### Result Object

<ResponseField name="concept_id" type="integer">
  OMOP `concept_id`
</ResponseField>

<ResponseField name="concept_name" type="string">
  Name of the concept (OMOP `concept_name`)
</ResponseField>

<ResponseField name="similarity_score" type="number">
  Semantic similarity score (0.0-1.0). Higher = more similar.
</ResponseField>

<ResponseField name="matched_text" type="string">
  The text that matched (concept name or synonym)
</ResponseField>

<ResponseField name="domain_id" type="string">
  OMOP domain (e.g., Condition, Drug, Procedure)
</ResponseField>

<ResponseField name="vocabulary_id" type="string">
  Source vocabulary (e.g., SNOMED, ICD10CM, RxNorm)
</ResponseField>

<ResponseField name="concept_class_id" type="string">
  Concept classification within the vocabulary
</ResponseField>

<ResponseField name="standard_concept" type="string | null">
  Standard concept flag: S (Standard), C (Classification), or null
</ResponseField>

<ResponseField name="concept_code" type="string">
  Original code from the source vocabulary
</ResponseField>

### Pagination Object (in meta)

<ResponseField name="page" type="integer">
  Current page number
</ResponseField>

<ResponseField name="page_size" type="integer">
  Number of results per page
</ResponseField>

<ResponseField name="total_items" type="integer">
  Approximate total number of matching items (lower bound). Use `has_next` for
  reliable pagination.
</ResponseField>

<ResponseField name="total_pages" type="integer">
  Approximate total number of pages. Use `has_next` to determine if more pages
  exist.
</ResponseField>

<ResponseField name="has_next" type="boolean">
  **Reliable indicator** of whether more results are available. Use this for
  pagination loops.
</ResponseField>

<ResponseField name="has_previous" type="boolean">
  **Reliable indicator** of whether previous pages exist.
</ResponseField>

<Note>
  **Pagination Note:** For performance reasons, `total_items` and `total_pages`
  are approximate values based on sampled results. Always use `has_next` to
  determine if more pages exist rather than comparing `page` to `total_pages`.
</Note>

## How It Works

1. **Query Embedding**: Your query is converted to a 768-dimensional vector by a neural embedding model
2. **Vector Search**: The query vector is compared against pre-computed concept embeddings
3. **Ranking**: Results are ranked by cosine similarity score
4. **Filtering**: Optional filters (vocabulary, domain, standard concept) are applied
5. **Deduplication**: Results are deduplicated by `concept_id` (keeping the highest score)
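
The steps above can be illustrated with a toy, in-memory version. This is a sketch under stated assumptions, not the service's implementation: the vectors are 3-dimensional rather than 768-dimensional, a brute-force scan stands in for whatever vector index the service uses, the threshold cutoff is folded into the filtering step, and `rank_concepts`, the row layout, and the concept IDs are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_concepts(query_vec, rows, threshold=0.5, domain_id=None):
    """rows: dicts with concept_id, matched_text, domain_id, embedding."""
    # Steps 2-3: score every row against the query and sort by similarity.
    scored = sorted(
        ((cosine_similarity(query_vec, row["embedding"]), row) for row in rows),
        key=lambda pair: pair[0],
        reverse=True,
    )
    results, seen = [], set()
    for score, row in scored:
        # Step 4: apply the threshold and the optional domain filter.
        if score < threshold:
            continue
        if domain_id is not None and row["domain_id"] != domain_id:
            continue
        # Step 5: rows are sorted, so the first row per concept_id is its best.
        if row["concept_id"] in seen:
            continue
        seen.add(row["concept_id"])
        results.append({
            "concept_id": row["concept_id"],
            "matched_text": row["matched_text"],
            "domain_id": row["domain_id"],
            "similarity_score": score,
        })
    return results


# Toy data: concept 1 has two text rows (name and synonym); IDs are made up.
rows = [
    {"concept_id": 1, "matched_text": "Myocardial infarction",
     "domain_id": "Condition", "embedding": [0.9, 0.1, 0.0]},
    {"concept_id": 1, "matched_text": "Heart attack",
     "domain_id": "Condition", "embedding": [1.0, 0.0, 0.0]},
    {"concept_id": 2, "matched_text": "Aspirin",
     "domain_id": "Drug", "embedding": [0.0, 1.0, 0.0]},
]
hits = rank_concepts([1.0, 0.0, 0.0], rows)
# hits holds a single entry: concept 1, matched via its "Heart attack" row.
```

Because a concept can be matched through its name or a synonym, deduplication after sorting is what makes `matched_text` reflect the best-scoring text for each `concept_id`.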

## Similarity Score Interpretation

| Score Range | Interpretation                          |
| ----------- | --------------------------------------- |
| 0.9 - 1.0   | Excellent match, high confidence        |
| 0.7 - 0.9   | Good match, likely relevant             |
| 0.5 - 0.7   | Moderate match, review recommended      |
| 0.3 - 0.5   | Weak match, may be tangentially related |
| \< 0.3      | Poor match, likely not relevant         |
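
For triage in client code, the bands in the table can be turned into a small lookup. This is only a sketch: `interpret_score` is a hypothetical helper, and because adjacent ranges in the table share their boundary values, it assumes each boundary belongs to the higher band.

```python
def interpret_score(score: float) -> str:
    """Map a similarity score to the interpretation bands in the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity scores fall between 0.0 and 1.0")
    # Assumption: lower bounds are inclusive, so 0.9 counts as "Excellent".
    if score >= 0.9:
        return "Excellent match, high confidence"
    if score >= 0.7:
        return "Good match, likely relevant"
    if score >= 0.5:
        return "Moderate match, review recommended"
    if score >= 0.3:
        return "Weak match, may be tangentially related"
    return "Poor match, likely not relevant"
```

With the default `threshold` of `0.5`, only the first three bands can appear in a response; the lower two become relevant when the threshold is lowered.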

## Performance

* **Latency**: \~15-50ms typical
* **Throughput**: \~100 requests/second

## Use Cases
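
The snippets in this section call `client.semantic_search(...)` without defining `client`. One way to make them runnable is a thin wrapper like the one below. It is a hypothetical stand-in, not an official SDK: the class name, `build_url`, and the timeout are assumptions, and only the endpoint URL, the bearer-token header, and the `OMOPHUB_API_KEY` variable name come from the request examples on this page.

```python
import json
import os
import urllib.parse
import urllib.request

BASE_URL = "https://api.omophub.com/v1/concepts/semantic-search"


class SemanticSearchClient:
    """Hypothetical thin wrapper around the semantic search endpoint."""

    def __init__(self, api_key: str, base_url: str = BASE_URL):
        self.api_key = api_key
        self.base_url = base_url

    def build_url(self, **params) -> str:
        # Drop unset parameters so the API falls back to its defaults.
        clean = {k: v for k, v in params.items() if v is not None}
        # quote_via=quote encodes spaces as %20, as in the cURL examples.
        encoded = urllib.parse.urlencode(clean, quote_via=urllib.parse.quote)
        return f"{self.base_url}?{encoded}"

    def semantic_search(self, query: str, **filters) -> dict:
        """Send the request and return the parsed JSON body."""
        request = urllib.request.Request(
            self.build_url(query=query, **filters),
            headers={"Authorization": f"Bearer {self.api_key}"},
        )
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)


# Constructing the client does not make a network call.
client = SemanticSearchClient(
    api_key=os.environ.get("OMOPHUB_API_KEY", "YOUR_API_KEY")
)
```

Keyword arguments are passed through as query parameters, so `domain_ids="Condition"` or `page=2` behave the same as in the raw HTTP examples.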

### 1. Natural Language Processing

Process patient-reported symptoms and clinical notes:

```python theme={null}
# Patient says: "I've been having trouble breathing"
results = client.semantic_search(query="trouble breathing")
# Returns: Dyspnea, Shortness of breath, Respiratory distress
```

### 2. Clinical Decision Support

Map clinical observations to standard codes:

```python theme={null}
# Nurse notes: "pt appears confused and agitated"
results = client.semantic_search(
    query="confused and agitated",
    domain_ids="Condition"
)
# Returns: Delirium, Acute confusional state, Agitation
```

### 3. Code Mapping Assistance

Find mappings for non-standard terminology:

```python theme={null}
# Legacy code description: "DM2 uncontrolled"
results = client.semantic_search(
    query="DM2 uncontrolled",
    vocabulary_ids="SNOMED",
    standard_concept="S"
)
# Returns: Type 2 diabetes mellitus without complications
```

### 4. Paginating Through Results

Iterate through large result sets:

```python theme={null}
page = 1
while True:
    results = client.semantic_search(
        query="diabetes",
        page=page,
        page_size=50
    )

    # Process results
    for concept in results["data"]["results"]:
        print(concept["concept_name"])

    # Check if there are more pages
    if not results["meta"]["pagination"]["has_next"]:
        break

    page += 1
```
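
Since `total_pages` is approximate, a loop driven only by `has_next` has no upper bound of its own. A possible variant, sketched here with hypothetical names (`iter_concepts`, `fetch_page`, `max_pages`), wraps the same logic in a generator and adds a page cap as a safety net:

```python
from typing import Callable, Iterator


def iter_concepts(
    fetch_page: Callable[[int], dict], max_pages: int = 100
) -> Iterator[dict]:
    """Yield concepts page by page until has_next is false or max_pages is hit."""
    for page in range(1, max_pages + 1):
        body = fetch_page(page)
        yield from body["data"]["results"]
        # has_next is the reliable stop signal; total_pages is only an estimate.
        if not body["meta"]["pagination"]["has_next"]:
            return
```

`fetch_page` is any callable that takes a page number and returns the parsed response, for example `lambda page: client.semantic_search(query="diabetes", page=page, page_size=50)` with the same `client` as in the loop above.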

## Related Endpoints

* [Basic Concept Search](/api-reference/search/basic-search) - Keyword-based search
* [Similar Concepts](/api-reference/search/search-similar) - Find concepts similar to a given concept (supports `algorithm: "semantic"` to use the same embedding model)
* [Autocomplete](/api-reference/search/search-autocomplete) - Type-ahead suggestions

<Note>
  **Tip:** The `/search/similar` endpoint also supports semantic search via the
  `algorithm: "semantic"` parameter. Use that endpoint when you need additional
  features like starting from a `concept_id` or getting detailed similarity
  explanations.
</Note>
