Vector Search vs Semantic Search: A Technical Comparison
A technical deep dive into vector search and semantic search, covering embeddings, algorithms, indexing strategies, and when to use each approach in enterprise AI applications.
Al Rafay Consulting
· Updated February 2, 2026 · ARC Team
Why Search Architecture Matters for AI
Search is the unsung foundation of modern AI applications. Every RAG (Retrieval-Augmented Generation) system, every knowledge assistant, every AI-powered search experience depends on finding the right information from large document collections. The quality of your search directly determines the quality of your AI’s responses.
Two terms dominate the conversation: vector search and semantic search. They are related but not identical, and understanding the distinction is critical for making sound architectural decisions.
Defining the Terms
Keyword Search (The Baseline)
Before discussing vector and semantic search, it helps to understand what they improve upon. Traditional keyword search (also called lexical or full-text search) works by:
- Tokenizing documents into individual terms
- Building an inverted index that maps each term to the documents containing it
- Scoring relevance using algorithms like BM25, which consider term frequency, document frequency, and document length
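The BM25 scoring described above can be sketched in a few lines of Python. This is a simplified, illustrative scorer (no stemming, stop words, or index structures), not a production implementation:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25.

    docs: list of token lists; query: list of tokens.
    k1 controls term-frequency saturation, b controls length normalization.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many docs contain each term
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "azure cost optimization strategies".split(),
    "how to deploy a web app".split(),
]
# "costs" != "cost" and "cloud" appears nowhere: both scores are 0.0,
# illustrating why keyword search misses paraphrased content
print(bm25_scores("reduce cloud costs".split(), docs))
```

Note that without stemming, "costs" fails to match "cost", which is exactly the failure mode the cloud-cost example below demonstrates.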
Keyword search is fast, well-understood, and effective when users know the exact terminology. It fails when:
- The user uses different words than the document (synonyms, paraphrasing)
- The query expresses a concept rather than specific terms
- The relevant document discusses the topic without using the query terms
Example: Searching for “how to reduce cloud costs” would miss a document titled “Azure cost optimization strategies” because the exact terms do not overlap sufficiently.
Vector Search
Vector search represents documents and queries as high-dimensional numerical vectors (embeddings) and finds the most similar documents by measuring the mathematical distance between vectors.
How it works:
1. Embedding generation — a neural network (e.g., text-embedding-3-large) converts text into a dense vector of 256-3072 floating-point numbers. This vector encodes the semantic meaning of the text, not just its keywords.
2. Indexing — vectors are stored in a vector index optimized for similarity search. Common algorithms include:
- HNSW (Hierarchical Navigable Small World) — graph-based index with excellent query performance and recall. Used by Azure AI Search, pgvector, and most modern vector databases.
- IVF (Inverted File Index) — partition-based index that trades some recall for lower memory usage. Used by FAISS and some cloud databases.
- Flat (brute force) — compares every vector. Perfect recall but does not scale beyond small collections.
3. Query processing — the user's query is converted to a vector using the same embedding model, then the index finds the nearest vectors using a distance metric:
- Cosine similarity — measures the angle between vectors (most common for text)
- Euclidean distance (L2) — measures straight-line distance in vector space
- Dot product — efficient alternative to cosine when vectors are normalized
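The three metrics are straightforward to compute. A minimal pure-Python illustration (toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions):

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; ignores magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line (L2) distance; sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [0.5, 0.5, 0.0]
d = [1.0, 1.0, 0.0]   # same direction as q, different magnitude

print(cosine_similarity(q, d))   # ~1.0: identical direction
print(euclidean_distance(q, d))  # ~0.707: magnitudes differ
```

This also shows why dot product substitutes for cosine on normalized vectors: once every vector has unit length, the denominator in `cosine_similarity` is 1 and only the dot product remains.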
Strengths of vector search:
- Finds semantically similar content regardless of exact word overlap
- Handles synonyms, paraphrasing, and conceptual queries naturally
- Works across languages (multilingual embedding models)
- Scales to millions of documents with HNSW indexes
Limitations:
- Embedding quality depends on the model and the domain
- Cannot perform exact term matching (important for names, codes, identifiers)
- Vector indexes consume significant memory (1M documents with 1536-dim float32 vectors require ~6 GB)
- No built-in understanding of document structure, dates, or metadata
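The memory figure above is easy to verify with a back-of-envelope calculation (float32 vectors, no index overhead; an HNSW graph adds its link structure on top of this):

```python
def vector_index_memory_gb(num_docs, dims, bytes_per_float=4, overhead=1.0):
    """Rough memory estimate for a float32 vector store.

    `overhead` multiplies the raw size to account for index
    structures; HNSW graphs typically add a double-digit percentage.
    """
    return num_docs * dims * bytes_per_float * overhead / 1024**3

# 1M documents, 1536-dim embeddings (e.g., text-embedding-3-small)
print(round(vector_index_memory_gb(1_000_000, 1536), 1))  # 5.7 (GB)
```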
Semantic Search
Semantic search is a broader concept: any search approach that understands the meaning of queries and documents, not just their keywords. Vector search is one implementation of semantic search, but not the only one.
In the context of Azure AI Search and similar platforms, “semantic search” typically refers to a semantic ranker — a transformer-based model that re-ranks initial search results for relevance:
- A traditional keyword search (BM25) retrieves an initial candidate set (e.g., top 50 results)
- The semantic ranker reads each candidate document and the query
- It assigns a semantic relevance score based on deep language understanding
- Results are re-ordered by this score
This approach is different from pure vector search in important ways:
| Aspect | Vector Search | Semantic Ranker |
|---|---|---|
| Stage | Retrieval (finding candidates) | Re-ranking (ordering candidates) |
| Input | Pre-computed embeddings | Raw text at query time |
| Processing | Nearest-neighbor lookup (fast) | Transformer inference per result (slower) |
| Scale | Searches millions of documents | Re-ranks 50-200 candidates |
| Cost | Embedding computation at index time | Per-query inference cost |
The semantic ranker provides deeper understanding of relevance but can only work with documents that were already retrieved. It cannot find documents that the initial retrieval stage missed.
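The retrieve-then-re-rank flow can be sketched as follows. Here `semantic_score` is a hypothetical stub (simple token overlap) standing in for the transformer cross-encoder a real semantic ranker runs; only the pipeline shape is the point:

```python
def rerank(query, candidates, score_fn, top_n=10):
    """Re-order a first-stage candidate set by a relevance score.

    candidates: list of (doc_id, text) pairs from initial retrieval.
    score_fn(query, text) -> float, higher is more relevant.
    Note: only the candidates are re-scored; documents the first
    stage missed can never be recovered at this step.
    """
    scored = [(score_fn(query, text), doc_id, text)
              for doc_id, text in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:top_n]]

# Stub scorer for illustration only: fraction of query tokens present.
# A real semantic ranker runs transformer inference per (query, doc) pair.
def semantic_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

candidates = [("d1", "azure cost optimization"),
              ("d2", "reduce cloud costs fast")]
print(rerank("reduce cloud costs", candidates, semantic_score))
```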
Hybrid Search: The Best of Both Worlds
In practice, the strongest search architectures combine multiple approaches:
Architecture Pattern: Hybrid Retrieval with Semantic Re-Ranking
User Query
↓
┌─────────────────────────────────────┐
│ Stage 1: Parallel Retrieval │
│ ├── BM25 keyword search → Top 50 │
│ └── Vector search → Top 50 │
│ Merge results (RRF fusion) │
│ → Combined Top 50 │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Stage 2: Semantic Re-Ranking │
│ Transformer re-scores top 50 │
│ → Re-ordered Top 10 │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Stage 3: LLM Generation │
│ Top 5 results used as context │
│ → Generated answer with citations │
└─────────────────────────────────────┘
Reciprocal Rank Fusion (RRF) is the most common method for combining keyword and vector results. It assigns a score to each result based on its rank in each individual result list, then sorts by combined score. This ensures that documents ranked highly by both methods appear at the top.
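RRF is simple to implement. A minimal sketch (k=60 is the constant from the original RRF paper; the document IDs below are made up for illustration):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked result lists with Reciprocal Rank Fusion.

    Each result list is an ordered list of document IDs. A document
    earns 1 / (k + rank) for each list it appears in (rank is
    1-based), and the per-list scores are summed. Documents ranked
    highly by multiple lists accumulate the largest totals.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["d3", "d1", "d7"]   # top BM25 results
vector  = ["d1", "d5", "d3"]   # top vector-search results
print(reciprocal_rank_fusion([keyword, vector]))
# d1 and d3 appear in both lists, so they rise to the top
```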
Why Hybrid Outperforms Either Approach Alone
Research from Microsoft and academic institutions consistently shows that hybrid search outperforms pure keyword or pure vector search:
- Keyword search catches exact matches that vector search may miss — product codes, person names, technical identifiers
- Vector search catches semantic matches that keyword search misses — paraphrased content, conceptual queries, cross-lingual queries
- The semantic ranker catches nuance that neither retrieval method handles well — negation, qualification, context-dependent meaning
In RAG applications, hybrid search with semantic re-ranking typically achieves 15-30% higher answer quality compared to vector-only or keyword-only approaches.
Implementation on Azure AI Search
Azure AI Search supports all three approaches natively:
Keyword Search
Built-in BM25 scoring over text fields with analyzers for language-specific tokenization, stemming, and stop word removal.
Vector Search
HNSW index over vector fields, supporting multiple embedding dimensions, distance metrics, and filtering:
{
"name": "my-index",
"fields": [
{ "name": "id", "type": "Edm.String", "key": true },
{ "name": "content", "type": "Edm.String", "searchable": true },
{ "name": "contentVector", "type": "Collection(Edm.Single)",
"dimensions": 1536,
"vectorSearchProfile": "my-vector-profile" },
{ "name": "category", "type": "Edm.String", "filterable": true }
],
"vectorSearch": {
"algorithms": [
{ "name": "my-hnsw", "kind": "hnsw",
"hnswParameters": { "m": 4, "efConstruction": 400, "efSearch": 500 } }
],
"profiles": [
{ "name": "my-vector-profile", "algorithmConfigurationName": "my-hnsw" }
]
}
}
Semantic Ranker
Enable the semantic configuration to re-rank results:
{
"semanticConfiguration": {
"name": "my-semantic-config",
"prioritizedFields": {
"titleField": { "fieldName": "title" },
"contentFields": [{ "fieldName": "content" }]
}
}
}
Hybrid Query
Combine all three in a single query:
{
"search": "how to reduce cloud infrastructure costs",
"vectorQueries": [{
"vector": [0.012, -0.034, ...],
"fields": "contentVector",
"k": 50
}],
"queryType": "semantic",
"semanticConfiguration": "my-semantic-config",
"top": 10
}
Embedding Model Selection
The choice of embedding model significantly affects search quality:
| Model | Dimensions | Max Tokens | Strengths |
|---|---|---|---|
| text-embedding-3-large | 3072 (configurable) | 8191 | Highest quality, dimension reduction supported |
| text-embedding-3-small | 1536 (configurable) | 8191 | Good quality at lower cost and memory |
| text-embedding-ada-002 | 1536 | 8191 | Legacy, still widely used |
| Cohere embed-v3 | 1024 | 512 | Strong multilingual performance |
| E5-large-v2 (open source) | 1024 | 512 | Self-hosted, no API cost |
Key considerations:
- Use the same model for indexing and querying — vectors from different models are not compatible
- Higher dimensions generally provide better quality but consume more memory and storage
- text-embedding-3-large supports dimension reduction (e.g., output 256 dims instead of 3072) with minimal quality loss, saving significant storage
- Domain-specific fine-tuning can improve quality for specialized vocabularies (medical, legal, financial)
Chunking Strategies
How you split documents into chunks before embedding determines retrieval quality:
Fixed-Size Chunking
Split text every N tokens with overlap:
- Simple and predictable
- May split mid-sentence or mid-paragraph
- Works well for uniform, unstructured text
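A minimal fixed-size chunker with overlap. For simplicity it operates on a pre-tokenized list; in practice you would tokenize with the embedding model's tokenizer (e.g., tiktoken) first:

```python
def chunk_fixed(tokens, size=600, overlap=75):
    """Split a token list into fixed-size chunks with overlap.

    The overlap repeats the tail of each chunk at the head of the
    next, so a sentence cut at one boundary still appears intact
    in the neighboring chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        start += size - overlap
    return chunks

tokens = list(range(1500))  # stand-in for a tokenized document
chunks = chunk_fixed(tokens, size=600, overlap=75)
print([len(c) for c in chunks])  # [600, 600, 450]
```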
Structure-Aware Chunking
Split at document structure boundaries (headings, sections, paragraphs):
- Preserves logical units of information
- Produces variable-size chunks
- Requires document structure parsing (Markdown headers, HTML tags, PDF sections)
Recursive Chunking
Split at the largest structure boundary that fits within the size limit, then recursively split if needed:
- Balances structure awareness with size constraints
- Adapts to documents with inconsistent structure
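The recursive strategy can be sketched as below. This is a simplified character-based version of the recursive splitting popularized by libraries such as LangChain; a production splitter would also merge small adjacent pieces back together up to the size limit:

```python
def chunk_recursive(text, max_len=800, separators=("\n\n", "\n", ". ")):
    """Recursively split text at the largest boundary (paragraph,
    then line, then sentence) that yields pieces under max_len
    characters. Separator characters are dropped in this sketch."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(chunk_recursive(part, max_len, separators))
            return chunks
    # No structural boundary found: fall back to a hard split
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

doc = ("A" * 500) + "\n\n" + ("B" * 700)  # two paragraphs
print([len(c) for c in chunk_recursive(doc)])  # [500, 700]
```

The paragraph break is preferred over a mid-paragraph cut, which is the whole point of structure-aware splitting.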
Parent-Child Chunking
Index small chunks for retrieval but return the parent section for context:
- Small chunks (200 tokens) for precise retrieval
- Return the parent chunk (1000+ tokens) to the LLM for richer context
- Supported natively in Azure AI Search with integrated vectorization
Recommended approach: Start with 400-800 token chunks, structure-aware splitting, and 50-100 token overlap. Measure retrieval quality and adjust based on results.
Performance and Cost Tradeoffs
| Factor | Keyword Only | Vector Only | Hybrid + Semantic |
|---|---|---|---|
| Quality (general queries) | Moderate | Good | Best |
| Quality (exact term queries) | Best | Poor | Good |
| Latency | Lowest (< 50ms) | Low (< 100ms) | Higher (100-500ms) |
| Index size | Smallest | Large (vectors) | Largest |
| Query cost | Lowest | Low | Higher (semantic ranker fee) |
| Implementation complexity | Low | Medium | High |
For most enterprise RAG applications, the quality improvement from hybrid + semantic search justifies the additional cost and complexity. For high-volume, latency-sensitive applications (e.g., autocomplete, product search), keyword or vector-only may be more appropriate.
Next Steps
Search architecture is the foundation of every successful AI application. The difference between a knowledge assistant that delights users and one that frustrates them often comes down to retrieval quality — which documents are found, how they are ranked, and how much context the LLM receives.
Al Rafay Consulting designs and implements enterprise search architectures on Azure AI Search, from initial index design through production optimization. We help organizations build RAG applications that deliver accurate, grounded answers from their own content.
Al Rafay Consulting (ARC Team) is an AI-powered Microsoft Solutions Partner delivering enterprise solutions on Azure, SharePoint, and Microsoft 365.