← back to blog

Designing a Scalable Search System for DoorDash

A staff-level deep dive into building distributed search infrastructure at scale

Chaïmae SritiOctober 2025

Building search systems at scale is one of the most challenging problems in distributed systems engineering. When users type "thai food near me" into DoorDash, they expect instant, relevant results—even when the system is handling millions of queries per second across hundreds of cities and millions of restaurants.

This case study walks through designing a production-grade search system from first principles. We'll cover the full stack: query understanding, indexing strategies, ranking algorithms, caching layers, and operational considerations. This is staff-level work—the kind of system design that requires understanding both the theoretical foundations and the practical constraints of running large-scale infrastructure.

The Challenge: Design a search system for DoorDash that can handle:

  • 10 million+ concurrent users searching for restaurants
  • 5+ million restaurants and dishes across 7,000+ cities globally
  • Sub-100ms p99 latency for search queries
  • Personalized results based on user preferences, location, time, and context
  • Real-time updates (restaurant opens/closes, menu changes, surge pricing)
  • Multi-language support and fuzzy matching for typos
  • High availability (99.99% uptime) with graceful degradation

1. Problem Space & Requirements

Before diving into solutions, let's decompose the problem systematically. Search systems have fundamentally different characteristics than ranking systems, and understanding these nuances is critical for staff-level design.

Functional Requirements

Query Types to Support

1. Keyword Search: "pizza", "sushi near me", "chinese takeout"
Challenge: Match against restaurant names, cuisines, and menu items
Example: "thai" should match both cuisine="Thai" and restaurant names containing "Thai"
Scale: 100M+ searchable tokens across all restaurants and menu items
2. Semantic Search: "healthy food", "romantic dinner", "quick lunch"
Challenge: Understand intent beyond exact keyword match
Example: "healthy food" → salads, grain bowls, smoothies (no exact match needed)
Why: 30-40% of queries are intent-based rather than entity-based
How: Requires embedding-based search (more on this later)
3. Geo-Aware Search: Location is first-class constraint, not just a ranking signal
Challenge: "pizza" in NYC should only return NYC restaurants
Why: Can't show restaurants 1000 miles away—hard constraint, not soft preference
Scale: Need to filter 5M restaurants to ~500 local candidates before ranking
4. Menu Item Search: "margherita pizza", "pad thai", "chicken tikka masala"
Challenge: Search at item level (50M+ dishes), return parent restaurant
Example: Search "garlic noodles" → return all restaurants with that dish
Scale: 50M menu items × 20 tokens/item = 1B searchable tokens
5. Autocomplete / Query Suggestions: As-you-type suggestions
Challenge: Must respond in <50ms (feels instant to user)
Example: "sush" → ["sushi", "sushi near me", "sushi burrito", ...]
Why: Reduces user effort, increases conversion by 30%+

Query Complexity Handling

Typo Tolerance: "pizzza" → "pizza"
Challenge: 15-20% of mobile queries have typos
Approach: Fuzzy matching (Levenshtein distance ≤ 2)
Tradeoff: Fuzzy search is 5-10x slower than exact match
Solution: Use fuzzy only as fallback when exact search returns < 10 results
Multi-term Queries: "spicy vegan thai restaurant"
Challenge: Combine multiple constraints (cuisine=Thai, dietary=vegan, preference=spicy)
Approach: Boolean query with AND/OR logic
Why: All terms must match (AND), but can match different fields
Example: "spicy" matches menu items, "thai" matches cuisine, "vegan" matches dietary_options
Synonyms & Variations: "italian" = "pizza" = "pasta"
Challenge: Users use different words for same concept
Example: "chinese" vs "asian" vs specific dishes like "dim sum"
How: Maintain synonym graph, expand query at search time
Tradeoff: Too aggressive expansion → irrelevant results; too conservative → missed results

Non-Functional Requirements

Latency Requirements:
• p50 latency: <50ms (median user experience)
• p99 latency: <100ms (worst-case acceptable)
• p999 latency: <200ms (even outliers must be reasonable)
Why: Every 100ms adds 1% bounce rate
Constraint: Must include network RTT (20-30ms), leaves 50-70ms for search logic
Availability Requirements:
• Target: 99.99% uptime (52 minutes downtime/year)
Why: Search is critical path—if search is down, users can't order
Graceful degradation: Prefer returning stale/cached results over errors
Failure handling: Must handle datacenter failures, network partitions, index corruption
Consistency Requirements:
Eventual consistency is acceptable (not strong consistency)
Example: Restaurant updates menu → okay if search reflects change in 5-10 seconds
Why: Strong consistency would require synchronous writes across all search replicas → too slow
Tradeoff: Async index updates = better write latency but users may briefly see stale data
Scalability Requirements:
Query load: 100K QPS average, 500K QPS peak (dinner time)
Index size: 5M restaurants × 10 menu items × 100 bytes = 5TB indexed data
Growth: 20% YoY restaurant growth → must handle 6M restaurants next year
Geographic distribution: Deploy search clusters in 12 regions for latency

Key Metrics to Optimize

1. Search-to-Order Conversion Rate (Primary Business Metric)
Definition: % of searches that lead to order within session
Baseline: ~12% (industry benchmark for food delivery)
Why this matters: Search exists to drive orders, not just return results
What moves it: Relevance, result quality, latency, restaurant availability
2. Zero-Result Rate (Quality Metric)
Definition: % of queries returning 0 results
Target: <5%
Why: Zero results = guaranteed no conversion, terrible UX
Causes: Typos, new restaurants not indexed, poor query understanding
Solution: Fuzzy matching, query relaxation, fallback to browse mode
3. Click-Through Rate (Relevance Proxy)
Definition: % of searches where user clicks on result
Target: >70%
Why: If users don't click, results aren't relevant
Note: High CTR doesn't guarantee orders—need to track both
4. Mean Reciprocal Rank (MRR) (Ranking Quality)
Definition: Average of 1/position of first clicked result
Example: User clicks result #1 → MRR=1.0, clicks #3 → MRR=0.33
Why: Measures how well we rank relevant results to the top
Target: MRR > 0.7 (first click within top 3 results)
5. Query Latency Percentiles (Performance)
p50: Median user experience
p99: Worst-case for most users
p999: Even outliers must be acceptable
Why track all three: Averages hide problems, need full distribution

Staff-Level Insight: The Latency-Relevance Tradeoff

The fundamental tension in search systems: more comprehensive search (fuzzy matching, semantic search, personalization) improves relevance but increases latency. At DoorDash scale, we can't afford to search all 5M restaurants per query. This drives the multi-stage architecture we'll design: fast, broad retrieval followed by expensive, precise ranking.

2. High-Level Architecture

A production search system is not a monolith—it's a composition of specialized components, each optimized for specific constraints. Let's start with the 30,000-foot view before diving into each component.

Search Request Flow (End-to-End)

User Query
"spicy thai near me"
1. Query Understanding
Parse → Normalize → Expand → Classify Intent
cuisine=thai, preference=spicy, geo=user_location
2. Stage 1: Candidate Retrieval (Fast & Broad)
Goal: 5M restaurants → 500 candidates in <20ms
• Geo filter: Within 5km radius (from 5M to 10K)
• Keyword filter: cuisine="thai" (from 10K to 800)
• Availability filter: is_open=true (from 800 to 500)
3. Stage 2: Semantic Matching (Medium Precision)
Goal: 500 candidates → 100 relevant in <30ms
• Embedding-based search for "spicy" intent
• Match against menu items containing spicy dishes
• Score = text_relevance × menu_match
4. Stage 3: Personalized Ranking (Slow & Precise)
Goal: 100 candidates → top 20 ranked in <40ms
• ML model: LightGBM with 80+ features
• User preferences (past orders, favorites, ratings)
• Contextual signals (time, weather, trending)
• Score = P(user will order | context)
5. Return Top 20 Results
Total latency: 90ms (p50), 120ms (p99)

Why Multi-Stage Architecture?

The Problem: We can't run an expensive ML model on 5 million restaurants per query
Math: 5M restaurants × 1ms per model inference = 5,000 seconds per query
Reality: We have 100ms total budget for the entire search request
Constraint: Must filter aggressively before applying expensive operations
The Solution: Funnel architecture—each stage removes candidates while preserving recall
Stage 1 (Retrieval): Simple filters, high recall (catch all relevant items), very fast
Stage 2 (Matching): Medium complexity, good precision, moderate speed
Stage 3 (Ranking): Complex ML, perfect precision, can be slow (small candidate set)
The Tradeoff: Each stage risks losing relevant results
Recall loss: If perfect restaurant is filtered in Stage 1, it can't be recovered later
Design principle: Early stages must be high-recall (catch all relevant), later stages high-precision (rank them correctly)
Monitoring: Track recall at each stage—if Stage 1 drops recall below 95%, fix it

Component Breakdown

Query Understanding Service
Role: Parse user query, extract structured filters, detect intent
Input: "spicy thai near me" (raw text)
Output: {cuisine: "thai", preference: "spicy", geo: user_location}
Technology: NLP models (BERT for intent classification), rule-based parsers
Latency: 10-15ms
Index Service (ElasticSearch/Solr)
Role: Store inverted index for fast keyword + geo lookup
Data structure: Inverted index (keyword → restaurant IDs) + geo-spatial index (R-tree)
Scale: 5M restaurants × 10KB metadata = 50GB index per region
Technology: ElasticSearch (industry standard for search), sharded by geography
Latency: 20-30ms for Stage 1 retrieval
Vector Search Service (Pinecone/Weaviate)
Role: Semantic matching using embeddings (for intent-based queries)
How: Encode query → find nearest neighbor restaurants in embedding space
Use case: "healthy dinner" → returns salad places even without keyword "salad"
Technology: HNSW (Hierarchical Navigable Small World) index for fast ANN search
Latency: 20-40ms for top-K retrieval from 500K restaurants
Ranking Service
Role: Score and rank final candidates using ML model
Model: LightGBM with 80+ features (user, restaurant, context, interaction)
Features: User preferences, restaurant quality, distance, time of day, trending, ...
Output: Ranked list of restaurants with confidence scores
Latency: 40-60ms for 100 candidates
Feature Store (Feast/Tecton)
Role: Serve precomputed features for ranking model
Why: Can't compute all features at query time (too slow)
Data: User preferences (updated daily), restaurant stats (updated hourly)
Technology: Redis for online serving, S3 for offline training
Latency: 1-3ms per feature lookup (cached in memory)
Caching Layer (Redis/CDN)
Role: Cache frequent queries to avoid recomputation
Cache hit rate: ~40% (many users search "pizza", "chinese", etc.)
TTL: 5-10 minutes (balance freshness vs hit rate)
Invalidation: Invalidate on restaurant updates (menu change, closed)
Impact: Cached queries serve in 5-10ms vs 100ms for full search

Key Architectural Principle: Separation of Concerns

Each component has a single responsibility: query understanding doesn't do ranking, indexing doesn't do personalization. This enables independent scaling, testing, and iteration. When ranking needs improvement, we update the ranking service without touching the index. This is staff-level design—building systems that are maintainable over years, not months.

3. Query Understanding & Intent Detection

Query understanding is the first—and often most underestimated—step in the search pipeline. Poor query parsing means downstream stages search for the wrong thing, no matter how good your indexing or ranking is. This is where many junior systems fail: they treat search as a simple keyword match when it's actually a natural language understanding problem.

Query Parsing Pipeline

Step 1: Tokenization & Normalization

Goal: Convert raw query string into clean, normalized tokens

Example transformations:
"Spicy Thai Near Me" → ["spicy", "thai", "near", "me"]
"PIZZA!!!" → ["pizza"]
"café" → ["cafe", "café"] (preserve accent + ASCII variant)
Normalization steps:
1. Lowercase: "Pizza" → "pizza" (case-insensitive matching)
2. Remove punctuation: "pizza!" → "pizza"
3. Strip extra whitespace: "pizza near" → "pizza near"
4. Handle diacritics: "café" → ["cafe", "café"] (search both variants)
5. Expand abbreviations: "bbq" → "barbecue" (using lookup table)
Why this matters:
• Without normalization: "Pizza" and "pizza" are different tokens → missed results
• With normalization: Both map to same token → consistent matching
Tradeoff: Over-normalization loses information (e.g., "NY" vs "ny")
Implementation:
// Tokenizer: ElasticSearch StandardAnalyzer + custom rules
{
"analyzer": "custom_analyzer",
"tokenizer": "standard",
"filter": ["lowercase", "asciifolding", "synonym_expansion"]
}

Step 2: Intent Classification

Goal: Determine what the user is trying to do (search types)

Intent Types:
1. Restaurant name: "Chipotle", "McDonald's" → Direct entity lookup
2. Cuisine type: "Thai", "Italian" → Filter by cuisine field
3. Dish/item: "Pad Thai", "Margherita pizza" → Search menu items
4. Descriptive: "healthy", "romantic" → Semantic/attribute search
5. Contextual: "near me", "open now" → Add filter constraints
Classification approach:
Option 1: Rule-Based (Fast, Simple)
• Exact match against known entity dictionary
• Example: "chipotle" in restaurant_names → intent=RESTAURANT_NAME
Pro: Zero latency, deterministic, easy to debug
Con: Doesn't generalize to new entities or ambiguous queries
Option 2: ML-Based (BERT Classifier)
• Train text classifier on labeled query data
• Input: Query embedding (768-dim from BERT)
• Output: Probability distribution over intent classes
Pro: Handles ambiguous/new queries, learns patterns
Con: 10-15ms inference latency, needs training data
Hybrid Approach (Best of Both):
• Try rule-based first (covers 70% of queries in <1ms)
• Fall back to ML for ambiguous cases
• Example: "Pad Thai" → rule-based recognizes as dish
• Example: "Something healthy" → ML classifies as DESCRIPTIVE
Example classification:
Query: "spicy thai near me"
→ Tokens: ["spicy", "thai", "near", "me"]
→ Intent: CUISINE (thai) + PREFERENCE (spicy) + LOCATION (near me)
→ Structured: {cuisine: "thai", preferences: ["spicy"], geo: user_location}

Step 3: Named Entity Recognition (NER)

Goal: Extract specific entities (restaurants, cuisines, dishes, locations)

Why NER matters:
• Query: "pad thai from Thai Basil restaurant"
• Without NER: All words searched equally → "thai" matches cuisine=Thai
• With NER: Recognize "Thai Basil" as restaurant name → search name field specifically
Result: Much more precise—only return "Thai Basil" restaurant, not all Thai places
Entity types to extract:
• RESTAURANT: "Chipotle", "Thai Basil", "Joe's Pizza"
• CUISINE: "Thai", "Italian", "Mexican"
• DISH: "Pad Thai", "Margherita pizza", "General Tso's chicken"
• LOCATION: "near me", "downtown", "Mission District"
• ATTRIBUTE: "spicy", "vegan", "gluten-free"
NER approaches:
Gazetteer (Dictionary Lookup):
• Maintain dictionary of known entities (5M restaurant names, 200 cuisines, 50K dishes)
• Check query tokens against dictionary using Trie data structure (fast prefix matching)
Pro: Very fast (1-2ms), high precision for known entities
Con: Misses new entities (newly opened restaurants)
ML-Based NER (BiLSTM-CRF or BERT-NER):
• Train sequence labeling model on annotated queries
• Example: "spicy [DISH:pad thai] from [RESTAURANT:Thai Basil]"
Pro: Handles unseen entities, understands context
Con: Slower (10-20ms), requires training data
Production approach:
• Use gazetteer for high-confidence matches (exact string match)
• Use ML-NER for ambiguous cases or when gazetteer fails
• Update gazetteer nightly with new restaurant names from database

Step 4: Query Expansion

Goal: Add synonyms and related terms to improve recall without hurting precision

Why query expansion:
• Problem: User searches "chinese" but restaurant is tagged "asian" → missed result
• Solution: Expand "chinese" → ["chinese", "asian", "dim sum", "noodles"]
Tradeoff: Too aggressive → irrelevant results; too conservative → low recall
Expansion strategies:
1. Synonym Mapping (Curated):
"chinese" → ["chinese", "asian", "cantonese", "szechuan"]
"pizza" → ["pizza", "italian", "pizzeria"]
"bbq" → ["bbq", "barbecue", "grilled"]
Pro: Full control, high precision
Con: Manual curation, doesn't scale to long tail
2. Query Log Mining (Data-Driven):
• Analyze historical queries: users who search "chinese" often search "dim sum"
• Build co-occurrence matrix: term A → related terms [B, C, D]
Pro: Automatically discovers relationships, scales
Con: Can amplify biases, needs large query volume
3. Embedding-Based Expansion:
• Compute query embedding, find nearest neighbors in embedding space
• Example: "healthy" → ["salad", "grain bowl", "smoothie"] (semantic similarity)
Pro: Captures semantic relationships, generalizes well
Con: Can be noisy, harder to control
When to apply expansion:
Aggressive expansion: When original query returns <10 results
Conservative expansion: When original query returns >50 results
No expansion: When query is specific entity name (e.g., "Chipotle")
Boost vs expand: OR expansion (more results) vs weighted boost (rerank existing)

Query Correction & Spelling

Handling Typos & Misspellings

The problem: 15-20% of mobile queries contain typos
• Example: "pizzza", "thia food", "chinse restaurant"
• Without correction: Zero results → terrible UX → user abandons
• With correction: "pizzza" → "pizza" → show relevant results
Correction approaches:
1. Edit Distance (Levenshtein):
• Compute edit distance between query and dictionary terms
• "pizzza" → "pizza" (distance=1, delete one 'z')
Threshold: Accept corrections with distance ≤ 2
Pro: Simple, handles simple typos well
Con: Expensive to compute for large dictionaries (5M restaurants)
Optimization: Use BK-tree or n-gram indexing for fast lookup
2. Phonetic Matching (Soundex/Metaphone):
• Match based on pronunciation, not spelling
• "thai" and "thigh" → same phonetic code → can be confusing
Use case: Handle phonetically similar misspellings
Tradeoff: Can introduce false positives
3. Context-Aware Spell Check (ML):
• Train language model on food domain text
• "thia food" → model predicts "thai food" (higher probability given context)
Model: Transformer-based (like BERT) trained on query logs
Pro: Handles complex corrections, understands context
Con: 10-20ms latency, requires training data
Production strategy:
1. First, try exact match (0ms)
2. If zero results, try fuzzy match with edit distance ≤ 2 (5-10ms)
3. If still zero results, apply ML spell correction (10-20ms)
4. Show corrected query to user: "Did you mean: thai food?"

Tradeoff: Latency vs Recall

• Fuzzy matching adds 10-20ms latency but recovers 15% of queries
• Decision: Worth it—users prefer 120ms with results over 100ms with nothing
• Optimization: Only apply fuzzy if exact search fails (avoid latency for common queries)

Staff-Level Insight: Query Understanding as ML Problem

Junior engineers treat query parsing as string manipulation. Staff engineers recognize it as a natural language understanding problem requiring ML. The best systems use hybrid approaches: fast rules for common cases, ML for long tail. The key is understanding when each approach is appropriate and measuring the impact on both latency and relevance.

4. Indexing Strategy & Data Structures

Indexing is where theory meets practice. You can have the best ranking algorithm in the world, but if your index can't retrieve candidates fast enough, your search is too slow. At scale, data structure choice matters more than algorithmic cleverness. This section covers the core indexes we need and why.

Primary Index: Inverted Index

What is an Inverted Index?

Concept: Map from term → list of documents containing that term

Forward index (slow): Restaurant → {name, cuisine, menu_items}
restaurant_1 → {name: "Thai Basil", cuisine: "thai", menu: ["pad thai", "curry"]}
restaurant_2 → {name: "Pizza Place", cuisine: "italian", menu: ["margherita", "pepperoni"]}
Search "thai": Must scan all 5M restaurants → O(N) → too slow!
Inverted index (fast): Term → [RestaurantIDs]
"thai" → [restaurant_1, restaurant_47, restaurant_892, ...]
"pizza" → [restaurant_2, restaurant_103, ...]
"pad thai" → [restaurant_1, restaurant_55, ...]
Search "thai": Hash lookup → O(1) → instant!
Implementation details:
Term dictionary: Map term → posting list (stored in memory for speed)
Posting lists: Sorted arrays of document IDs (enables efficient intersection)
Compression: Variable-byte encoding for posting lists (reduces memory 5-10×)
Skip lists: Speed up intersection of large posting lists

Multi-Field Indexing

Problem: Restaurants have multiple searchable fields
• Restaurant name: "Thai Basil"
• Cuisine: "thai"
• Menu items: ["pad thai", "green curry", "spring rolls"]
• Tags: ["spicy", "vegetarian-friendly", "delivery"]
Solution: Separate inverted index per field + field weights
name_index: "thai" → [restaurant_1] (exact name match)
• Weight: 5.0 (name matches are strongest signal)
cuisine_index: "thai" → [restaurant_1, restaurant_47, ...] (many Thai restaurants)
• Weight: 3.0 (cuisine match is important but less specific)
menu_index: "thai" → [restaurant_1, restaurant_55, ...] (restaurants with "thai" dishes)
• Weight: 2.0 (menu match is relevant but weakest signal)
Scoring: Combine matches across fields with weights
final_score = name_weight × name_match + cuisine_weight × cuisine_match + menu_weight × menu_match
• Restaurant with "thai" in name: 5.0 (highest priority)
• Restaurant with cuisine=thai: 3.0 (medium priority)
• Restaurant with "thai" menu item: 2.0 (lowest priority)

Why Field Weighting Matters

• Query: "thai"
• Without weights: "Pizza place with 1 thai dish" ranks equally with "Thai Basil"
• With weights: "Thai Basil" (name match, score=5.0) ranks above generic pizza place (menu match, score=2.0)

Geo-Spatial Index

Why Geo-Spatial Indexing is Critical

Location is a hard constraint in food delivery—can't deliver from 100 miles away. We need to filter millions of restaurants to hundreds within delivery radius before doing expensive operations.

The challenge:
• Query: "pizza near me"
• User location: (37.7749, -122.4194) [San Francisco]
• Naive approach: Compute distance to all 5M restaurants → 5M distance calculations → too slow!
Need: Data structure that prunes restaurants by location without checking all of them
Geo-spatial data structures:
Option 1: Geohash (Grid-Based)
How it works: Divide world into grid cells, assign each cell a hash code
Example: (37.7749, -122.4194) → geohash: "9q8yy"
Search: Compute user's geohash, retrieve all restaurants in same cell + neighboring cells
Pro: Simple, fast lookups (hash table), easy to understand
Con: Edge cases—cells are rectangular, not circular (distance within cell varies)
Use case: Good for approximate geo-filtering (within ~5km)
Option 2: R-Tree (Hierarchical Bounding Boxes)
How it works: Organize restaurants into nested bounding boxes (like B-tree for 2D space)
Search: Traverse tree, prune branches that don't intersect query radius
Pro: Exact distance queries, efficient for range searches
Con: More complex, harder to distribute across machines
Use case: PostGIS uses R-tree variants (standard for geo databases)
Option 3: S2 Geometry (Google's Approach)
How it works: Project Earth onto a sphere, divide into hierarchical cells
Advantage: Handles Earth's curvature (important for long distances), fast covering algorithm
Use case: Google Maps, Uber, Lyft use S2 for geo-spatial queries
Tradeoff: More complex than geohash, but more accurate
Production choice for DoorDash:
Primary: ElasticSearch geo_point + geo_distance query (uses R-tree internally)
Why: Integrates with text search, handles both geo + keyword filtering efficiently
Query example:
{
"query": {
"bool": {
"must": [{ "match": { "cuisine": "thai" } }],
"filter": [{ "geo_distance": { "distance": "5km", "location": [37.77, -122.41] } }]
}
}
}

Performance numbers:

Without geo-index: 5M distance calculations × 100ns = 500ms per query
With geo-index: Retrieve ~500 restaurants in 5km radius in 10-20ms
Speedup: 25-50×

Vector Index (Semantic Search)

Embedding-Based Search for Intent Queries

Inverted index works for keyword match. But what about queries like "healthy dinner" or "romantic date"? These don't match specific keywords—they express intent. This requires semantic search.

How semantic search works:
1. Encode: Convert query and restaurants into embedding vectors (dense 768-dim vectors)
2. Search: Find restaurants with embeddings nearest to query embedding (cosine similarity)
3. Rank: Return top-K nearest neighbors as results
Example:
Query: "healthy dinner"
→ Embedding: [0.2, -0.5, 0.8, ..., 0.3] (768 dimensions)
Restaurants with similar embeddings:
• "Sweetgreen" (salads, grain bowls) → cosine_similarity = 0.89
• "Protein Bar" (smoothies, wraps) → cosine_similarity = 0.85
• "Chipotle" (burritos) → cosine_similarity = 0.62 (less relevant)
The indexing challenge:
Naive: Compare query embedding to all 5M restaurant embeddings → 5M cosine similarity calculations
Math: 5M × 768 dims × 10 ops/dim = 38 billion operations per query → ~500ms
Problem: Too slow for production (need <100ms)
Solution: Approximate Nearest Neighbor (ANN) Search
HNSW (Hierarchical Navigable Small World)
How it works: Build graph where each node is a restaurant embedding, edges connect similar nodes
Search: Navigate graph via greedy routing—jump to nearest neighbor at each hop
Performance: Log(N) hops to find approximate nearest neighbors
Tradeoff: 95-99% recall (misses some true nearest neighbors) but 100× faster
Latency: 20-40ms to retrieve top-100 from 5M restaurants
Alternative: Product Quantization (PQ)
How it works: Compress embeddings from 768 floats to small codes (8-16 bytes)
Benefit: Fit entire index in memory (5M × 16 bytes = 80MB vs 5M × 768 × 4 bytes = 15GB)
Tradeoff: Lossy compression → slightly lower recall
Use case: Memory-constrained environments
Technology choices:
Pinecone / Weaviate / Milvus: Managed vector databases (HNSW + compression)
FAISS (Facebook): Open-source library for ANN search
ElasticSearch 8+: Native vector search support (convenient if already using ES)

When to use vector search:

Good for: Intent queries ("healthy", "romantic"), semantic similarity, cold-start (new items)
Not good for: Exact matches ("Chipotle"), structured filters (cuisine=Thai)
Best practice: Use inverted index for keyword queries, vector search for semantic queries
Hybrid: Combine both—keyword for initial filtering, embeddings for re-ranking

Index Sharding Strategy

Scaling Indexes Horizontally

5M restaurants × 10KB metadata = 50GB index. Too large for single machine. Need to shard(partition) index across multiple nodes.

Sharding strategies:
1. Geographic Sharding (Best for DoorDash)
Partition: Shard by city or region
Example: SF shard (50K restaurants), NYC shard (80K restaurants), ...
Query routing: User in SF → route to SF shard only
Pro: Queries only search local shard (fast), natural data locality
Con: Imbalanced shards (NYC >> small town)
Solution: Dynamic sharding—split large shards, merge small shards
2. Hash-Based Sharding
Partition: hash(restaurant_id) % num_shards → shard_id
Pro: Perfectly balanced shards
Con: Must query all shards (can't route by geography) → higher latency
Use case: When queries aren't geo-localized (e.g., global search)
3. Hybrid: Geo + Hash
Partition: First by region (10 regions), then hash within region
Example: SF region has 3 shards (hash-distributed for balance)
Pro: Combines geo-locality + load balancing
Con: More complex routing logic
Replication for availability:
Strategy: 3 replicas per shard (1 primary, 2 replicas)
Reads: Distribute across replicas (load balancing)
Writes: Primary handles writes, async replication to replicas
Failover: If primary dies, promote replica to primary

Staff-level consideration:

• Sharding is not just about capacity—it's about isolating failures
• If one shard goes down, only queries to that region fail (not entire system)
• Geographic sharding provides natural failure isolation (NYC shard failure doesn't affect SF)

Key Takeaway: Index Choice Drives System Design

Staff-level engineers understand that index choice is not a backend implementation detail—it fundamentally shapes the system's capabilities, latency profile, and failure modes. Inverted index enables keyword search. Geo-spatial index enables location filtering. Vector index enables semantic search. Each has different performance characteristics and scaling requirements. The art is knowing when to use each and how to combine them.

5. Multi-Stage Retrieval Pipeline

Now that we have our indexes built, let's discuss how to actually retrieve restaurants efficiently. The core insight: retrieval is a funnel—start broad, progressively narrow. Each stage trades recall for precision and speed.

Stage 1: Candidate Generation (Fast & Broad)

Goal: 5M → 500 candidates in <20ms

Objective: Quickly filter to restaurants that could be relevant. High recall is critical—if we miss the perfect restaurant here, it's gone forever.

Filtering strategy:
1. Geographic filter: Within 5km radius (hard constraint)
• Query: geo_distance filter on ElasticSearch
• Result: 5M → ~10,000 restaurants (depends on city density)
• Latency: 5-10ms
2. Availability filter: Currently open and accepting orders
• Filter: is_open=true AND is_accepting_orders=true
• Result: 10,000 → ~6,000 restaurants (40% closed during off-hours)
• Latency: Negligible (indexed boolean field)
3. Keyword match: cuisine, name, or menu items match query
• Query: "thai" → match against cuisine, name, menu_items fields
• Result: 6,000 → ~500 restaurants
• Latency: 5-10ms (inverted index lookup)
ElasticSearch query structure:
{
"query": {
"bool": {
"must": [
{ "multi_match": { "query": "thai", "fields": ["name^5", "cuisine^3", "menu_items^2"] } }
],
"filter": [
{ "geo_distance": { "distance": "5km", "location": [37.77, -122.41] } },
{ "term": { "is_open": true } },
{ "term": { "is_accepting_orders": true } }
]
}
},
"size": 500
}

Why 500 candidates?

• Too few (e.g., 50): Risk missing relevant restaurants → low recall
• Too many (e.g., 2000): Next stages become slow → high latency
• 500 is sweet spot: High recall (95%+) but manageable for Stage 2
Tuning: Monitor recall at Stage 1—if <95%, increase candidate count

Stage 2: Semantic Matching (Medium Precision)

Goal: 500 → 100 relevant candidates in <30ms

Objective: Apply more expensive semantic matching to narrow to truly relevant restaurants.

When semantic matching helps:
• Query: "spicy thai" (not just cuisine, but preference)
• Stage 1 returned all Thai restaurants (500)
• Stage 2 narrows to Thai restaurants with spicy dishes (100)
Approach: Menu-level matching
1. Extract query intent: "spicy" is a preference, not cuisine
2. Search menu items: Check if restaurant has menu items matching "spicy"
3. Score: Count matches + TF-IDF weight
• Restaurant A: 10 spicy dishes → score = 10 × tf_idf("spicy") = 2.5
• Restaurant B: 1 spicy dish → score = 1 × tf_idf("spicy") = 0.25
• Restaurant A ranks higher
Alternative: Embedding-based matching (for vague queries)
• Query: "healthy dinner" (no specific keywords)
• Compute query embedding: embed("healthy dinner") → 768-dim vector
• Retrieve restaurants with similar embeddings from vector DB
Result: Salad places, grain bowl restaurants rank high (semantic similarity)
Hybrid approach (production):
1. For keyword queries ("thai", "pizza"): Use inverted index (fast)
2. For intent queries ("healthy", "romantic"): Use embedding search (slower but necessary)
3. For mixed queries ("spicy thai"): Combine both—keyword filter + semantic rerank

Latency breakdown:

• Menu text search (keyword): 10-15ms for 500 candidates
• Embedding similarity (ANN): 20-30ms for 500 candidates
Optimization: Only apply embedding search if keyword search is ambiguous

Stage 3: Personalized Ranking (Slow & Precise)

Goal: 100 → top 20 ranked in <40ms

Objective: Apply expensive ML model to rank remaining candidates with high precision. This is where personalization happens—same query, different users, different rankings.

Ranking model: LightGBM (Gradient Boosted Trees)
Input: 80+ features per (user, restaurant, context) tuple
Output: P(user will order | restaurant shown)
Model size: 300 trees × 6 depth = ~5MB (fits in memory)
Inference: 0.4ms per restaurant × 100 restaurants = 40ms
Feature categories:
1. User features (20):
• Historical: favorite_cuisine, avg_order_value, orders_last_30d
• Preferences: price_sensitivity, avg_rating_given, dietary_restrictions
• Behavior: time_since_last_order, typical_order_time
2. Restaurant features (25):
• Quality: rating, num_reviews, review_velocity
• Operational: avg_prep_time, acceptance_rate, cancellation_rate
• Popularity: orders_last_hour, orders_last_day, trending_score
3. Interaction features (15):
• Geography: distance_km, is_in_preferred_area
• Affinity: cuisine_match, price_match, has_favorite_items
• History: has_ordered_before, last_order_date, avg_order_value_here
4. Context features (20):
• Time: hour_of_day, day_of_week, is_meal_time
• Weather: is_raining, temperature
• Events: is_holiday, local_events_nearby, traffic_level
Feature computation:
Precomputed (from feature store): User preferences, restaurant stats (99% of features)
Real-time (computed per request): Distance, time context, current weather (1% of features)
Why this split: Precomputing is expensive—do once daily, cache in Redis

Why LightGBM over neural networks?

Speed: 40ms for 100 restaurants vs 150ms for deep neural net
Interpretability: Can inspect feature importance, debug easily
Tabular data: GBDTs excel at structured features (user features, restaurant metadata)
Tradeoff: Neural nets can learn complex interactions but not worth latency cost
When to use neural nets: If you need to process raw images/text (e.g., menu photos)

Putting It All Together: Request Flow

Complete retrieval pipeline timing

1. Query understanding (parse + NER + expansion)10-15ms
2. Stage 1: Candidate generation (geo + keyword filter)15-20ms
3. Stage 2: Semantic matching (menu search / embeddings)20-30ms
4. Feature lookup (Redis batch MGET for 100 restaurants)5ms
5. Stage 3: ML ranking (LightGBM inference)40ms
Total latency (p50):90-110ms ✓
Total latency (p99, with cache misses):150-180ms ✓

Staff-Level Insight: The Funnel is Non-Negotiable

Junior engineers try to optimize by running one expensive model on all 5M restaurants. Staff engineers understand that's mathematically impossible at scale. The funnel architecture—cheap filters first, expensive ranking last—is the only way to hit sub-100ms latency. The art is tuning each stage: too aggressive filtering → low recall; too lenient → high latency. Monitor recall at each stage and iterate.

6. Ranking & Relevance Scoring

Ranking is where machine learning meets business logic. The goal: order results so the most relevant restaurants appear first. But "relevance" is multi-dimensional—it's not just text match, it's also quality, distance, personalization, and business objectives (promotions, diversity).

7. Personalization Layer

Personalization is the difference between generic search (same results for everyone) and great search (tailored to individual preferences). This section covers how to build and serve personalized features without sacrificing latency.

Final Thoughts

Building search systems at DoorDash scale requires understanding the full stack—from query parsing to indexing to ranking to infrastructure. The key principles: start with clear requirements, design for latency from day one, use multi-stage architectures, and always measure impact on business metrics. Great search systems aren't built overnight—they're iterated over months, informed by real user behavior and rigorous A/B testing.

This is staff-level work: designing systems that scale, anticipating failure modes, making informed tradeoffs, and building for maintainability over years. Master these principles and you'll be able to design search systems for any domain—not just food delivery.