Designing a Scalable Search System for DoorDash

/home/blog/ml/notes
← back to blogDesigning a Scalable Search System for DoorDashA staff-level deep dive into building distributed search infrastructure at scale
Chaïmae Sriti•October 2025
Building search systems at scale is one of the most challenging problems in distributed systems engineering. When users type "thai food near me" into DoorDash, they expect instant, relevant results—even when the system is handling millions of queries per second across hundreds of cities and millions of restaurants.
This case study walks through designing a production-grade search system from first principles. We'll cover the full stack: query understanding, indexing strategies, ranking algorithms, caching layers, and operational considerations. This is staff-level work—the kind of system design that requires understanding both the theoretical foundations and the practical constraints of running large-scale infrastructure.
Contents1. Problem Space & Requirements
2. High-Level Architecture
3. Query Understanding & Intent Detection
4. Indexing Strategy & Data Structures
5. Multi-Stage Retrieval Pipeline
6. Ranking & Relevance Scoring
7. Personalization Layer
8. Caching & Performance Optimization
9. Geo-Spatial Search Optimization
10. Scaling to Millions of QPS
11. Monitoring & Quality Metrics
12. System Design Tradeoffs
13. Failure Modes & Resilience
14. End-to-End Case Study
The Challenge: Design a search system for DoorDash that can handle:
10 million+ concurrent users searching for restaurants
5+ million restaurants and dishes across 7,000+ cities globally
Sub-100ms p99 latency for search queries
Personalized results based on user preferences, location, time, and context
Real-time updates (restaurant opens/closes, menu changes, surge pricing)
Multi-language support and fuzzy matching for typos
High availability (99.99% uptime) with graceful degradation
1. Problem Space & RequirementsBefore diving into solutions, let's decompose the problem systematically. Search systems have fundamentally different characteristics than ranking systems, and understanding these nuances is critical for staff-level design.
Functional RequirementsQuery Types to Support
1. Keyword Search: "pizza", "sushi near me", "chinese takeout"• Challenge: Match against restaurant names, cuisines, and menu items
• Example: "thai" should match both cuisine="Thai" and restaurant names containing "Thai"
• Scale: 100M+ searchable tokens across all restaurants and menu items
2. Semantic Search: "healthy food", "romantic dinner", "quick lunch"• Challenge: Understand intent beyond exact keyword match
• Example: "healthy food" → salads, grain bowls, smoothies (no exact match needed)
• Why: 30-40% of queries are intent-based rather than entity-based
• How: Requires embedding-based search (more on this later)
3. Geo-Aware Search: Location is first-class constraint, not just a ranking signal• Challenge: "pizza" in NYC should only return NYC restaurants
• Why: Can't show restaurants 1000 miles away—hard constraint, not soft preference
• Scale: Need to filter 5M restaurants to ~500 local candidates before ranking
4. Menu Item Search: "margherita pizza", "pad thai", "chicken tikka masala"• Challenge: Search at item level (50M+ dishes), return parent restaurant
• Example: Search "garlic noodles" → return all restaurants with that dish
• Scale: 50M menu items × 20 tokens/item = 1B searchable tokens
5. Autocomplete / Query Suggestions: As-you-type suggestions• Challenge: Must respond in <50ms (feels instant to user)
• Example: "sush" → ["sushi", "sushi near me", "sushi burrito", ...]
• Why: Reduces user effort, increases conversion by 30%+
Query Complexity Handling
Typo Tolerance: "pizzza" → "pizza"• Challenge: 15-20% of mobile queries have typos
• Approach: Fuzzy matching (Levenshtein distance ≤ 2)
• Tradeoff: Fuzzy search is 5-10x slower than exact match
• Solution: Use fuzzy only as fallback when exact search returns < 10 results
Multi-term Queries: "spicy vegan thai restaurant"• Challenge: Combine multiple constraints (cuisine=Thai, dietary=vegan, preference=spicy)
• Approach: Boolean query with AND/OR logic
• Why: All terms must match (AND), but can match different fields
• Example: "spicy" matches menu items, "thai" matches cuisine, "vegan" matches dietary_options
Synonyms & Variations: "italian" = "pizza" = "pasta"• Challenge: Users use different words for same concept
• Example: "chinese" vs "asian" vs specific dishes like "dim sum"
• How: Maintain synonym graph, expand query at search time
• Tradeoff: Too aggressive expansion → irrelevant results; too conservative → missed results
Non-Functional RequirementsLatency Requirements:• p50 latency: <50ms (median user experience)
• p99 latency: <100ms (worst-case acceptable)
• p999 latency: <200ms (even outliers must be reasonable)
• Why: Every 100ms adds 1% bounce rate
• Constraint: Must include network RTT (20-30ms), leaves 50-70ms for search logic
Availability Requirements:• Target: 99.99% uptime (52 minutes downtime/year)
• Why: Search is critical path—if search is down, users can't order
• Graceful degradation: Prefer returning stale/cached results over errors
• Failure handling: Must handle datacenter failures, network partitions, index corruption
Consistency Requirements:• Eventual consistency is acceptable (not strong consistency)
• Example: Restaurant updates menu → okay if search reflects change in 5-10 seconds
• Why: Strong consistency would require synchronous writes across all search replicas → too slow
• Tradeoff: Async index updates = better write latency but users may briefly see stale data
Scalability Requirements:• Query load: 100K QPS average, 500K QPS peak (dinner time)
• Index size: 5M restaurants × 10 menu items × 100 bytes = 5TB indexed data
• Growth: 20% YoY restaurant growth → must handle 6M restaurants next year
• Geographic distribution: Deploy search clusters in 12 regions for latency
Key Metrics to Optimize1. Search-to-Order Conversion Rate (Primary Business Metric)• Definition: % of searches that lead to order within session
• Baseline: ~12% (industry benchmark for food delivery)
• Why this matters: Search exists to drive orders, not just return results
• What moves it: Relevance, result quality, latency, restaurant availability
2. Zero-Result Rate (Quality Metric)• Definition: % of queries returning 0 results
• Target: <5%
• Why: Zero results = guaranteed no conversion, terrible UX
• Causes: Typos, new restaurants not indexed, poor query understanding
• Solution: Fuzzy matching, query relaxation, fallback to browse mode
3. Click-Through Rate (Relevance Proxy)• Definition: % of searches where user clicks on result
• Target: >70%
• Why: If users don't click, results aren't relevant
• Note: High CTR doesn't guarantee orders—need to track both
4. Mean Reciprocal Rank (MRR) (Ranking Quality)• Definition: Average of 1/position of first clicked result
• Example: User clicks result #1 → MRR=1.0, clicks #3 → MRR=0.33
• Why: Measures how well we rank relevant results to the top
• Target: MRR > 0.7 (first click within top 3 results)
5. Query Latency Percentiles (Performance)• p50: Median user experience
• p99: Worst-case for most users
• p999: Even outliers must be acceptable
• Why track all three: Averages hide problems, need full distribution
Staff-Level Insight: The Latency-Relevance Tradeoff
The fundamental tension in search systems: more comprehensive search (fuzzy matching, semantic search, personalization) improves relevance but increases latency. At DoorDash scale, we can't afford to search all 5M restaurants per query. This drives the multi-stage architecture we'll design: fast, broad retrieval followed by expensive, precise ranking.
2. High-Level ArchitectureA production search system is not a monolith—it's a composition of specialized components, each optimized for specific constraints. Let's start with the 30,000-foot view before diving into each component.
Search Request Flow (End-to-End)
User Query
"spicy thai near me"
↓
1. Query Understanding
Parse → Normalize → Expand → Classify Intent
cuisine=thai, preference=spicy, geo=user_location
↓
2. Stage 1: Candidate Retrieval (Fast & Broad)
Goal: 5M restaurants → 500 candidates in <20ms
• Geo filter: Within 5km radius (from 5M to 10K)
• Keyword filter: cuisine="thai" (from 10K to 800)
• Availability filter: is_open=true (from 800 to 500)
↓
3. Stage 2: Semantic Matching (Medium Precision)
Goal: 500 candidates → 100 relevant in <30ms
• Embedding-based search for "spicy" intent
• Match against menu items containing spicy dishes
• Score = text_relevance × menu_match
↓
4. Stage 3: Personalized Ranking (Slow & Precise)
Goal: 100 candidates → top 20 ranked in <40ms
• ML model: LightGBM with 80+ features
• User preferences (past orders, favorites, ratings)
• Contextual signals (time, weather, trending)
• Score = P(user will order | context)
↓
5. Return Top 20 Results
Total latency: 90ms (p50), 120ms (p99)
Why Multi-Stage Architecture?The Problem: We can't run an expensive ML model on 5 million restaurants per query• Math: 5M restaurants × 1ms per model inference = 5,000 seconds per query
• Reality: We have 100ms total budget for the entire search request
• Constraint: Must filter aggressively before applying expensive operations
The Solution: Funnel architecture—each stage removes candidates while preserving recall• Stage 1 (Retrieval): Simple filters, high recall (catch all relevant items), very fast
• Stage 2 (Matching): Medium complexity, good precision, moderate speed
• Stage 3 (Ranking): Complex ML, perfect precision, can be slow (small candidate set)
The Tradeoff: Each stage risks losing relevant results• Recall loss: If perfect restaurant is filtered in Stage 1, it can't be recovered later
• Design principle: Early stages must be high-recall (catch all relevant), later stages high-precision (rank them correctly)
• Monitoring: Track recall at each stage—if Stage 1 drops recall below 95%, fix it
Component BreakdownQuery Understanding Service• Role: Parse user query, extract structured filters, detect intent
• Input: "spicy thai near me" (raw text)
• Output: {cuisine: "thai", preference: "spicy", geo: user_location}
• Technology: NLP models (BERT for intent classification), rule-based parsers
• Latency: 10-15ms
Index Service (ElasticSearch/Solr)• Role: Store inverted index for fast keyword + geo lookup
• Data structure: Inverted index (keyword → restaurant IDs) + geo-spatial index (R-tree)
• Scale: 5M restaurants × 10KB metadata = 50GB index per region
• Technology: ElasticSearch (industry standard for search), sharded by geography
• Latency: 20-30ms for Stage 1 retrieval
Vector Search Service (Pinecone/Weaviate)• Role: Semantic matching using embeddings (for intent-based queries)
• How: Encode query → find nearest neighbor restaurants in embedding space
• Use case: "healthy dinner" → returns salad places even without keyword "salad"
• Technology: HNSW (Hierarchical Navigable Small World) index for fast ANN search
• Latency: 20-40ms for top-K retrieval from 500K restaurants
Ranking Service• Role: Score and rank final candidates using ML model
• Model: LightGBM with 80+ features (user, restaurant, context, interaction)
• Features: User preferences, restaurant quality, distance, time of day, trending, ...
• Output: Ranked list of restaurants with confidence scores
• Latency: 40-60ms for 100 candidates
Feature Store (Feast/Tecton)• Role: Serve precomputed features for ranking model
• Why: Can't compute all features at query time (too slow)
• Data: User preferences (updated daily), restaurant stats (updated hourly)
• Technology: Redis for online serving, S3 for offline training
• Latency: 1-3ms per feature lookup (cached in memory)
Caching Layer (Redis/CDN)• Role: Cache frequent queries to avoid recomputation
• Cache hit rate: ~40% (many users search "pizza", "chinese", etc.)
• TTL: 5-10 minutes (balance freshness vs hit rate)
• Invalidation: Invalidate on restaurant updates (menu change, closed)
• Impact: Cached queries serve in 5-10ms vs 100ms for full search
Key Architectural Principle: Separation of Concerns
Each component has a single responsibility: query understanding doesn't do ranking, indexing doesn't do personalization. This enables independent scaling, testing, and iteration. When ranking needs improvement, we update the ranking service without touching the index. This is staff-level design—building systems that are maintainable over years, not months.
3. Query Understanding & Intent DetectionQuery understanding is the first—and often most underestimated—step in the search pipeline. Poor query parsing means downstream stages search for the wrong thing, no matter how good your indexing or ranking is. This is where many junior systems fail: they treat search as a simple keyword match when it's actually a natural language understanding problem.
Query Parsing PipelineStep 1: Tokenization & Normalization
Goal: Convert raw query string into clean, normalized tokens
Example transformations:"Spicy  Thai    Near Me" → ["spicy", "thai", "near", "me"]
"PIZZA!!!" → ["pizza"]
"café" → ["cafe", "café"] (preserve accent + ASCII variant)
Normalization steps:1. Lowercase: "Pizza" → "pizza" (case-insensitive matching)
2. Remove punctuation: "pizza!" → "pizza"
3. Strip extra whitespace: "pizza  near" → "pizza near"
4. Handle diacritics: "café" → ["cafe", "café"] (search both variants)
5. Expand abbreviations: "bbq" → "barbecue" (using lookup table)
Why this matters:• Without normalization: "Pizza" and "pizza" are different tokens → missed results
• With normalization: Both map to same token → consistent matching
• Tradeoff: Over-normalization loses information (e.g., "NY" vs "ny")
Implementation:// Tokenizer: ElasticSearch StandardAnalyzer + custom rules
{
  "analyzer": "custom_analyzer",
  "tokenizer": "standard",
  "filter": ["lowercase", "asciifolding", "synonym_expansion"]
}
Step 2: Intent Classification
Goal: Determine what the user is trying to do (search types)
Intent Types:1. Restaurant name: "Chipotle", "McDonald's" → Direct entity lookup
2. Cuisine type: "Thai", "Italian" → Filter by cuisine field
3. Dish/item: "Pad Thai", "Margherita pizza" → Search menu items
4. Descriptive: "healthy", "romantic" → Semantic/attribute search
5. Contextual: "near me", "open now" → Add filter constraints
Classification approach:Option 1: Rule-Based (Fast, Simple)• Exact match against known entity dictionary
• Example: "chipotle" in restaurant_names → intent=RESTAURANT_NAME
• Pro: Zero latency, deterministic, easy to debug
• Con: Doesn't generalize to new entities or ambiguous queries
Option 2: ML-Based (BERT Classifier)• Train text classifier on labeled query data
• Input: Query embedding (768-dim from BERT)
• Output: Probability distribution over intent classes
• Pro: Handles ambiguous/new queries, learns patterns
• Con: 10-15ms inference latency, needs training data
Hybrid Approach (Best of Both):• Try rule-based first (covers 70% of queries in <1ms)
• Fall back to ML for ambiguous cases
• Example: "Pad Thai" → rule-based recognizes as dish
• Example: "Something healthy" → ML classifies as DESCRIPTIVE
Example classification:Query: "spicy thai near me"
→ Tokens: ["spicy", "thai", "near", "me"]
→ Intent: CUISINE (thai) + PREFERENCE (spicy) + LOCATION (near me)
→ Structured: {cuisine: "thai", preferences: ["spicy"], geo: user_location}
Step 3: Named Entity Recognition (NER)
Goal: Extract specific entities (restaurants, cuisines, dishes, locations)
Why NER matters:• Query: "pad thai from Thai Basil restaurant"
• Without NER: All words searched equally → "thai" matches cuisine=Thai
• With NER: Recognize "Thai Basil" as restaurant name → search name field specifically
• Result: Much more precise—only return "Thai Basil" restaurant, not all Thai places
Entity types to extract:• RESTAURANT: "Chipotle", "Thai Basil", "Joe's Pizza"
• CUISINE: "Thai", "Italian", "Mexican"
• DISH: "Pad Thai", "Margherita pizza", "General Tso's chicken"
• LOCATION: "near me", "downtown", "Mission District"
• ATTRIBUTE: "spicy", "vegan", "gluten-free"
NER approaches:Gazetteer (Dictionary Lookup):• Maintain dictionary of known entities (5M restaurant names, 200 cuisines, 50K dishes)
• Check query tokens against dictionary using Trie data structure (fast prefix matching)
• Pro: Very fast (1-2ms), high precision for known entities
• Con: Misses new entities (newly opened restaurants)
ML-Based NER (BiLSTM-CRF or BERT-NER):• Train sequence labeling model on annotated queries
• Example: "spicy [DISH:pad thai] from [RESTAURANT:Thai Basil]"
• Pro: Handles unseen entities, understands context
• Con: Slower (10-20ms), requires training data
Production approach:• Use gazetteer for high-confidence matches (exact string match)
• Use ML-NER for ambiguous cases or when gazetteer fails
• Update gazetteer nightly with new restaurant names from database
Step 4: Query Expansion
Goal: Add synonyms and related terms to improve recall without hurting precision
Why query expansion:• Problem: User searches "chinese" but restaurant is tagged "asian" → missed result
• Solution: Expand "chinese" → ["chinese", "asian", "dim sum", "noodles"]
• Tradeoff: Too aggressive → irrelevant results; too conservative → low recall
Expansion strategies:1. Synonym Mapping (Curated):"chinese" → ["chinese", "asian", "cantonese", "szechuan"]
"pizza" → ["pizza", "italian", "pizzeria"]
"bbq" → ["bbq", "barbecue", "grilled"]
• Pro: Full control, high precision
• Con: Manual curation, doesn't scale to long tail
2. Query Log Mining (Data-Driven):• Analyze historical queries: users who search "chinese" often search "dim sum"
• Build co-occurrence matrix: term A → related terms [B, C, D]
• Pro: Automatically discovers relationships, scales
• Con: Can amplify biases, needs large query volume
3. Embedding-Based Expansion:• Compute query embedding, find nearest neighbors in embedding space
• Example: "healthy" → ["salad", "grain bowl", "smoothie"] (semantic similarity)
• Pro: Captures semantic relationships, generalizes well
• Con: Can be noisy, harder to control
When to apply expansion:• Aggressive expansion: When original query returns <10 results
• Conservative expansion: When original query returns >50 results
• No expansion: When query is specific entity name (e.g., "Chipotle")
• Boost vs expand: OR expansion (more results) vs weighted boost (rerank existing)
Query Correction & SpellingHandling Typos & Misspellings
The problem: 15-20% of mobile queries contain typos• Example: "pizzza", "thia food", "chinse restaurant"
• Without correction: Zero results → terrible UX → user abandons
• With correction: "pizzza" → "pizza" → show relevant results
Correction approaches:1. Edit Distance (Levenshtein):• Compute edit distance between query and dictionary terms
• "pizzza" → "pizza" (distance=1, delete one 'z')
• Threshold: Accept corrections with distance ≤ 2
• Pro: Simple, handles simple typos well
• Con: Expensive to compute for large dictionaries (5M restaurants)
• Optimization: Use BK-tree or n-gram indexing for fast lookup
2. Phonetic Matching (Soundex/Metaphone):• Match based on pronunciation, not spelling
• "thai" and "thigh" → same phonetic code → can be confusing
• Use case: Handle phonetically similar misspellings
• Tradeoff: Can introduce false positives
3. Context-Aware Spell Check (ML):• Train language model on food domain text
• "thia food" → model predicts "thai food" (higher probability given context)
• Model: Transformer-based (like BERT) trained on query logs
• Pro: Handles complex corrections, understands context
• Con: 10-20ms latency, requires training data
Production strategy:1. First, try exact match (0ms)
2. If zero results, try fuzzy match with edit distance ≤ 2 (5-10ms)
3. If still zero results, apply ML spell correction (10-20ms)
4. Show corrected query to user: "Did you mean: thai food?"
Tradeoff: Latency vs Recall
• Fuzzy matching adds 10-20ms latency but recovers 15% of queries
• Decision: Worth it—users prefer 120ms with results over 100ms with nothing
• Optimization: Only apply fuzzy if exact search fails (avoid latency for common queries)
Staff-Level Insight: Query Understanding as ML Problem
Junior engineers treat query parsing as string manipulation. Staff engineers recognize it as a natural language understanding problem requiring ML. The best systems use hybrid approaches: fast rules for common cases, ML for long tail. The key is understanding when each approach is appropriate and measuring the impact on both latency and relevance.
4. Indexing Strategy & Data StructuresIndexing is where theory meets practice. You can have the best ranking algorithm in the world, but if your index can't retrieve candidates fast enough, your search is too slow. At scale, data structure choice matters more than algorithmic cleverness. This section covers the core indexes we need and why.
Primary Index: Inverted IndexWhat is an Inverted Index?
Concept: Map from term → list of documents containing that term
Forward index (slow): Restaurant → {name, cuisine, menu_items}restaurant_1 → {name: "Thai Basil", cuisine: "thai", menu: ["pad thai", "curry"]}
restaurant_2 → {name: "Pizza Place", cuisine: "italian", menu: ["margherita", "pepperoni"]}
• Search "thai": Must scan all 5M restaurants → O(N) → too slow!
Inverted index (fast): Term → [RestaurantIDs]"thai" → [restaurant_1, restaurant_47, restaurant_892, ...]
"pizza" → [restaurant_2, restaurant_103, ...]
"pad thai" → [restaurant_1, restaurant_55, ...]
• Search "thai": Hash lookup → O(1) → instant!
Implementation details:• Term dictionary: Map term → posting list (stored in memory for speed)
• Posting lists: Sorted arrays of document IDs (enables efficient intersection)
• Compression: Variable-byte encoding for posting lists (reduces memory 5-10×)
• Skip lists: Speed up intersection of large posting lists
Multi-Field Indexing
Problem: Restaurants have multiple searchable fields• Restaurant name: "Thai Basil"
• Cuisine: "thai"
• Menu items: ["pad thai", "green curry", "spring rolls"]
• Tags: ["spicy", "vegetarian-friendly", "delivery"]
Solution: Separate inverted index per field + field weightsname_index: "thai" → [restaurant_1] (exact name match)• Weight: 5.0 (name matches are strongest signal)
cuisine_index: "thai" → [restaurant_1, restaurant_47, ...] (many Thai restaurants)• Weight: 3.0 (cuisine match is important but less specific)
menu_index: "thai" → [restaurant_1, restaurant_55, ...] (restaurants with "thai" dishes)• Weight: 2.0 (menu match is relevant but weakest signal)
Scoring: Combine matches across fields with weightsfinal_score = name_weight × name_match + cuisine_weight × cuisine_match + menu_weight × menu_match
• Restaurant with "thai" in name: 5.0 (highest priority)
• Restaurant with cuisine=thai: 3.0 (medium priority)
• Restaurant with "thai" menu item: 2.0 (lowest priority)
Why Field Weighting Matters
• Query: "thai"
• Without weights: "Pizza place with 1 thai dish" ranks equally with "Thai Basil"
• With weights: "Thai Basil" (name match, score=5.0) ranks above generic pizza place (menu match, score=2.0)
Geo-Spatial IndexWhy Geo-Spatial Indexing is Critical
Location is a hard constraint in food delivery—can't deliver from 100 miles away. We need to filter millions of restaurants to hundreds within delivery radius before doing expensive operations.
The challenge:• Query: "pizza near me"
• User location: (37.7749, -122.4194) [San Francisco]
• Naive approach: Compute distance to all 5M restaurants → 5M distance calculations → too slow!
• Need: Data structure that prunes restaurants by location without checking all of them
Geo-spatial data structures:Option 1: Geohash (Grid-Based)• How it works: Divide world into grid cells, assign each cell a hash code
• Example: (37.7749, -122.4194) → geohash: "9q8yy"
• Search: Compute user's geohash, retrieve all restaurants in same cell + neighboring cells
• Pro: Simple, fast lookups (hash table), easy to understand
• Con: Edge cases—cells are rectangular, not circular (distance within cell varies)
• Use case: Good for approximate geo-filtering (within ~5km)
Option 2: R-Tree (Hierarchical Bounding Boxes)• How it works: Organize restaurants into nested bounding boxes (like B-tree for 2D space)
• Search: Traverse tree, prune branches that don't intersect query radius
• Pro: Exact distance queries, efficient for range searches
• Con: More complex, harder to distribute across machines
• Use case: PostGIS uses R-tree variants (standard for geo databases)
Option 3: S2 Geometry (Google's Approach)• How it works: Project Earth onto a sphere, divide into hierarchical cells
• Advantage: Handles Earth's curvature (important for long distances), fast covering algorithm
• Use case: Google Maps, Uber, Lyft use S2 for geo-spatial queries
• Tradeoff: More complex than geohash, but more accurate
Production choice for DoorDash:• Primary: ElasticSearch geo_point + geo_distance query (uses R-tree internally)
• Why: Integrates with text search, handles both geo + keyword filtering efficiently
• Query example:
{
  "query": {
    "bool": {
      "must": [{ "match": { "cuisine": "thai" } }],
      "filter": [{ "geo_distance": { "distance": "5km", "location": [37.77, -122.41] } }]
    }
  }
}
Performance numbers:
• Without geo-index: 5M distance calculations × 100ns = 500ms per query
• With geo-index: Retrieve ~500 restaurants in 5km radius in 10-20ms
• Speedup: 25-50×
Vector Index (Semantic Search)Embedding-Based Search for Intent Queries
Inverted index works for keyword match. But what about queries like "healthy dinner" or "romantic date"? These don't match specific keywords—they express intent. This requires semantic search.
How semantic search works:1. Encode: Convert query and restaurants into embedding vectors (dense 768-dim vectors)
2. Search: Find restaurants with embeddings nearest to query embedding (cosine similarity)
3. Rank: Return top-K nearest neighbors as results
Example:Query: "healthy dinner"
→ Embedding: [0.2, -0.5, 0.8, ..., 0.3] (768 dimensions)
Restaurants with similar embeddings:
• "Sweetgreen" (salads, grain bowls) → cosine_similarity = 0.89
• "Protein Bar" (smoothies, wraps) → cosine_similarity = 0.85
• "Chipotle" (burritos) → cosine_similarity = 0.62 (less relevant)
The indexing challenge:• Naive: Compare query embedding to all 5M restaurant embeddings → 5M cosine similarity calculations
• Math: 5M × 768 dims × 10 ops/dim = 38 billion operations per query → ~500ms
• Problem: Too slow for production (need <100ms)
Solution: Approximate Nearest Neighbor (ANN) SearchHNSW (Hierarchical Navigable Small World)• How it works: Build graph where each node is a restaurant embedding, edges connect similar nodes
• Search: Navigate graph via greedy routing—jump to nearest neighbor at each hop
• Performance: Log(N) hops to find approximate nearest neighbors
• Tradeoff: 95-99% recall (misses some true nearest neighbors) but 100× faster
• Latency: 20-40ms to retrieve top-100 from 5M restaurants
Alternative: Product Quantization (PQ)• How it works: Compress embeddings from 768 floats to small codes (8-16 bytes)
• Benefit: Fit entire index in memory (5M × 16 bytes = 80MB vs 5M × 768 × 4 bytes = 15GB)
• Tradeoff: Lossy compression → slightly lower recall
• Use case: Memory-constrained environments
Technology choices:• Pinecone / Weaviate / Milvus: Managed vector databases (HNSW + compression)
• FAISS (Facebook): Open-source library for ANN search
• ElasticSearch 8+: Native vector search support (convenient if already using ES)
When to use vector search:
• Good for: Intent queries ("healthy", "romantic"), semantic similarity, cold-start (new items)
• Not good for: Exact matches ("Chipotle"), structured filters (cuisine=Thai)
• Best practice: Use inverted index for keyword queries, vector search for semantic queries
• Hybrid: Combine both—keyword for initial filtering, embeddings for re-ranking
Index Sharding StrategyScaling Indexes Horizontally
5M restaurants × 10KB metadata = 50GB index. Too large for single machine. Need to shard(partition) index across multiple nodes.
Sharding strategies:1. Geographic Sharding (Best for DoorDash)• Partition: Shard by city or region
• Example: SF shard (50K restaurants), NYC shard (80K restaurants), ...
• Query routing: User in SF → route to SF shard only
• Pro: Queries only search local shard (fast), natural data locality
• Con: Imbalanced shards (NYC >> small town)
• Solution: Dynamic sharding—split large shards, merge small shards
2. Hash-Based Sharding• Partition: hash(restaurant_id) % num_shards → shard_id
• Pro: Perfectly balanced shards
• Con: Must query all shards (can't route by geography) → higher latency
• Use case: When queries aren't geo-localized (e.g., global search)
3. Hybrid: Geo + Hash• Partition: First by region (10 regions), then hash within region
• Example: SF region has 3 shards (hash-distributed for balance)
• Pro: Combines geo-locality + load balancing
• Con: More complex routing logic
Replication for availability:• Strategy: 3 replicas per shard (1 primary, 2 replicas)
• Reads: Distribute across replicas (load balancing)
• Writes: Primary handles writes, async replication to replicas
• Failover: If primary dies, promote replica to primary
Staff-level consideration:
• Sharding is not just about capacity—it's about isolating failures
• If one shard goes down, only queries to that region fail (not entire system)
• Geographic sharding provides natural failure isolation (NYC shard failure doesn't affect SF)
Key Takeaway: Index Choice Drives System Design
Staff-level engineers understand that index choice is not a backend implementation detail—it fundamentally shapes the system's capabilities, latency profile, and failure modes. Inverted index enables keyword search. Geo-spatial index enables location filtering. Vector index enables semantic search. Each has different performance characteristics and scaling requirements. The art is knowing when to use each and how to combine them.
5. Multi-Stage Retrieval PipelineNow that we have our indexes built, let's discuss how to actually retrieve restaurants efficiently. The core insight: retrieval is a funnel—start broad, progressively narrow. Each stage trades recall for precision and speed.
Stage 1: Candidate Generation (Fast & Broad)Goal: 5M → 500 candidates in <20ms
Objective: Quickly filter to restaurants that could be relevant. High recall is critical—if we miss the perfect restaurant here, it's gone forever.
Filtering strategy:1. Geographic filter: Within 5km radius (hard constraint)
• Query: geo_distance filter on ElasticSearch
• Result: 5M → ~10,000 restaurants (depends on city density)
• Latency: 5-10ms
2. Availability filter: Currently open and accepting orders
• Filter: is_open=true AND is_accepting_orders=true
• Result: 10,000 → ~6,000 restaurants (40% closed during off-hours)
• Latency: Negligible (indexed boolean field)
3. Keyword match: cuisine, name, or menu items match query
• Query: "thai" → match against cuisine, name, menu_items fields
• Result: 6,000 → ~500 restaurants
• Latency: 5-10ms (inverted index lookup)
ElasticSearch query structure:{
  "query": {
    "bool": {
      "must": [
        { "multi_match": { "query": "thai", "fields": ["name^5", "cuisine^3", "menu_items^2"] } }
      ],
      "filter": [
        { "geo_distance": { "distance": "5km", "location": [37.77, -122.41] } },
        { "term": { "is_open": true } },
        { "term": { "is_accepting_orders": true } }
      ]
    }
  },
  "size": 500
}
Why 500 candidates?
• Too few (e.g., 50): Risk missing relevant restaurants → low recall
• Too many (e.g., 2000): Next stages become slow → high latency
• 500 is sweet spot: High recall (95%+) but manageable for Stage 2
• Tuning: Monitor recall at Stage 1—if <95%, increase candidate count
Stage 2: Semantic Matching (Medium Precision)Goal: 500 → 100 relevant candidates in <30ms
Objective: Apply more expensive semantic matching to narrow to truly relevant restaurants.
When semantic matching helps:• Query: "spicy thai" (not just cuisine, but preference)
• Stage 1 returned all Thai restaurants (500)
• Stage 2 narrows to Thai restaurants with spicy dishes (100)
Approach: Menu-level matching1. Extract query intent: "spicy" is a preference, not cuisine
2. Search menu items: Check if restaurant has menu items matching "spicy"
3. Score: Count matches + TF-IDF weight
• Restaurant A: 10 spicy dishes → score = 10 × tf_idf("spicy") = 2.5
• Restaurant B: 1 spicy dish → score = 1 × tf_idf("spicy") = 0.25
• Restaurant A ranks higher
Alternative: Embedding-based matching (for vague queries)• Query: "healthy dinner" (no specific keywords)
• Compute query embedding: embed("healthy dinner") → 768-dim vector
• Retrieve restaurants with similar embeddings from vector DB
• Result: Salad places, grain bowl restaurants rank high (semantic similarity)
Hybrid approach (production):1. For keyword queries ("thai", "pizza"): Use inverted index (fast)
2. For intent queries ("healthy", "romantic"): Use embedding search (slower but necessary)
3. For mixed queries ("spicy thai"): Combine both—keyword filter + semantic rerank
Latency breakdown:
• Menu text search (keyword): 10-15ms for 500 candidates
• Embedding similarity (ANN): 20-30ms for 500 candidates
• Optimization: Only apply embedding search if keyword search is ambiguous
Stage 3: Personalized Ranking (Slow & Precise)Goal: 100 → top 20 ranked in <40ms
Objective: Apply expensive ML model to rank remaining candidates with high precision. This is where personalization happens—same query, different users, different rankings.
Ranking model: LightGBM (Gradient Boosted Trees)• Input: 80+ features per (user, restaurant, context) tuple
• Output: P(user will order | restaurant shown)
• Model size: 300 trees × 6 depth = ~5MB (fits in memory)
• Inference: 0.4ms per restaurant × 100 restaurants = 40ms
Feature categories:1. User features (20):• Historical: favorite_cuisine, avg_order_value, orders_last_30d
• Preferences: price_sensitivity, avg_rating_given, dietary_restrictions
• Behavior: time_since_last_order, typical_order_time
2. Restaurant features (25):• Quality: rating, num_reviews, review_velocity
• Operational: avg_prep_time, acceptance_rate, cancellation_rate
• Popularity: orders_last_hour, orders_last_day, trending_score
3. Interaction features (15):• Geography: distance_km, is_in_preferred_area
• Affinity: cuisine_match, price_match, has_favorite_items
• History: has_ordered_before, last_order_date, avg_order_value_here
4. Context features (20):• Time: hour_of_day, day_of_week, is_meal_time
• Weather: is_raining, temperature
• Events: is_holiday, local_events_nearby, traffic_level
Feature computation:• Precomputed (from feature store): User preferences, restaurant stats (99% of features)
• Real-time (computed per request): Distance, time context, current weather (1% of features)
• Why this split: Precomputing is expensive—do once daily, cache in Redis
Why LightGBM over neural networks?
• Speed: 40ms for 100 restaurants vs 150ms for deep neural net
• Interpretability: Can inspect feature importance, debug easily
• Tabular data: GBDTs excel at structured features (user features, restaurant metadata)
• Tradeoff: Neural nets can learn complex interactions but not worth latency cost
• When to use neural nets: If you need to process raw images/text (e.g., menu photos)
Putting It All Together: Request FlowComplete retrieval pipeline timing
1. Query understanding (parse + NER + expansion)10-15ms
2. Stage 1: Candidate generation (geo + keyword filter)15-20ms
3. Stage 2: Semantic matching (menu search / embeddings)20-30ms
4. Feature lookup (Redis batch MGET for 100 restaurants)5ms
5. Stage 3: ML ranking (LightGBM inference)40ms
Total latency (p50):90-110ms ✓
Total latency (p99, with cache misses):150-180ms ✓
Staff-Level Insight: The Funnel is Non-Negotiable
Junior engineers try to optimize by running one expensive model on all 5M restaurants. Staff engineers understand that's mathematically impossible at scale. The funnel architecture—cheap filters first, expensive ranking last—is the only way to hit sub-100ms latency. The art is tuning each stage: too aggressive filtering → low recall; too lenient → high latency. Monitor recall at each stage and iterate.
6. Ranking & Relevance ScoringRanking is where machine learning meets business logic. The goal: order results so the most relevant restaurants appear first. But "relevance" is multi-dimensional—it's not just text match, it's also quality, distance, personalization, and business objectives (promotions, diversity).
7. Personalization LayerPersonalization is the difference between generic search (same results for everyone) and great search (tailored to individual preferences). This section covers how to build and serve personalized features without sacrificing latency.
Final Thoughts
Building search systems at DoorDash scale requires understanding the full stack—from query parsing to indexing to ranking to infrastructure. The key principles: start with clear requirements, design for latency from day one, use multi-stage architectures, and always measure impact on business metrics. Great search systems aren't built overnight—they're iterated over months, informed by real user behavior and rigorous A/B testing.
This is staff-level work: designing systems that scale, anticipating failure modes, making informed tradeoffs, and building for maintainability over years. Master these principles and you'll be able to design search systems for any domain—not just food delivery.
← back to blog