A staff-level deep dive into building distributed search infrastructure at scale
Building search systems at scale is one of the most challenging problems in distributed systems engineering. When users type "thai food near me" into DoorDash, they expect instant, relevant results—even when the system is handling millions of queries per second across hundreds of cities and millions of restaurants.
This case study walks through designing a production-grade search system from first principles. We'll cover the full stack: query understanding, indexing strategies, ranking algorithms, caching layers, and operational considerations. This is staff-level work—the kind of system design that requires understanding both the theoretical foundations and the practical constraints of running large-scale infrastructure.
The Challenge: Design a search system for DoorDash that can handle millions of queries per second across 5M restaurants, with sub-100ms end-to-end latency.
Before diving into solutions, let's decompose the problem systematically. Search systems have fundamentally different characteristics than ranking systems, and understanding these nuances is critical for staff-level design.
Query Types to Support
Query Complexity Handling
Staff-Level Insight: The Latency-Relevance Tradeoff
The fundamental tension in search systems: more comprehensive search (fuzzy matching, semantic search, personalization) improves relevance but increases latency. At DoorDash scale, we can't afford to search all 5M restaurants per query. This drives the multi-stage architecture we'll design: fast, broad retrieval followed by expensive, precise ranking.
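A quick back-of-envelope calculation makes the tradeoff concrete. The per-item cost below is an assumption chosen for illustration, not a measured number:

```python
# Why we can't run the expensive model on all 5M restaurants per query.
# Assumption: the full ML ranker costs ~1 ms per restaurant scored.

TOTAL_RESTAURANTS = 5_000_000
ML_SCORE_MS = 1.0  # hypothetical per-restaurant ranking cost

exhaustive_ms = TOTAL_RESTAURANTS * ML_SCORE_MS
print(f"Exhaustive ranking: ~{exhaustive_ms / 60_000:.0f} minutes per query")

# Funnel: cheap retrieval narrows 5M -> 500 -> 100 candidates,
# so the expensive ranker only ever scores ~100 items.
funnel_ms = 100 * ML_SCORE_MS + 20  # ranking cost + assumed retrieval overhead
print(f"Funnel ranking: ~{funnel_ms:.0f} ms per query")
```

Even if the per-item cost were 10x cheaper, exhaustive scoring would still be minutes, not milliseconds, per query.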
A production search system is not a monolith—it's a composition of specialized components, each optimized for specific constraints. Let's start with the 30,000-foot view before diving into each component.
Search Request Flow (End-to-End)
Key Architectural Principle: Separation of Concerns
Each component has a single responsibility: query understanding doesn't do ranking, indexing doesn't do personalization. This enables independent scaling, testing, and iteration. When ranking needs improvement, we update the ranking service without touching the index. This is staff-level design—building systems that are maintainable over years, not months.
Query understanding is the first—and often most underestimated—step in the search pipeline. Poor query parsing means downstream stages search for the wrong thing, no matter how good your indexing or ranking is. This is where many junior systems fail: they treat search as a simple keyword match when it's actually a natural language understanding problem.
Step 1: Tokenization & Normalization
Goal: Convert raw query string into clean, normalized tokens
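A minimal sketch of this step, using only the standard library. The stopword list is a hypothetical example; a production list would be curated for the food-delivery domain:

```python
import re
import unicodedata

def normalize_query(raw: str) -> list[str]:
    """Tokenize and normalize a raw query string (illustrative sketch)."""
    # Unicode-normalize and strip accents: "Café" -> "cafe"
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and split on non-alphanumeric characters
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop common stopwords that carry no search signal (assumed list)
    stopwords = {"the", "a", "an", "near", "me"}
    return [t for t in tokens if t not in stopwords]

print(normalize_query("Thai Food near me!"))  # -> ['thai', 'food']
```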
Step 2: Intent Classification
Goal: Determine what the user is trying to do (search types)
Step 3: Named Entity Recognition (NER)
Goal: Extract specific entities (restaurants, cuisines, dishes, locations)
Step 4: Query Expansion
Goal: Add synonyms and related terms to improve recall without hurting precision
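A sketch of capped synonym expansion. The synonym entries here are hand-written for illustration; production tables are typically mined from query logs and click data:

```python
# Illustrative synonym table (hypothetical entries)
SYNONYMS = {
    "bbq": ["barbecue"],
    "veggie": ["vegetarian", "vegan"],
}

def expand_query(tokens: list[str], max_per_token: int = 2) -> list[str]:
    """Append synonyms for each token, capped per token to limit
    the precision cost of over-expansion."""
    expanded = list(tokens)
    for t in tokens:
        expanded.extend(SYNONYMS.get(t, [])[:max_per_token])
    return expanded

print(expand_query(["veggie", "pizza"]))
# -> ['veggie', 'pizza', 'vegetarian', 'vegan']
```

The cap is the recall/precision knob: expanding every token with every synonym improves recall but dilutes the query.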
Handling Typos & Misspellings
Tradeoff: Latency vs Recall
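One lightweight approach, sketched here with the standard library's `difflib` as a stand-in for a real spell-correction service. The vocabulary is hypothetical; the similarity cutoff is exactly the latency-vs-recall knob: higher means fewer, safer corrections, lower catches more typos at the risk of false corrections:

```python
from difflib import get_close_matches

# Hypothetical vocabulary of known cuisine/dish terms
VOCABULARY = ["thai", "taco", "sushi", "burrito", "pizza", "ramen"]

def correct_typo(token: str, cutoff: float = 0.7) -> str:
    """Return the closest vocabulary term, or the token unchanged
    if nothing clears the similarity cutoff."""
    matches = get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token

print(correct_typo("thia"))  # -> "thai"
print(correct_typo("xyz"))   # -> "xyz" (no close match; left unchanged)
```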
Staff-Level Insight: Query Understanding as ML Problem
Junior engineers treat query parsing as string manipulation. Staff engineers recognize it as a natural language understanding problem requiring ML. The best systems use hybrid approaches: fast rules for common cases, ML for long tail. The key is understanding when each approach is appropriate and measuring the impact on both latency and relevance.
Indexing is where theory meets practice. You can have the best ranking algorithm in the world, but if your index can't retrieve candidates fast enough, your search is too slow. At scale, data structure choice matters more than algorithmic cleverness. This section covers the core indexes we need and why.
What is an Inverted Index?
Concept: Map from term → list of documents containing that term
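The idea in miniature, with hypothetical restaurant documents. A multi-term query becomes a posting-list intersection rather than a scan over all documents:

```python
from collections import defaultdict

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of doc IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical restaurant documents (name + description text)
docs = {
    1: "Thai Basil authentic thai food",
    2: "Basil Pizza wood fired",
    3: "Bangkok Street thai noodles",
}
index = build_inverted_index(docs)

# Query "thai basil": intersect the two posting lists
result = index["thai"] & index["basil"]
print(sorted(result))  # -> [1]
```

Real indexes store sorted, compressed posting lists with positions and field information, but the retrieval primitive is the same set intersection.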
Multi-Field Indexing
Why Field Weighting Matters
Why Geo-Spatial Indexing is Critical
Location is a hard constraint in food delivery: a restaurant 100 miles away simply can't deliver to you. We need to filter millions of restaurants down to the hundreds within delivery radius before doing any expensive operations.
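A minimal grid-bucketing sketch of the idea. Production systems typically use Geohash, S2, or H3 cells; the fixed cell size here is an assumption, and the 3x3 neighborhood scan is only correct while the query radius stays within one cell width:

```python
import math
from collections import defaultdict

CELL_DEG = 0.05  # cell size in degrees, ~5.5 km of latitude (assumed)

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

def cell_of(lat, lng):
    return (int(lat // CELL_DEG), int(lng // CELL_DEG))

class GeoGrid:
    """Bucket restaurants into fixed lat/lng cells so a query scans only
    the 3x3 cell neighborhood around the user, not every restaurant."""
    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, rid, lat, lng):
        self.cells[cell_of(lat, lng)].append((rid, lat, lng))

    def nearby(self, lat, lng, radius_km):
        # Only valid while radius_km <= one cell width
        ci, cj = cell_of(lat, lng)
        hits = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for rid, rlat, rlng in self.cells.get((ci + di, cj + dj), []):
                    if haversine_km(lat, lng, rlat, rlng) <= radius_km:
                        hits.append(rid)
        return hits

grid = GeoGrid()
grid.add(1, 37.7749, -122.4194)  # San Francisco
grid.add(2, 34.0522, -118.2437)  # Los Angeles
print(grid.nearby(37.7750, -122.4190, radius_km=5))  # -> [1]
```

The exact haversine check runs only on the handful of restaurants in nearby cells, which is the whole point of the index.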
Performance numbers:
Embedding-Based Search for Intent Queries
Inverted index works for keyword match. But what about queries like "healthy dinner" or "romantic date"? These don't match specific keywords—they express intent. This requires semantic search.
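The core operation is nearest-neighbor search over embedding vectors. The sketch below uses a brute-force cosine scan over hand-set 3-dimensional vectors purely for illustration; real systems use high-dimensional vectors from a trained model and an approximate nearest-neighbor index (e.g. HNSW or IVF) to avoid the linear scan:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical restaurant embeddings (names and vectors invented)
restaurant_vecs = {
    "Green Bowl Salads": [0.9, 0.1, 0.0],
    "Midnight Burgers":  [0.1, 0.9, 0.2],
    "Candlelit Bistro":  [0.2, 0.3, 0.9],
}

def semantic_search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k restaurants whose embeddings are closest in
    cosine similarity to the query embedding."""
    scored = sorted(restaurant_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Suppose "healthy dinner" embeds (hypothetically) near the first axis:
print(semantic_search([0.8, 0.2, 0.1], k=1))  # -> ['Green Bowl Salads']
```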
When to use vector search:
Scaling Indexes Horizontally
5M restaurants × 10KB metadata = 50GB of index data, too large for a single machine. We need to shard (partition) the index across multiple nodes.
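A sketch of stable hash-based document partitioning, with an assumed shard count. Each node indexes only its shard; queries fan out to all shards and the results are merged (scatter-gather):

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count, sized so each shard fits in memory

def shard_for(restaurant_id: str) -> int:
    """Stable hash-based shard assignment: the same ID always lands
    on the same shard, independent of process or machine."""
    digest = hashlib.md5(restaurant_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("restaurant:12345"))
```

Document partitioning keeps write placement trivial; the cost is that every query touches every shard. An alternative is geo-sharding, where a query hits only the shards covering its delivery area.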
Staff-level consideration:
Key Takeaway: Index Choice Drives System Design
Staff-level engineers understand that index choice is not a backend implementation detail—it fundamentally shapes the system's capabilities, latency profile, and failure modes. Inverted index enables keyword search. Geo-spatial index enables location filtering. Vector index enables semantic search. Each has different performance characteristics and scaling requirements. The art is knowing when to use each and how to combine them.
Now that we have our indexes built, let's discuss how to actually retrieve restaurants efficiently. The core insight: retrieval is a funnel—start broad, progressively narrow. Each stage trades recall for precision and speed.
Goal: 5M → 500 candidates in <20ms
Objective: Quickly filter to restaurants that could be relevant. High recall is critical—if we miss the perfect restaurant here, it's gone forever.
Why 500 candidates?
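Stage 1 can be sketched as cheap set operations followed by a truncation. The static quality score here is assumed to be precomputed offline; its only job is to pick which 500 survivors go downstream:

```python
def generate_candidates(term_postings: set[int],
                        in_radius: set[int],
                        quality_score: dict[int, float],
                        limit: int = 500) -> list[int]:
    """Stage 1 sketch: intersect the inverted-index matches with the
    geo filter, then truncate by a precomputed static quality score
    (assumed) to cap downstream work."""
    matched = term_postings & in_radius
    return sorted(matched,
                  key=lambda r: quality_score.get(r, 0.0),
                  reverse=True)[:limit]

postings = {1, 2, 3, 4}   # restaurants matching the query terms
nearby = {2, 3, 4, 5}     # restaurants within delivery radius
quality = {2: 0.7, 3: 0.9, 4: 0.4}
print(generate_candidates(postings, nearby, quality, limit=2))  # -> [3, 2]
```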
Goal: 500 → 100 relevant candidates in <30ms
Objective: Apply more expensive semantic matching to narrow to truly relevant restaurants.
Latency breakdown:
Goal: 100 → top 20 ranked in <40ms
Objective: Apply expensive ML model to rank remaining candidates with high precision. This is where personalization happens—same query, different users, different rankings.
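The shape of this stage, with a hand-set linear `score_fn` standing in for a trained model's predict() (e.g. a LightGBM booster); the feature names, weights, and restaurant data are all invented for illustration:

```python
def rank(user: dict, candidates: list[dict], score_fn) -> list[str]:
    """Stage 3 sketch: assemble per-(user, restaurant) features,
    score each candidate with the model, and sort descending."""
    scored = []
    for r in candidates:
        features = {
            "text_match": r["text_match"],
            "distance_km": r["distance_km"],
            "cuisine_affinity": user["affinity"].get(r["cuisine"], 0.0),
        }
        scored.append((score_fn(features), r["id"]))
    scored.sort(reverse=True)
    return [rid for _, rid in scored]

# Hand-set stand-in for a trained model (hypothetical weights)
score_fn = lambda f: (2.0 * f["text_match"]
                      - 0.1 * f["distance_km"]
                      + 1.5 * f["cuisine_affinity"])

candidates = [
    {"id": "pad-thai-place", "text_match": 0.9, "distance_km": 2.0, "cuisine": "thai"},
    {"id": "noodle-bar",     "text_match": 0.9, "distance_km": 2.0, "cuisine": "japanese"},
]
thai_fan = {"affinity": {"thai": 1.0}}
sushi_fan = {"affinity": {"japanese": 1.0}}

# Same query, same candidates, different users, different order:
print(rank(thai_fan, candidates, score_fn))   # -> ['pad-thai-place', 'noodle-bar']
print(rank(sushi_fan, candidates, score_fn))  # -> ['noodle-bar', 'pad-thai-place']
```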
Why LightGBM over neural networks?
Complete retrieval pipeline timing
Staff-Level Insight: The Funnel is Non-Negotiable
Junior engineers try to optimize by running one expensive model on all 5M restaurants. Staff engineers understand that's mathematically impossible at scale. The funnel architecture—cheap filters first, expensive ranking last—is the only way to hit sub-100ms latency. The art is tuning each stage: too aggressive filtering → low recall; too lenient → high latency. Monitor recall at each stage and iterate.
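The stage-by-stage monitoring this describes can be wired into the funnel itself. The stages below are toy stand-ins; in production they would call the inverted/geo indexes, a semantic matcher, and the ML ranker:

```python
def run_funnel(candidates: list[int], stages) -> tuple[list[int], list]:
    """Apply each (name, stage_fn, limit) in order, recording candidate
    counts per stage so recall can be monitored and tuned."""
    sizes = []
    for name, stage_fn, limit in stages:
        candidates = stage_fn(candidates)[:limit]
        sizes.append((name, len(candidates)))
    return candidates, sizes

# Toy stages with hypothetical logic, using the limits from the text
stages = [
    ("retrieval", lambda c: [r for r in c if r % 2 == 0], 500),
    ("semantic",  lambda c: c, 100),
    ("ranking",   lambda c: sorted(c, reverse=True), 20),
]
top, sizes = run_funnel(list(range(2000)), stages)
print(sizes)  # -> [('retrieval', 500), ('semantic', 100), ('ranking', 20)]
```

Logging `sizes` per query is the cheapest way to catch a stage that has drifted too aggressive (recall collapses) or too lenient (latency balloons).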
Ranking is where machine learning meets business logic. The goal: order results so the most relevant restaurants appear first. But "relevance" is multi-dimensional—it's not just text match, it's also quality, distance, personalization, and business objectives (promotions, diversity).
Personalization is the difference between generic search (same results for everyone) and great search (tailored to individual preferences). This section covers how to build and serve personalized features without sacrificing latency.
Final Thoughts
Building search systems at DoorDash scale requires understanding the full stack—from query parsing to indexing to ranking to infrastructure. The key principles: start with clear requirements, design for latency from day one, use multi-stage architectures, and always measure impact on business metrics. Great search systems aren't built overnight—they're iterated over months, informed by real user behavior and rigorous A/B testing.
This is staff-level work: designing systems that scale, anticipating failure modes, making informed tradeoffs, and building for maintainability over years. Master these principles and you'll be able to design search systems for any domain—not just food delivery.