What would Google do?

Web search for a planet: The Google cluster architecture (2003)


Basic assumptions

  • Search can be parallelized
  • High-end hardware fails too
  • Aggregate throughput > peak per-machine performance
  • Queries per second > Seconds per query

  • Rely on software, not hardware
  • Use commodity hardware

Parallelize

  • Focus on parallel queries
  • Achieve high throughput
  • Parallelize at CPU or cluster level
  • Ideal case: a 10 s query split across 10 nodes takes about 1 s (ignoring merge overhead)
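
As a rough illustration of the last bullet, the sketch below computes the ideal speedup and fans a toy query out over several shards. The names `ideal_parallel_seconds`, `count_matches`, and `parallel_count` are hypothetical, and threads only stand in for cluster nodes here; this is not how Google's system is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def ideal_parallel_seconds(serial_seconds: float, nodes: int) -> float:
    """Ideal wall-clock time if work splits evenly and merging costs nothing."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return serial_seconds / nodes


def count_matches(shard: list[str], term: str) -> int:
    """Count the documents in one shard that contain the term as a word."""
    return sum(1 for doc in shard if term in doc.split())


def parallel_count(shards: list[list[str]], term: str) -> int:
    """Query every shard concurrently, then merge the partial counts."""
    # Assumption: one worker thread per shard stands in for one node per shard.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_counts = pool.map(lambda shard: count_matches(shard, term), shards)
        return sum(partial_counts)


if __name__ == "__main__":
    print(ideal_parallel_seconds(10, 10))
    print(parallel_count([["a b", "c"], ["a"], ["b a a"]], "a"))
```

In practice the merge step and the slowest shard keep real speedup below this ideal figure.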

Notes:


Query time

Google cluster

Source

  • Index servers: inverted index, partitioned by document into shards
  • Document servers: copy of the entire crawled Web
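
A minimal sketch of what "document-partitioned" could look like: each shard holds a complete inverted index, but only for its own subset of documents. The function `build_shards` and the modulo assignment of documents to shards are assumptions made for illustration, not the scheme described in the paper.

```python
from collections import defaultdict


def build_shards(docs: dict[int, str], num_shards: int) -> list[dict[str, set[int]]]:
    """Build one inverted index (term -> doc IDs) per shard."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        # Assumption: assign documents to shards by doc ID modulo shard count.
        shard = shards[doc_id % num_shards]
        for term in text.lower().split():
            shard[term].add(doc_id)
    # Convert to plain dicts so lookups of unknown terms do not create entries.
    return [dict(shard) for shard in shards]


if __name__ == "__main__":
    docs = {0: "Google cluster", 1: "cluster architecture", 2: "web search"}
    for number, shard in enumerate(build_shards(docs, 2)):
        print(number, shard)
```

Because every shard covers all terms for its own documents, a query has to visit every shard, but each shard can answer independently of the others.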

Notes:


Query time

  1. Query index servers → doc IDs
  2. Merge doc IDs → result set
  3. Fetch documents by doc ID from doc servers → (Title, URL, result snippet)
  4. Generate search result page
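
The four steps above can be sketched with in-memory stand-ins for the index and document servers. Everything here is hypothetical (`search`, `make_snippet`, the tuple layout of `doc_store`), and sorting by doc ID is only a placeholder for real relevance ranking.

```python
def make_snippet(body: str, terms: list[str], width: int = 5) -> str:
    """Return a short window of words around the first query term found."""
    words = body.split()
    for position, word in enumerate(words):
        if word.lower() in terms:
            start = max(0, position - width // 2)
            return " ".join(words[start:start + width])
    return " ".join(words[:width])


def search(query: str,
           index_shards: list[dict[str, set[int]]],
           doc_store: dict[int, tuple[str, str, str]],
           limit: int = 10) -> str:
    terms = query.lower().split()

    # 1. Query index servers: each shard returns doc IDs matching all terms.
    per_shard = []
    for shard in index_shards:
        postings = [shard.get(term, set()) for term in terms]
        per_shard.append(set.intersection(*postings) if postings else set())

    # 2. Merge doc IDs into one result set (placeholder ranking: doc ID order).
    doc_ids = sorted(set().union(*per_shard))[:limit]

    # 3. Fetch each document and derive (title, URL, snippet).
    results = []
    for doc_id in doc_ids:
        title, url, body = doc_store[doc_id]
        results.append((title, url, make_snippet(body, terms)))

    # 4. Generate the result page, here as plain text with one hit per line.
    return "\n".join(f"{title} <{url}>: {snippet}" for title, url, snippet in results)


if __name__ == "__main__":
    shards = [{"cluster": {0}, "google": {0}}, {"cluster": {1}, "architecture": {1}}]
    store = {
        0: ("Google", "http://example.com/0", "the google cluster"),
        1: ("Arch", "http://example.com/1", "cluster architecture"),
    }
    print(search("cluster", shards, store))
```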

Notes: