Database

System Design Interview - Chapter 6 - Design a Key-Value store

Key-value stores are among the most basic yet most widely used data stores.

Designing a key-value store requires understanding the following topics:

What do we want from a key-value store?
Single server key-value store
DISTRIBUTED key-value store:
- CAP theorem
- Real-world trade-offs for distributed systems
System components:
- Data partition
- Data replication
- Consistency
- Inconsistency resolution: Versioning
- Handling all types of failures: Failure detection, Handling TEMPORARY failures, Handling PERMANENT failures, Handling data center outage
System architecture diagram
Write path
Read path
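
To make the "Inconsistency resolution: Versioning" item more concrete, here is a minimal Python sketch of vector clocks, one common versioning technique for replicated key-value stores. It is an illustration under assumptions, not code from the book: the function names (increment, descends, compare) and the server ids are hypothetical.

```python
from typing import Dict

# A vector clock maps a server id to the number of writes that server has
# applied to this value. The server ids used below are made up.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, server: str) -> VectorClock:
    """Return a copy of the clock with this server's counter advanced by one."""
    updated = dict(clock)
    updated[server] = updated.get(server, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version a has seen every write that version b has seen."""
    return all(a.get(server, 0) >= counter for server, counter in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions of the same key."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    # Neither version has seen the other: the writes were concurrent
    # and must be reconciled (for example, by the client).
    return "conflict"


if __name__ == "__main__":
    v1 = increment({}, "s0")    # first write, handled by server s0
    v2 = increment(v1, "s1")    # a later write that went through s1
    v3 = increment(v1, "s2")    # a concurrent write that went through s2
    print(compare(v2, v1))
    print(compare(v2, v3))
```

In this sketch a version that strictly descends from another can silently replace it, while two versions that do not descend from each other are reported as a conflict and kept for reconciliation.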

These topics are covered in the very interesting Chapter 6 of the book:

System Design Interview - Chapter 5 - Design Consistent Hashing

Consistent Hashing is a cornerstone technology for distributed systems. Many software developers don’t realize it, but Consistent Hashing is needed in many places: load balancers, caches, CDNs, id generators, databases, chats / social networks, and many other systems.

This topic consists of:

Problem with rehashing and why we need hashing to be CONSISTENT
Hash space and hash ring
BASIC approach (introduced by Karger et al. at MIT)
Advanced approach with VIRTUAL NODES
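
The hash ring with virtual nodes can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the class name HashRing, the "server#i" naming of virtual nodes, the default of 100 virtual nodes, and the use of SHA-256 as the hash function are all assumptions made for the example.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer here)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hashing ring where each server owns many virtual nodes."""

    def __init__(self, virtual_nodes: int = 100) -> None:
        self.virtual_nodes = virtual_nodes
        self._positions: List[int] = []    # sorted ring positions
        self._owner: Dict[int, str] = {}   # ring position -> server name

    def add_server(self, server: str) -> None:
        # Place several virtual nodes per server to even out the key ranges.
        # Position collisions are ignored in this sketch.
        for i in range(self.virtual_nodes):
            position = _hash(f"{server}#{i}")
            self._owner[position] = server
            bisect.insort(self._positions, position)

    def remove_server(self, server: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def get_server(self, key: str) -> str:
        if not self._positions:
            raise LookupError("the ring has no servers")
        # Walk clockwise: take the first virtual node at or after the key's
        # position, wrapping around to the start of the ring if needed.
        index = bisect.bisect_left(self._positions, _hash(key))
        return self._owner[self._positions[index % len(self._positions)]]


if __name__ == "__main__":
    ring = HashRing()
    for name in ("server0", "server1", "server2"):
        ring.add_server(name)
    print(ring.get_server("user:42"))
```

The property that matters is that removing a server only remaps the keys that server owned; every other key stays where it was, which is exactly what plain "hash(key) % N" rehashing cannot guarantee.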

These topics are covered in the very interesting Chapter 5 of the book:

System Design Interview - Chapter 3 - A Framework for System Design Interviews

Four standard steps for a system design interview. However, I would think of them more broadly: as the four initial steps of designing any software.

Step 1. Understand the problem and establish design scope
Step 2. Propose high-level design and get buy-in
Step 3. Design deep dive
Step 4. Wrap up

Chapter 3 of the book covers the details of each step, good questions to ask (and to think about), and DOs and DON’Ts. It also shows a good example of the process of designing a news feed system.

System Design Interview - Chapter 1 - Scale from zero to millions of users

A great generic plan for scaling any app from zero to millions of users.

Single server setup
Selection and usage of database
Vertical scaling vs horizontal scaling approaches. And why you should prefer horizontal
Adding load balancer for horizontal scaling
Adding database replication for horizontal scaling
Adding cache
Adding CDN
Stateless vs Stateful architecture and using external state storage
Adding extra Data Centers
Adding Message queue
Adding Logging, Metrics, and Automation
Scaling database (sharding)
and further steps…

All of this is carefully but briefly covered in Chapter 1 of the book:
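
As one small illustration of the "Adding cache" step, here is a hedged Python sketch of a read-through cache with expiry: check the cache first, fall back to the database on a miss, and remember the result for a limited time. The class name ReadThroughCache, the load_from_db callback, and the TTL handling are assumptions for the example, not an API from the book.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Serve reads from memory and fall back to the database on a miss."""

    def __init__(
        self,
        load_from_db: Callable[[str], str],   # hypothetical database lookup
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_from_db = load_from_db
        self._ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                    # cache hit, still fresh
        value = self._load_from_db(key)        # miss or expired: hit the database
        self._entries[key] = (value, now + self._ttl_seconds)
        return value


if __name__ == "__main__":
    def slow_lookup(key: str) -> str:
        print(f"querying the database for {key!r}")
        return key.upper()

    cache = ReadThroughCache(slow_lookup, ttl_seconds=60)
    cache.get("profile:1")   # goes to the database
    cache.get("profile:1")   # served from the cache
```

The expiry time is the main design choice in this sketch: too short and the database is hit too often, too long and readers see stale data.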

Our company's book club has chosen a wonderful new book to read:

Robert Martin - Clean Architecture - a Craftsman’s Guide to Software Structure and Design

👍

Part VI undermines some foundations 😀:

Did you know that the Database is a “detail”? An unimportant, minor, low-level, non-essential feature that can be neglected in architecture design!
Did you know the same is true of the Web? It is just an unimportant IO device that should also be neglected in architecture design!
What about Frameworks? The same. Don’t marry your framework. Use safe and better remote sex. 🤣
There are several examples of how similar architectures may or may not lead to problems. And what to use to avoid problems (spoiler: Encapsulation)
All of the above, and a brief piece of missing advice…
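
The "Database is a detail" idea can be sketched in Python as follows: the business rule depends only on a small interface, and the storage technology is plugged in behind it. This is my own minimal illustration, not an example from the book; the names User, UserRepository, RenameUser, and InMemoryUserRepository are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


class UserRepository(Protocol):
    """The only thing the business rules know about storage."""

    def find(self, user_id: int) -> Optional[User]: ...

    def save(self, user: User) -> None: ...


class RenameUser:
    """Use case: contains the business rule, not the storage technology."""

    def __init__(self, users: UserRepository) -> None:
        self._users = users

    def execute(self, user_id: int, new_name: str) -> User:
        user = self._users.find(user_id)
        if user is None:
            raise KeyError(f"no user with id {user_id}")
        cleaned = new_name.strip()
        if not cleaned:
            raise ValueError("name must not be empty")
        renamed = User(user.user_id, cleaned)
        self._users.save(renamed)
        return renamed


class InMemoryUserRepository:
    """One interchangeable 'detail'; a SQL-backed class could replace it."""

    def __init__(self) -> None:
        self._rows: Dict[int, User] = {}

    def find(self, user_id: int) -> Optional[User]:
        return self._rows.get(user_id)

    def save(self, user: User) -> None:
        self._rows[user.user_id] = user


if __name__ == "__main__":
    repository = InMemoryUserRepository()
    repository.save(User(1, "Ann"))
    print(RenameUser(repository).execute(1, "Bob"))
```

Because RenameUser never imports a database driver, the storage can be swapped or faked in tests without touching the rule itself, which is the encapsulation the spoiler refers to.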

Great information! All the details are in my mind maps:

Designing Data-Intensive Applications - Chapter 1 - Reliable, Scalable, and Maintainable Applications

Earlier this year, our company's book club studied an excellent book:

Martin Kleppmann - Designing Data-Intensive Applications

This is the best book I have read about building complex scalable software systems. 💪

As usual (to learn better), I prepared an overview and a mind map.

Chapter 1:

Building blocks of the apps
What are Reliability, Scalability, and Maintainability? Examples and definitions.
- Faults and Failures
- Performance, Load, Latency and Response Time
- Operability, Simplicity, Evolvability
Why you should randomly kill your servers 😅
How Twitter handles 12,000 posted tweets per second at peak and 300,000 home timeline reads per second. (VERY interesting!)
How much money Amazon loses for each 100 ms of delay in their response time
How to quickly calculate percentiles for monitoring response time in PROD
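
For the percentile item, the simplest approach is to keep the most recent response times in a window, sort them, and pick the value at the required rank. The Python sketch below shows that naive version; the names percentile and LatencyWindow and the window size are assumptions for the example. At high request rates an approximation algorithm such as t-digest or HdrHistogram is usually a better fit than re-sorting the raw samples.

```python
import math
from collections import deque
from typing import Deque, List


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted list, for 0 < p <= 100."""
    if not sorted_values:
        raise ValueError("no samples recorded")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    # Smallest rank such that at least p percent of samples are at or below it.
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[rank - 1]


class LatencyWindow:
    """Keeps only the most recent response times, in milliseconds."""

    def __init__(self, max_samples: int = 10_000) -> None:
        # A bounded deque drops the oldest sample once the window is full.
        self._samples: Deque[float] = deque(maxlen=max_samples)

    def record(self, response_time_ms: float) -> None:
        self._samples.append(response_time_ms)

    def percentile(self, p: float) -> float:
        return percentile(sorted(self._samples), p)


if __name__ == "__main__":
    window = LatencyWindow()
    for response_time_ms in range(1, 101):   # pretend samples of 1..100 ms
        window.record(response_time_ms)
    print(window.percentile(50), window.percentile(95), window.percentile(99))
```

Reporting p95 or p99 instead of the average shows how slow the worst requests are, which is what users actually notice.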

Download full mind map (PDF)