LSM Trees vs B-Trees in Database Storage Engines
Compare Log-Structured Merge Trees and B-Trees — understanding write amplification, read amplification, and when each engine excels.
srikanthtelkalapally888@gmail.com
LSM Trees vs B-Trees
Storage engines determine how databases physically read and write data. B-Trees and LSM Trees represent two fundamentally different approaches.
B-Tree
In-place updates on a balanced tree of fixed-size pages.
Write: Find page → update in place → write entire page
Read: Traverse tree → O(log N)
Used by: PostgreSQL, MySQL (InnoDB), SQLite
Strengths:
- Fast reads: O(log N)
- Good for random reads
- Predictable performance
Weaknesses:
- Write amplification: Small write → full page rewrite
- Fragmentation over time
LSM Tree (Log-Structured Merge Tree)
Append-only writes, periodic merging.
Write Path
Write → MemTable (in-memory)
↓ (when full, ~64MB)
SSTable (immutable file on disk)
↓ (periodic compaction)
Merged SSTables (sorted, compact)
Read Path
Check MemTable
↓ (miss)
Check L0 SSTables (newest first)
↓ (miss)
Check L1, L2... (older levels)
Used by: Cassandra, HBase, RocksDB, LevelDB
Amplification Factors
Write Amplification:
B-Tree: ~3-10x (WAL + page write + metadata)
LSM Tree: ~10-30x (compaction rewrites data multiple times)
Read Amplification:
B-Tree: ~3-5 (page traversals)
LSM Tree: ~10+ (check multiple levels)
Space Amplification:
B-Tree: ~2x (fragmentation)
LSM Tree: ~1.1x (compaction reclaims space)
Bloom Filters in LSM
Each SSTable has a Bloom filter → skip SSTables that don't contain key → reduces read amplification significantly.
When to Choose
B-Tree: Read-heavy, random access, range scans
LSM Tree: Write-heavy, time-series, append-dominant
Conclusion
LSM Trees dominate write-heavy workloads (Cassandra, Kafka). B-Trees dominate read-heavy transactional workloads (PostgreSQL, MySQL).