MongoDB

LSM Trees vs B-Trees in Database Storage Engines

Compare Log-Structured Merge Trees and B-Trees — understanding write amplification, read amplification, and when each engine excels.

S

srikanthtelkalapally888@gmail.com

LSM Trees vs B-Trees

Storage engines determine how databases physically read and write data. B-Trees and LSM Trees represent two fundamentally different approaches.

B-Tree

In-place updates on a balanced tree of fixed-size pages.

Write: Find page → update in place → write entire page
Read:  Traverse tree → O(log N)

Used by: PostgreSQL, MySQL (InnoDB), SQLite

Strengths:

  • Fast reads: O(log N)
  • Good for random reads
  • Predictable performance

Weaknesses:

  • Write amplification: Small write → full page rewrite
  • Fragmentation over time

LSM Tree (Log-Structured Merge Tree)

Append-only writes, periodic merging.

Write Path

Write → MemTable (in-memory)
          ↓ (when full, ~64MB)
       SSTable (immutable file on disk)
          ↓ (periodic compaction)
       Merged SSTables (sorted, compact)

Read Path

Check MemTable
    ↓ (miss)
Check L0 SSTables (newest first)
    ↓ (miss)
Check L1, L2... (older levels)

Used by: Cassandra, HBase, RocksDB, LevelDB

Amplification Factors

Write Amplification:
  B-Tree:   ~3-10x (WAL + page write + metadata)
  LSM Tree: ~10-30x (compaction rewrites data multiple times)

Read Amplification:
  B-Tree:   ~3-5 (page traversals)
  LSM Tree: ~10+ (check multiple levels)

Space Amplification:
  B-Tree:   ~2x (fragmentation)
  LSM Tree: ~1.1x (compaction reclaims space)

Bloom Filters in LSM

Each SSTable has a Bloom filter → skip SSTables that don't contain key → reduces read amplification significantly.

When to Choose

B-Tree:   Read-heavy, random access, range scans
LSM Tree: Write-heavy, time-series, append-dominant

Conclusion

LSM Trees dominate write-heavy workloads (Cassandra, Kafka). B-Trees dominate read-heavy transactional workloads (PostgreSQL, MySQL).

Share this article