
Designing a Log-Structured File System

Understand log-structured file systems — append-only writes, garbage collection, segment cleaning, crash recovery, and their influence on modern databases.

srikanthtelkalapally888@gmail.com

March 15, 2026

Log-structured file systems write all changes sequentially to a log, enabling high write throughput and simple crash recovery.

The Problem with Traditional File Systems

Traditional update-in-place designs (ext4, NTFS):
  Random writes scatter across the disk
  Metadata and data are written to separate locations
  Small-write workloads are seek-bound on HDDs
  Crash recovery needs a journal or a full consistency scan (fsck)

Log-Structured Approach

All writes → Append to end of log (sequential)

Log: [data][inode][data][inode][checkpoint]...
     |-- write1 -||-- write2 -|

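Sequential writes avoid a seek per block, so on an HDD they can be orders of magnitude faster than small random writes. On an SSD the gain is smaller, but large sequential writes tend to keep latency more predictable.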
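To make the HDD claim concrete, here is a back-of-the-envelope calculation. The drive figures (10 ms positioning time, 100 MB/s transfer rate, 4 KB random writes) are assumptions picked for illustration, not measurements, and effective_mb_per_s is a hypothetical helper.

```python
def effective_mb_per_s(write_kb, position_ms, bandwidth_mb_s):
    """Throughput when every write pays one positioning delay (seek + rotation)."""
    size_mb = write_kb / 1024
    transfer_ms = size_mb / bandwidth_mb_s * 1000
    return size_mb / ((position_ms + transfer_ms) / 1000)

# Assumed drive: 10 ms average positioning time, 100 MB/s sequential bandwidth.
random_4k = effective_mb_per_s(4, 10.0, 100.0)       # one seek per 4 KB block
segment_1m = effective_mb_per_s(1024, 10.0, 100.0)   # one seek per 1 MB segment

print(f"random 4 KB writes: {random_4k:.2f} MB/s")
print(f"1 MB segment writes: {segment_1m:.2f} MB/s")
print(f"speedup: {segment_1m / random_4k:.0f}x")
```

Under these assumptions the positioning delay dominates the 4 KB case, while the 1 MB segment amortizes it over 256 times as much data.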

Data Structure

Segment (1MB chunk):
  [segment_summary][file_data][inodes]

Log = Series of segments written sequentially

Inode Map:
  Maps inode_number → current disk location
  Cached in memory, checkpointed to disk
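A minimal in-memory sketch of these structures follows. The names (SummaryEntry, Segment, inode_map), the 4 KB block size, and the (segment_id, slot) address format are assumptions made for illustration. Real implementations pack these into on-disk block formats.

```python
from dataclasses import dataclass, field

SEGMENT_SIZE = 1 << 20   # 1 MB, as in the text
BLOCK_SIZE = 4096        # assumed block size

@dataclass
class SummaryEntry:
    inode_number: int    # file that owns the block
    block_index: int     # position within the file (-1 marks an inode block)

@dataclass
class Segment:
    segment_id: int
    summary: list = field(default_factory=list)  # one SummaryEntry per block
    blocks: list = field(default_factory=list)   # payloads, same order as summary

    def is_full(self):
        return len(self.blocks) * BLOCK_SIZE >= SEGMENT_SIZE

    def append(self, entry, payload):
        """Add a block and return its (segment_id, slot) address."""
        if self.is_full():
            raise ValueError("segment full")
        self.summary.append(entry)
        self.blocks.append(payload)
        return (self.segment_id, len(self.blocks) - 1)

# Inode map: inode_number -> address of the newest copy of that inode.
inode_map = {}
seg = Segment(segment_id=0)
seg.append(SummaryEntry(7, 0), b"hello")                       # data block of file 7
inode_map[7] = seg.append(SummaryEntry(7, -1), b"<inode 7>")   # inode written after its data
```

The summary is what later lets a cleaner ask who owns each block without scanning the whole file system.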

Write Path

1. Buffer writes in memory (write buffer ~1MB)
2. When full: write segment to disk (sequential)
3. Update in-memory inode map
4. Periodically checkpoint inode map to disk
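The four steps above can be sketched as a small class. LogWriter and its fields are hypothetical names, a Python list stands in for the disk, and each write is treated as a whole new version of a file (data and inode collapsed into one record) to keep the sketch short.

```python
SEGMENT_BYTES = 1 << 20  # ~1 MB write buffer, as in the text

class LogWriter:
    def __init__(self, segment_bytes=SEGMENT_BYTES):
        self.segment_bytes = segment_bytes
        self.buffer = []       # pending (inode_number, data) writes
        self.buffered = 0      # bytes currently buffered
        self.log = []          # stand-in for the disk: a list of segments
        self.inode_map = {}    # inode_number -> (segment_index, slot)
        self.checkpoint = {}   # last inode map snapshot saved "to disk"

    def write(self, inode_number, data):
        # Step 1: buffer the write in memory.
        self.buffer.append((inode_number, data))
        self.buffered += len(data)
        if self.buffered >= self.segment_bytes:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Step 2: one sequential write of the whole segment.
        segment_index = len(self.log)
        self.log.append(list(self.buffer))
        # Step 3: point the inode map at the new locations.
        for slot, (inode_number, _) in enumerate(self.buffer):
            self.inode_map[inode_number] = (segment_index, slot)
        self.buffer.clear()
        self.buffered = 0

    def take_checkpoint(self):
        # Step 4: flush pending data, then persist a snapshot of the inode map.
        self.flush()
        self.checkpoint = dict(self.inode_map)

w = LogWriter(segment_bytes=8)   # tiny segment size so the demo flushes quickly
w.write(1, b"aaaa")
w.write(2, b"bbbb")              # buffer reaches 8 bytes, segment 0 is written
w.write(1, b"cccc")              # newer version of inode 1, still buffered
w.take_checkpoint()              # writes segment 1, then snapshots the map
```

Note that the old copy of inode 1 in segment 0 is never overwritten. It simply stops being referenced, which is what creates work for the cleaner.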

Read Path

1. Lookup inode number → location in inode map
2. Read inode → data block locations
3. Read data blocks

Challenge: Data can be anywhere in the log → Random reads
Mitigation: Read cache (LRU buffer cache) absorbs repeated reads
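The read path and its cache can be sketched as follows. BlockCache, read_file, and the toy disk dictionary are hypothetical, and integer addresses stand in for real log offsets. The LRU policy is implemented with the standard library's OrderedDict.

```python
from collections import OrderedDict

class BlockCache:
    """Tiny LRU cache keyed by log address."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, address, load):
        if address in self.entries:
            self.entries.move_to_end(address)   # mark as most recently used
            self.hits += 1
            return self.entries[address]
        self.misses += 1
        value = load(address)                   # the (random) disk read
        self.entries[address] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return value

# Toy log contents: address -> block.
disk = {
    10: {"blocks": [3, 42]},   # inode 7, listing its data block addresses
    3: b"hello ",
    42: b"world",
}
inode_map = {7: 10}            # inode number -> address of the inode

def read_file(inode_number, cache):
    # Steps 1-2: look up the inode's location, then read the inode.
    inode = cache.get(inode_map[inode_number], disk.__getitem__)
    # Step 3: read each data block the inode points to.
    return b"".join(cache.get(a, disk.__getitem__) for a in inode["blocks"])

cache = BlockCache(capacity=8)
first = read_file(7, cache)    # cold: inode plus two data blocks are loaded
second = read_file(7, cache)   # warm: served from the cache
```

The cache does not restore locality on disk. It only hides the scattered layout for blocks that are read more than once.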

Garbage Collection (Segment Cleaning)

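Problem: Overwritten or deleted blocks leave dead data in old segments, and the log eventually runs out of free space

Cleaning:
1. Select segments with the most dead data (lowest live utilization)
2. Read the segment and identify its live blocks (the segment summary names each block's owner; a block is live only if its file still points to it)
3. Write live blocks to a new segment at the end of the log
4. Mark the old segment free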
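A minimal sketch of one cleaning pass, assuming a greedy policy that picks the segment with the most dead blocks. The names (clean_one, block_map, free_list) and the layout are hypothetical: each segment is a list of (owner, payload) pairs, where owner is an (inode_number, block_index) key, and block_map records the current address of every live block. Real cleaners typically process several segments at once and may also weigh how old the data is.

```python
def live_slots(seg_id, segment, block_map):
    """Slots whose address is still the current one for their owner."""
    return [slot for slot, (owner, _) in enumerate(segment)
            if block_map.get(owner) == (seg_id, slot)]

def clean_one(segments, block_map, free_list):
    # Step 1: pick the segment with the most dead blocks (greedy policy).
    def dead_count(seg_id):
        seg = segments[seg_id]
        return len(seg) - len(live_slots(seg_id, seg, block_map))
    victim = max(segments, key=dead_count)

    # Steps 2-3: copy live blocks into a fresh segment at the end of the log.
    new_id = max(segments) + 1
    new_segment = []
    for slot in live_slots(victim, segments[victim], block_map):
        owner, payload = segments[victim][slot]
        block_map[owner] = (new_id, len(new_segment))
        new_segment.append((owner, payload))
    if new_segment:
        segments[new_id] = new_segment

    # Step 4: the old segment can now be reused.
    del segments[victim]
    free_list.append(victim)
    return victim

segments = {
    0: [((1, 0), b"old-a"), ((2, 0), b"b")],   # block (1, 0) was overwritten later
    1: [((1, 0), b"new-a"), ((3, 0), b"c")],
}
block_map = {(1, 0): (1, 0), (2, 0): (0, 1), (3, 0): (1, 1)}
free_list = []
victim = clean_one(segments, block_map, free_list)
```

Here segment 0 is chosen because half of it is dead. Its one live block moves to segment 2, and segment 0 returns to the free list.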

Crash Recovery

Checkpoint:
  Periodically write the full inode map to fixed locations
  Record the checkpoint timestamp and the log position it covers

Recovery:
  1. Find latest valid checkpoint
  2. Restore inode map from checkpoint
  3. Roll forward from the checkpoint's position in the log
  4. Replay segments written after the checkpoint, updating the inode map
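The recovery steps can be sketched as a single function. The checkpoint fields (timestamp, valid, log_position, inode_map) are assumptions for illustration. In particular, the valid flag stands in for whatever integrity check a real system uses to detect a checkpoint that was only partially written. Each segment is reduced to the list of inode numbers it contains.

```python
def recover(checkpoints, log):
    # Step 1: find the latest valid checkpoint.
    valid = [c for c in checkpoints if c["valid"]]
    latest = max(valid, key=lambda c: c["timestamp"])

    # Step 2: restore the inode map from it.
    inode_map = dict(latest["inode_map"])

    # Steps 3-4: roll forward, replaying segments written after the checkpoint.
    for seg_index in range(latest["log_position"], len(log)):
        for slot, inode_number in enumerate(log[seg_index]):
            inode_map[inode_number] = (seg_index, slot)
    return inode_map

log = [[1, 2], [3], [1]]   # three segments; inode 1 is rewritten in the last one
checkpoints = [
    {"timestamp": 100, "valid": True, "log_position": 1,
     "inode_map": {1: (0, 0), 2: (0, 1)}},
    {"timestamp": 200, "valid": False, "log_position": 3,
     "inode_map": {}},     # crash happened while this one was being written
]
recovered = recover(checkpoints, log)
```

The torn checkpoint is skipped, so recovery starts from the older one and replays segments 1 and 2 to pick up inode 3 and the newer copy of inode 1.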

Influence on Modern Systems

LevelDB / RocksDB: LSM tree (log-structured merge tree)
Cassandra:         LSM-tree storage (commit log, memtables, SSTables)
Postgres WAL:      Write-ahead log (changes logged before data pages are updated)
ZFS:               Copy-on-write (similar principle)
Kafka:             Append-only log (same idea)

Conclusion

Log-structured file systems trade read locality and background cleaning overhead for write performance and simple recovery. Their core ideas (sequential writes, append-only logs, garbage collection) appear throughout modern databases and storage engines.
