Designing a Columnar Storage Engine
Understand how columnar storage works — row vs column layout, run-length encoding, dictionary encoding, predicate pushdown, and vectorized execution.
Columnar storage is the key technology behind modern analytical databases — enabling 10-100x faster queries through compression and I/O reduction.
Row vs Columnar Layout
Table: sales(id, date, product, region, revenue)
Row Storage (PostgreSQL):
Disk: [1,2026-01-01,Shoes,US,99.99][2,2026-01-01,Hat,EU,24.99]...
Query: SELECT SUM(revenue) → Must read ALL columns
Columnar Storage (Parquet, ClickHouse):
Disk: [id:1,2,3...][date:2026-01-01,...][revenue:99.99,24.99,...]...
Query: SELECT SUM(revenue) → Reads ONLY revenue column
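The two layouts can be sketched in a few lines of Python. This is a toy illustration of the idea, not how PostgreSQL or Parquet lay out bytes on disk:

```python
# The same sales rows stored row-wise vs column-wise.
rows = [
    (1, "2026-01-01", "Shoes", "US", 99.99),
    (2, "2026-01-01", "Hat",   "EU", 24.99),
]

# Row layout: records stored one after another.
# SUM(revenue) has to walk past every field of every record.
row_store = [field for row in rows for field in row]

# Columnar layout: one contiguous list per column.
# SUM(revenue) touches only the revenue list.
column_store = {
    "id":      [r[0] for r in rows],
    "date":    [r[1] for r in rows],
    "product": [r[2] for r in rows],
    "region":  [r[3] for r in rows],
    "revenue": [r[4] for r in rows],
}

print(sum(column_store["revenue"]))  # reads only the revenue column
```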
I/O Reduction
Table: 100 columns, 1B rows, 1KB per row = 1TB total
Query touches 3 columns:
Row store: Read 1TB → Extract 3 columns
Column store: Read 30GB (3 × 10GB columns)
→ 33x less I/O
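The arithmetic above, assuming all 100 columns are equally wide:

```python
# Back-of-envelope I/O math for the example table.
n_rows, n_cols, row_bytes = 1_000_000_000, 100, 1000

total_bytes = n_rows * row_bytes          # 1 TB full-table scan (row store)
bytes_per_col_value = row_bytes // n_cols # 10 bytes per value per column
scan_3_cols = 3 * n_rows * bytes_per_col_value  # 30 GB (column store)

print(total_bytes // scan_3_cols)  # → 33
```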
Compression Techniques
Run-Length Encoding (RLE)
Original: [US, US, US, US, EU, EU, EU]
RLE: [(US, 4), (EU, 3)]
→ 7 values → 2 entries (with counts)
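A minimal RLE encoder/decoder pair makes the transformation concrete:

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back to the original list."""
    return [v for v, n in runs for _ in range(n)]

print(rle_encode(["US", "US", "US", "US", "EU", "EU", "EU"]))
# → [('US', 4), ('EU', 3)]
```

RLE shines on sorted or low-cardinality columns (region, status flags), where runs are long; on high-cardinality data it can make things worse.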
Dictionary Encoding
Original: ["Electronics", "Electronics", "Clothing", "Electronics"]
Dictionary: { 0: "Electronics", 1: "Clothing" }
Encoded: [0, 0, 1, 0]
→ Store tiny integers instead of strings
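A sketch of the encoding step, building the dictionary on the fly:

```python
def dict_encode(values):
    """Map each distinct value to a small integer code."""
    dictionary, codes = {}, []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # next unused code
        codes.append(dictionary[v])
    return dictionary, codes

d, codes = dict_encode(["Electronics", "Electronics", "Clothing", "Electronics"])
print(d)      # {'Electronics': 0, 'Clothing': 1}
print(codes)  # [0, 0, 1, 0]
```

The integer codes are also ideal input for RLE or bit-packing, so real formats often stack dictionary encoding with a second pass.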
Delta Encoding
Timestamps: [1000, 1005, 1010, 1015]
Deltas: [1000, +5, +5, +5]
→ Smaller values compress better
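Encode and decode are each one pass over the data:

```python
def delta_encode(values):
    """Store the first value, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the originals with a running sum."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

print(delta_encode([1000, 1005, 1010, 1015]))  # → [1000, 5, 5, 5]
```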
Predicate Pushdown
Query: SELECT revenue WHERE region = 'US'
With statistics per column chunk:
Chunk A: region min='AU', max='US' → MAY contain US
Chunk B: region min='EU', max='EU' → SKIP (no US)
Skip entire chunks based on min/max statistics
→ Zone maps / bloom filters per chunk
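Min/max pruning can be sketched in a few lines. The chunk layout here is hypothetical, but the skip logic is the same one zone maps use:

```python
# Each chunk carries min/max stats for the region column plus its rows.
chunks = [
    {"min": "AU", "max": "US", "rows": [("AU", 10.00), ("US", 99.99)]},
    {"min": "EU", "max": "EU", "rows": [("EU", 24.99)]},
]

def scan_eq(chunks, value):
    """Return (chunks actually read, matching revenues)."""
    scanned, hits = 0, []
    for chunk in chunks:
        if not (chunk["min"] <= value <= chunk["max"]):
            continue  # value cannot occur here: skip the whole chunk
        scanned += 1
        hits += [rev for region, rev in chunk["rows"] if region == value]
    return scanned, hits

print(scan_eq(chunks, "US"))  # chunk B is skipped without being read
```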
Vectorized Execution
Traditional (row at a time):
for each row: apply predicate
Vectorized (batch processing):
Process 1024 values at once using SIMD instructions
→ CPU cache friendly
→ 2-10x faster than row-at-a-time
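The control flow looks roughly like this sketch. Pure Python cannot show the SIMD part, only the batch structure; real engines run the inner loops as tight SIMD kernels over fixed-size vectors:

```python
BATCH = 1024  # one cache-friendly vector of values at a time

def sum_where(regions, revenues, target):
    """SELECT SUM(revenue) WHERE region = target, batch at a time."""
    total = 0.0
    for start in range(0, len(regions), BATCH):
        r_batch = regions[start:start + BATCH]
        v_batch = revenues[start:start + BATCH]
        # The predicate produces a selection bitmap for the whole batch...
        mask = [r == target for r in r_batch]
        # ...and the aggregate consumes the batch in one tight loop.
        total += sum(v for v, m in zip(v_batch, mask) if m)
    return total

regions = ["US", "EU"] * 2000
revenues = [1.0, 2.0] * 2000
print(sum_where(regions, revenues, "US"))  # → 2000.0
```

Keeping each operator's inner loop over a small batch (rather than dispatching per row) is what makes the work cache-resident and SIMD-friendly.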
Parquet File Format
Parquet structure:
File → Row Groups (~128MB)
↓
Column Chunks (one per column, per row group)
↓
Pages (data + dictionary + index)
Footer at end of file (schema, per-chunk min/max statistics)
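The footer-last layout is what lets a reader grab the schema and statistics without scanning any data. A minimal sketch of the idea (JSON payloads standing in for real encoded pages, and omitting Parquet's magic bytes):

```python
import io
import json
import struct

def write_file(buf, row_groups):
    # Data pages first, recording where each row group starts...
    offsets = []
    for rg in row_groups:
        offsets.append(buf.tell())
        buf.write(json.dumps(rg["rows"]).encode())
    # ...then the footer, then its length as the very last bytes
    # (Parquet likewise ends with the footer length).
    footer = json.dumps({"offsets": offsets,
                         "stats": [rg["stats"] for rg in row_groups]}).encode()
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))

def read_stats(buf):
    # Seek to the end, read the footer length, then the footer itself;
    # row-group pruning can now happen without touching any data pages.
    buf.seek(-4, io.SEEK_END)
    (flen,) = struct.unpack("<I", buf.read(4))
    buf.seek(-4 - flen, io.SEEK_END)
    return json.loads(buf.read(flen).decode())["stats"]

buf = io.BytesIO()
write_file(buf, [{"rows": [1, 2], "stats": {"min": 1, "max": 2}},
                 {"rows": [9],    "stats": {"min": 9, "max": 9}}])
print(read_stats(buf))  # → [{'min': 1, 'max': 2}, {'min': 9, 'max': 9}]
```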
Conclusion
Columnar storage + compression + predicate pushdown + vectorized execution combine to deliver OLAP query performance at petabyte scale. Parquet has become the de facto standard columnar file format for interchange across analytical engines.