
Designing a Columnar Storage Engine

Understand how columnar storage works — row vs column layout, run-length encoding, dictionary encoding, predicate pushdown, and vectorized execution.


Columnar storage is the key technology behind modern analytical databases — enabling 10-100x faster queries through compression and I/O reduction.

Row vs Columnar Layout

Table: sales(id, date, product, region, revenue)

Row Storage (PostgreSQL):
  Disk: [1,2026-01-01,Shoes,US,99.99][2,2026-01-01,Hat,EU,24.99]...
  Query: SELECT SUM(revenue) → Must read ALL columns

Columnar Storage (Parquet, ClickHouse):
  Disk: [id:1,2,3...][date:2026-01-01,...][revenue:99.99,24.99,...]...
  Query: SELECT SUM(revenue) → Reads ONLY revenue column
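The difference can be sketched in a few lines of Python, using a small made-up sales table. The row layout forces `SUM(revenue)` to touch every field of every tuple, while the columnar layout touches only the revenue array:

```python
# Hypothetical sample data for the sales(id, date, product, region, revenue) table.
rows = [
    (1, "2026-01-01", "Shoes", "US", 99.99),
    (2, "2026-01-01", "Hat",   "EU", 24.99),
    (3, "2026-01-02", "Shoes", "US", 99.99),
]

# Row layout: every query reads whole tuples.
total_row = sum(r[4] for r in rows)   # touches all 5 fields per row

# Columnar layout: one contiguous array per column.
columns = {
    "id":      [r[0] for r in rows],
    "date":    [r[1] for r in rows],
    "product": [r[2] for r in rows],
    "region":  [r[3] for r in rows],
    "revenue": [r[4] for r in rows],
}
total_col = sum(columns["revenue"])   # touches only the revenue array

assert total_row == total_col
```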

I/O Reduction

Table: 100 columns, 1B rows, 1KB per row = 1TB total

Query touches 3 columns:
  Row store:    Read 1TB → Extract 3 columns
  Column store: Read 30GB (3 × 10GB columns)
  → 33x less I/O
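The back-of-envelope math (assuming uniform column widths, as the example implies):

```python
# 100 columns, 1B rows, 1KB per row.
n_cols, n_rows, row_bytes = 100, 1_000_000_000, 1000
table_bytes = n_rows * row_bytes          # 1 TB total
col_bytes = table_bytes // n_cols         # ~10 GB per column
query_cols = 3

row_store_read = table_bytes              # row store scans every row in full
col_store_read = query_cols * col_bytes   # column store reads only 3 columns

ratio = row_store_read // col_store_read  # → 33x less I/O
```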

Compression Techniques

Run-Length Encoding (RLE)

Original: [US, US, US, US, EU, EU, EU]
RLE:      [(US, 4), (EU, 3)]
→ 7 values → 2 entries (with counts)
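A minimal RLE codec in Python; `itertools.groupby` already collapses consecutive equal values into runs:

```python
from itertools import groupby

def rle_encode(values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in pairs for _ in range(n)]

data = ["US", "US", "US", "US", "EU", "EU", "EU"]
encoded = rle_encode(data)            # [("US", 4), ("EU", 3)]
assert rle_decode(encoded) == data    # round-trips losslessly
```

RLE only pays off when values repeat in runs, which is why engines often sort or cluster data on low-cardinality columns first.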

Dictionary Encoding

Original: ["Electronics", "Electronics", "Clothing", "Electronics"]
Dictionary: { 0: "Electronics", 1: "Clothing" }
Encoded: [0, 0, 1, 0]
→ Store tiny integers instead of strings
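A sketch of the encoding step, building the dictionary in first-seen order:

```python
def dict_encode(values):
    """Map each distinct value to a small integer code."""
    dictionary, codes = {}, []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)   # next free code
        codes.append(dictionary[v])
    return dictionary, codes

values = ["Electronics", "Electronics", "Clothing", "Electronics"]
dictionary, codes = dict_encode(values)
# dictionary == {"Electronics": 0, "Clothing": 1}, codes == [0, 0, 1, 0]

# Decoding inverts the dictionary.
reverse = {code: v for v, code in dictionary.items()}
assert [reverse[c] for c in codes] == values
```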

Delta Encoding

Timestamps: [1000, 1005, 1010, 1015]
Deltas:     [1000, +5, +5, +5]
→ Smaller values compress better
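Delta encoding and its inverse are a two-liner each:

```python
def delta_encode(values):
    """Store the first value, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values by running-summing the deltas."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

ts = [1000, 1005, 1010, 1015]
assert delta_encode(ts) == [1000, 5, 5, 5]
assert delta_decode(delta_encode(ts)) == ts
```

The small deltas then feed into a downstream codec (RLE, bit-packing, or a general-purpose compressor), which is where the actual size win comes from.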

Predicate Pushdown

Query: SELECT revenue WHERE region = 'US'

With statistics per column chunk:
  Chunk A: region min='AU', max='US' → MAY contain US
  Chunk B: region min='EU', max='EU' → SKIP (no US)

Skip entire chunks based on min/max statistics
→ Zone maps / bloom filters per chunk
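A sketch of min/max chunk skipping, with invented chunk contents matching the example above:

```python
# Each chunk carries zone-map stats: the min and max of its region column.
chunks = [
    {"min": "AU", "max": "US", "rows": [("AU", 10.00), ("US", 99.99)]},  # chunk A
    {"min": "EU", "max": "EU", "rows": [("EU", 24.99), ("EU", 12.50)]},  # chunk B
]

def scan_region(chunks, target):
    """Return matching revenues, skipping chunks whose stats rule out the target."""
    hits, skipped = [], 0
    for chunk in chunks:
        if not (chunk["min"] <= target <= chunk["max"]):
            skipped += 1          # skip the whole chunk without reading its rows
            continue
        hits += [rev for region, rev in chunk["rows"] if region == target]
    return hits, skipped

hits, skipped = scan_region(chunks, "US")
# Chunk B is skipped on stats alone; only chunk A is actually scanned.
```

Min/max stats can only prove absence for range-orderable predicates; per-chunk bloom filters extend skipping to equality predicates on high-cardinality columns.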

Vectorized Execution

Traditional (row at a time):
  for each row: apply predicate

Vectorized (batch processing):
  Process 1024 values at once using SIMD instructions
  → CPU cache friendly
  → 2-10x faster than row-at-a-time
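The batching pattern can be illustrated in plain Python, though the real speedup comes from SIMD and tight compiled loops that Python itself cannot express; the batch size of 1024 follows the example above:

```python
# Batch-at-a-time sketch: the hot loop runs over a contiguous slice of
# 1024 values instead of dispatching per row. Real engines compile this
# inner loop to SIMD instructions.
BATCH = 1024
revenue = [float(i % 100) for i in range(10_000)]   # hypothetical column

total = 0.0
for start in range(0, len(revenue), BATCH):
    batch = revenue[start:start + BATCH]   # one contiguous vector
    total += sum(batch)                    # tight inner loop over the batch

assert total == sum(revenue)
```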

Parquet File Format

Parquet structure:
  File → Row Groups (~128MB each)
         ↓
         Column Chunks (one per column per row group)
         ↓
         Pages (data, dictionary, index pages)
  Footer at end of file (schema, row-group and column statistics)

Conclusion

Columnar storage + compression + predicate pushdown + vectorized execution combine to deliver OLAP query performance at petabyte scale. Parquet has become the de facto standard open columnar file format, supported across virtually every analytical engine.
