Designing a Data Warehouse
Build a cloud data warehouse using dimensional modeling, ETL/ELT pipelines, star schemas, and OLAP query optimization for business intelligence.
A data warehouse is a centralized repository for analytical data, optimized for complex queries across large datasets.
OLTP vs OLAP
OLTP (Operational):
Purpose: Run the business
Queries: Simple, row-level
Tables: Normalized (3NF)
Example: INSERT order, UPDATE status
OLAP (Analytical):
Purpose: Understand the business
Queries: Complex aggregations
Tables: Denormalized (star schema)
Example: Revenue by region by month
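The contrast shows up directly in the SQL. A sketch of each kind of query, assuming the star schema described below (the `region`, `year`, and `month` columns are illustrative):

```sql
-- OLTP: touch one row, run the business
UPDATE orders SET status = 'shipped' WHERE order_id = 42;

-- OLAP: scan and aggregate millions of rows (revenue by region by month)
SELECT d.year, d.month, s.region, SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_date d  ON f.date_id  = d.date_id
JOIN dim_store s ON f.store_id = s.store_id
GROUP BY d.year, d.month, s.region
ORDER BY d.year, d.month, s.region;
```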
Star Schema
              dim_date
                  │
dim_product──fact_sales──dim_customer
                  │
              dim_store
fact_sales:
sale_id, product_id, customer_id,
store_id, date_id, quantity, revenue
Fact tables: measurements (revenue, quantity)
Dimension tables: context (who, what, when, where)
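A minimal DDL sketch of the schema above (types are illustrative, and warehouse dialects vary; many cloud warehouses accept but do not enforce the foreign keys):

```sql
-- One dimension: context about the store
CREATE TABLE dim_store (
  store_id   INT PRIMARY KEY,
  store_name VARCHAR(100),
  region     VARCHAR(50)       -- where
);

-- The fact table: one row per sale, measurements plus dimension keys
CREATE TABLE fact_sales (
  sale_id     BIGINT PRIMARY KEY,
  product_id  INT REFERENCES dim_product(product_id),
  customer_id INT REFERENCES dim_customer(customer_id),
  store_id    INT REFERENCES dim_store(store_id),
  date_id     INT REFERENCES dim_date(date_id),
  quantity    INT,             -- measurement
  revenue     DECIMAL(12, 2)   -- measurement
);
```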
ETL vs ELT
ETL (Traditional):
Extract → Transform → Load into warehouse
ELT (Modern, cloud):
Extract → Load raw → Transform inside warehouse
(Warehouse compute is cheap; simpler pipelines)
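In ELT, raw rows land untouched and the transform is just SQL run inside the warehouse. A sketch, where the `raw_orders` source table and its columns are hypothetical:

```sql
-- Load step dumped source rows as-is into raw_orders.
-- Transform step: clean and reshape inside the warehouse.
CREATE TABLE stg_orders AS
SELECT
  CAST(id AS BIGINT)                  AS order_id,
  LOWER(TRIM(status))                 AS status,
  CAST(amount_cents AS DECIMAL) / 100 AS amount,
  CAST(created_at AS TIMESTAMP)       AS ordered_at
FROM raw_orders
WHERE id IS NOT NULL;   -- drop rows we can't key
```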
Modern Data Stack
Sources: PostgreSQL, Salesforce, Stripe
↓
Ingestion: Fivetran / Airbyte
↓
Warehouse: Snowflake / BigQuery / Redshift
↓
Transform: dbt (SQL models)
↓
BI Tools: Tableau / Looker / Metabase
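In this stack, the dbt layer is version-controlled SELECT statements: each file is a model that dbt materializes as a table or view, with `{{ ref(...) }}` resolving to the upstream model's warehouse name. A sketch (the model and column names are hypothetical):

```sql
-- models/marts/fct_sales.sql
SELECT
  o.order_id,
  o.ordered_at,
  c.customer_id,
  o.amount AS revenue
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('stg_customers') }} c
  ON o.customer_id = c.customer_id
```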
Partitioning in Warehouses
-- BigQuery: partition by day (column list added; original omitted it)
CREATE TABLE sales (
  sale_id   INT64,
  sale_date TIMESTAMP,
  quantity  INT64,
  revenue   NUMERIC
)
PARTITION BY DATE(sale_date)
OPTIONS (partition_expiration_days = 365);
-- Query scans only relevant partitions
SELECT SUM(revenue)
FROM sales
WHERE sale_date BETWEEN '2026-01-01' AND '2026-03-01';
Columnar Storage
Warehouse stores data column-by-column:
Query: SELECT SUM(revenue) FROM sales
→ Only reads revenue column (not all columns)
→ 10-100x less I/O
Conclusion
Star schema + ELT + columnar storage is the modern warehouse architecture. dbt for transformations, Snowflake/BigQuery for compute, Fivetran for ingestion.