Designing an IoT Data Platform
Build an IoT data ingestion and processing platform that handles telemetry streams from millions of devices, covering MQTT, time-series storage, and alerting.
srikanthtelkalapally888@gmail.com
An IoT platform ingests, processes, and stores telemetry from millions of connected devices.
Scale
10M devices × 1 message per 10 sec = 1M messages/sec
Message size: 500 bytes
Ingestion: 1M messages/sec × 500 bytes = 500 MB/sec
Storage: 500 MB/sec × 86,400 sec ≈ 43 TB/day of raw data
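The sizing figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 MB = 10^6 bytes) and ignores compression, replication, and protocol overhead; the function and key names are illustrative.

```python
# Back-of-the-envelope sizing using the figures quoted above.
DEVICES = 10_000_000          # connected devices
INTERVAL_S = 10               # one message per device every 10 seconds
MESSAGE_BYTES = 500           # message size in bytes
SECONDS_PER_DAY = 86_400

def sizing(devices: int, interval_s: int, message_bytes: int) -> dict:
    """Return message rate, ingest bandwidth, and raw daily volume."""
    msgs_per_sec = devices / interval_s
    bytes_per_sec = msgs_per_sec * message_bytes
    bytes_per_day = bytes_per_sec * SECONDS_PER_DAY
    return {
        "msgs_per_sec": msgs_per_sec,
        "mb_per_sec": bytes_per_sec / 1e6,    # decimal megabytes
        "tb_per_day": bytes_per_day / 1e12,   # decimal terabytes
    }

result = sizing(DEVICES, INTERVAL_S, MESSAGE_BYTES)
print(result)
```

Changing the reporting interval or message size shows how quickly the daily volume moves, which is what motivates the retention tiers later in the article.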
Device Connectivity
MQTT Protocol
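Lightweight publish/subscribe protocol designed for IoT, with three quality-of-service (QoS) levels:
QoS 0: At most once (fire and forget, no acknowledgment)
QoS 1: At least once (acknowledged with PUBACK; duplicates are possible)
QoS 2: Exactly once (four-part handshake: PUBLISH, PUBREC, PUBREL, PUBCOMP)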
Topic: devices/{device_id}/telemetry
Payload: { temp: 23.5, humidity: 65, ts: 1709... }
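The topic and payload convention above could be encoded and decoded roughly as follows. This is a sketch using only the standard library: the helper names are hypothetical, the JSON encoding is an assumption, and `ts` is assumed to be epoch seconds because the original value is truncated.

```python
import json
import time

TOPIC_TEMPLATE = "devices/{device_id}/telemetry"

def build_message(device_id: str, temp: float, humidity: float, ts=None):
    """Return (topic, payload_bytes) for one telemetry reading."""
    if any(ch in device_id for ch in "/+#"):
        # '/' separates MQTT topic levels; '+' and '#' are subscription wildcards.
        raise ValueError("device_id must not contain '/', '+' or '#'")
    ts = int(time.time()) if ts is None else ts   # assumption: epoch seconds
    payload = {"temp": temp, "humidity": humidity, "ts": ts}
    topic = TOPIC_TEMPLATE.format(device_id=device_id)
    return topic, json.dumps(payload).encode("utf-8")

def device_id_from_topic(topic: str) -> str:
    """Extract the device ID from a devices/{device_id}/telemetry topic."""
    parts = topic.split("/")
    if len(parts) != 3 or parts[0] != "devices" or parts[2] != "telemetry":
        raise ValueError(f"unexpected topic: {topic!r}")
    return parts[1]
```

Putting the device ID in the topic lets a consumer subscribe to `devices/+/telemetry` and still recover which device sent each message.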
MQTT Broker
Devices → MQTT Broker (EMQX / AWS IoT Core)
↓
Kafka
↓
Processing Pipeline
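One way to think about the broker-to-Kafka hop is as a mapping from an MQTT message to a Kafka record keyed by device ID, so that all readings from one device land in the same partition and keep their order. The sketch below is an illustration only: the topic name `raw-telemetry` is hypothetical, and the CRC32 hash stands in for Kafka's default partitioner, which uses murmur2.

```python
import zlib
from typing import NamedTuple

class KafkaRecord(NamedTuple):
    topic: str
    key: bytes
    value: bytes

RAW_TOPIC = "raw-telemetry"   # hypothetical Kafka topic name

def to_kafka_record(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Key the record by device ID so one device's readings stay ordered."""
    device_id = mqtt_topic.split("/")[1]   # devices/{device_id}/telemetry
    return KafkaRecord(RAW_TOPIC, device_id.encode("utf-8"), payload)

def partition_for(key: bytes, num_partitions: int) -> int:
    """Illustrative stable hash; Kafka's default partitioner uses murmur2, not CRC32."""
    return zlib.crc32(key) % num_partitions
```

Because the partition depends only on the key, a downstream consumer can keep per-device state locally without coordinating with other consumers.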
Processing Pipeline
Kafka (raw telemetry)
↓
Flink / Spark Streaming:
- Data validation / enrichment
- Anomaly detection
- Downsampling (raw 10-second readings → 1-minute averages)
↓
Time-Series DB
↓
Dashboard API
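The validation and downsampling stages can be sketched in plain Python as a batch function. A real Flink or Spark Streaming job would express the same logic with keyed, windowed operators; the plausibility range and the record fields (`device_id`, `temp`, `ts` in epoch seconds) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean

def is_valid(reading: dict) -> bool:
    """Require a plausible temperature, a non-negative timestamp, and a device ID."""
    try:
        return (
            -50.0 <= float(reading["temp"]) <= 150.0   # hypothetical sensor range
            and int(reading["ts"]) >= 0
            and bool(reading["device_id"])
        )
    except (KeyError, TypeError, ValueError):
        return False

def downsample(readings, window_s: int = 60) -> dict:
    """Average valid temperatures per (device_id, window start) tumbling window."""
    buckets = defaultdict(list)
    for r in readings:
        if not is_valid(r):
            continue   # drop malformed or out-of-range readings
        window_start = int(r["ts"]) // window_s * window_s
        buckets[(r["device_id"], window_start)].append(float(r["temp"]))
    return {key: fmean(values) for key, values in sorted(buckets.items())}
```

Each output key is a device and the start of its window, which maps directly onto a row in the time-series database.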
Time-Series Storage
InfluxDB / TimescaleDB / Apache IoTDB:
Measurement: temperature
Tags: device_id, location, type
Fields: value, unit
Timestamp: up to nanosecond precision, depending on the database
-- InfluxQL: per-minute mean temperature for one device over the last hour
SELECT mean(value)
FROM temperature
WHERE device_id='d123' AND time > now() - 1h
GROUP BY time(1m)
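For InfluxDB, the measurement, tags, fields, and timestamp described above are written as one line of line protocol per point. The following sketch formats such a line by hand to show the layout; the function names and the example tag values (`plant-1`, `sensor`) are hypothetical, and in practice a client library would do this.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text

def _format_field(value) -> str:
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):        # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + _escape(str(value), '\\"') + '"'

def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Build: measurement,tag=value,... field=value,... timestamp_in_ns"""
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):   # tags sorted by key
        head += f",{_escape(key, ', =')}={_escape(str(value), ', =')}"
    field_part = ",".join(
        f"{_escape(k, ', =')}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{head} {field_part} {ts_ns}"

line = to_line_protocol(
    "temperature",
    {"device_id": "d123", "location": "plant-1", "type": "sensor"},
    {"value": 23.5, "unit": "C"},
    1700000000000000000,
)
print(line)
```

Tags are indexed and fields are not, which is why the query filters on the `device_id` tag and aggregates the `value` field.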
Downsampling
Raw data (10s resolution): keep 7 days
Downsampled (1m avg): keep 90 days
Downsampled (1h avg): keep 2 years
Downsampled (1d avg): keep forever
Automate downsampling with continuous queries (InfluxDB 1.x) or continuous aggregates (TimescaleDB)
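A query layer has to know which tier still holds data of a given age. Here is a minimal sketch of that lookup, using the retention periods listed above; the tier labels are illustrative and two years is approximated as 730 days.

```python
from datetime import timedelta

# (tier label, retention) ordered finest first; None means keep forever.
TIERS = [
    ("raw", timedelta(days=7)),
    ("1m", timedelta(days=90)),
    ("1h", timedelta(days=730)),   # approximately 2 years
    ("1d", None),
]

def finest_tier(age: timedelta) -> str:
    """Return the finest resolution still retained for data of the given age."""
    for name, retention in TIERS:
        if retention is None or age <= retention:
            return name
    raise AssertionError("unreachable: the last tier keeps data forever")
```

A dashboard asking for last week's data would read raw points, while a year-over-year chart would fall through to the hourly or daily averages.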
Alerting
Rule: IF temperature > 80°C for 5 min
→ Send alert to on-call engineer
→ Execute action (shut down device)
The stream processor evaluates each rule against every device's stream, keeping state per device
Alerts are routed through PagerDuty or Opsgenie
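The "above 80°C for 5 minutes" rule needs per-device state: when the breach started and whether an alert has already been raised. The sketch below shows one way to keep that state. It assumes readings arrive in timestamp order with `ts` in seconds, and it does not treat gaps in the data as clearing the condition; the class name is hypothetical.

```python
class SustainedThresholdRule:
    """Fire once when a value stays above a threshold for a minimum duration."""

    def __init__(self, threshold: float = 80.0, duration_s: int = 300):
        self.threshold = threshold
        self.duration_s = duration_s
        self._breach_start = {}   # device_id -> ts of first reading above threshold
        self._fired = set()       # devices with an alert already raised

    def observe(self, device_id: str, temp: float, ts: int) -> bool:
        """Return True only for the reading at which the alert should be raised."""
        if temp <= self.threshold:
            # Condition cleared: reset state so a later breach can alert again.
            self._breach_start.pop(device_id, None)
            self._fired.discard(device_id)
            return False
        start = self._breach_start.setdefault(device_id, ts)
        if ts - start >= self.duration_s and device_id not in self._fired:
            self._fired.add(device_id)
            return True
        return False
```

Returning True only once per breach avoids paging the on-call engineer on every reading while the device stays hot.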
Device Management
Device Registry: Metadata, firmware version, status
OTA Updates: Push firmware to device groups
Device Shadow: Last known state for offline devices
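The device shadow idea can be illustrated with a small in-memory store that keeps the last reported state per device. This is a simplified sketch, not the API of any particular IoT service: the class and method names are hypothetical, and a production shadow would be persisted.

```python
class ShadowStore:
    """In-memory last-known-state store, keyed by device ID."""

    def __init__(self):
        self._shadows = {}   # device_id -> {"state": dict, "ts": int}

    def update(self, device_id: str, state: dict, ts: int) -> None:
        """Merge a reported state, ignoring updates older than the stored one."""
        shadow = self._shadows.setdefault(device_id, {"state": {}, "ts": ts})
        if ts < shadow["ts"]:
            return   # stale update, e.g. a delayed message
        shadow["state"].update(state)
        shadow["ts"] = ts

    def get(self, device_id: str):
        """Return (state, ts) as last reported, or None for an unknown device."""
        shadow = self._shadows.get(device_id)
        return None if shadow is None else (dict(shadow["state"]), shadow["ts"])
```

A dashboard can call `get` for a device that is currently offline and still show its most recent readings along with how old they are.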
Conclusion
A typical IoT platform ingests telemetry over MQTT, buffers it in Kafka, processes it with Flink or Spark Streaming, and stores it in a time-series database, with aggressive downsampling to keep long-term storage affordable.