MongoDB
Designing a Privacy-First Data Architecture
Build a privacy-by-design data system — covering data minimization, anonymization, right-to-erasure, consent tracking, and GDPR compliance.
S
srikanthtelkalapally888@gmail.com
Designing a Privacy-First Data Architecture
Privacy-by-design builds data protection into systems from the start, not as an afterthought.
Privacy Principles
1. Data minimization: Collect only what you need
2. Purpose limitation: Use data only for stated purpose
3. Storage limitation: Delete when no longer needed
4. Accuracy: Keep data up to date
5. Security: Protect against unauthorized access
6. Accountability: Document compliance
Data Classification
Level 1 - Public:
Name, profile picture (publicly shared)
Level 2 - Internal:
Email, username, preferences
Level 3 - Confidential:
Payment info, addresses, phone numbers
Level 4 - Sensitive:
Health data, biometrics, financial details
→ Highest protection, most restricted access
Consent Management
consents(
user_id, purpose, -- analytics, marketing, profiling
version, -- Consent version (for updates)
given_at, given_via, -- Banner, checkout, settings
withdrawn_at,
ip_address -- Proof of consent
)
Check before any data use:
SELECT COUNT(*) FROM consents
WHERE user_id = ? AND purpose = 'analytics'
AND withdrawn_at IS NULL;
Right to Erasure (GDPR Article 17)
User requests deletion:
1. Mark user as deleted (soft delete)
2. Queue erasure job
Erasure job:
3. Anonymize PII: name→"DELETED", email→hash
4. Delete associated records per retention policy
5. Purge from all caches
6. Cascade to partner systems (vendors, analytics)
7. Log completion (without the deleted PII)
Data Anonymization
def anonymize_user(user_id):
db.execute("""
UPDATE users SET
email = SHA256(email), -- Pseudonymization
name = 'ANONYMIZED',
phone = NULL,
dob = NULL,
ip_address = NULL
WHERE id = ?
""", user_id)
# Analytics: Replace user_id with random token
analytics.replace_user_id(user_id, uuid4())
Data Retention Policies
Active accounts: Keep during relationship
Inactive accounts: Delete after 2 years
Transaction records: Keep 7 years (legal requirement)
Logs: 30 days (rotate)
Backups: Apply deletion to backups within 30 days
Privacy-Preserving Analytics
Differential Privacy:
Add calibrated noise to aggregate queries
User data not identifiable in aggregate stats
Used by Apple, Google Chrome
K-anonymity:
Each record indistinguishable from K-1 others
Suppress rare combinations
Aggregation only:
Never expose raw user data
Only show counts when group > 100 users
Privacy Impact Assessment
For new features, evaluate:
What data collected?
Why is it necessary?
How long retained?
Who has access?
What are the risks?
What mitigations exist?
Conclusion
Privacy-first architecture requires consent management, automated erasure, data minimization, and anonymization built into core systems — not bolted on later.