
Designing a Privacy-First Data Architecture

Build a privacy-by-design data system — covering data minimization, anonymization, right-to-erasure, consent tracking, and GDPR compliance.


srikanthtelkalapally888@gmail.com

Privacy-by-design builds data protection into systems from the start, not as an afterthought.

Privacy Principles

1. Data minimization: Collect only what you need
2. Purpose limitation: Use data only for the stated purpose
3. Storage limitation: Delete when no longer needed
4. Accuracy: Keep data up to date
5. Security: Protect against unauthorized access
6. Accountability: Document compliance

Data Classification

Level 1 - Public:
  Name, profile picture (publicly shared)

Level 2 - Internal:
  Email, username, preferences

Level 3 - Confidential:
  Payment info, addresses, phone numbers

Level 4 - Sensitive:
  Health data, biometrics, financial details
  → Highest protection, most restricted access
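
One way to make these levels enforceable is to tag every field with its level and filter on it at read time. The sketch below is illustrative only: the field names, the `FIELD_LEVELS` mapping, and the `redact` helper are assumptions made for this example, not part of any particular framework.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    SENSITIVE = 4


# Hypothetical mapping of user fields to the four levels.
FIELD_LEVELS = {
    "name": Level.PUBLIC,
    "profile_picture": Level.PUBLIC,
    "email": Level.INTERNAL,
    "username": Level.INTERNAL,
    "phone": Level.CONFIDENTIAL,
    "address": Level.CONFIDENTIAL,
    "health_notes": Level.SENSITIVE,
}


def redact(record: dict, clearance: Level) -> dict:
    """Return only the fields the caller's clearance allows.

    Unknown fields default to SENSITIVE, so a new column stays hidden
    until someone classifies it.
    """
    return {
        field: value
        for field, value in record.items()
        if FIELD_LEVELS.get(field, Level.SENSITIVE) <= clearance
    }


user = {"name": "Ada", "email": "ada@example.com", "health_notes": "n/a", "shoe_size": 38}
print(redact(user, Level.INTERNAL))
```

Defaulting unclassified fields to the most restrictive level is a deliberate choice here: forgetting to classify a field then fails closed rather than open.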

Consent Management

consents(
  user_id, purpose,     -- analytics, marketing, profiling
  version,              -- Consent version (for updates)
  given_at, given_via,  -- Banner, checkout, settings
  withdrawn_at,
  ip_address            -- Proof of consent
)

Check before any data use:
SELECT COUNT(*) FROM consents
WHERE user_id = ? AND purpose = 'analytics'
  AND withdrawn_at IS NULL;
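
The schema and check above can be exercised end to end with an in-memory SQLite database. This is a minimal sketch under stated assumptions: the column types, the helper names (`give_consent`, `withdraw_consent`, `has_consent`), and the rule that consent for an older policy version no longer counts are all choices made for the example.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE consents (
        user_id      INTEGER NOT NULL,
        purpose      TEXT    NOT NULL,
        version      INTEGER NOT NULL,
        given_at     TEXT    NOT NULL,
        given_via    TEXT    NOT NULL,
        withdrawn_at TEXT,
        ip_address   TEXT
    )
""")


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


def give_consent(user_id, purpose, version, via, ip):
    conn.execute(
        "INSERT INTO consents VALUES (?, ?, ?, ?, ?, NULL, ?)",
        (user_id, purpose, version, now(), via, ip),
    )


def withdraw_consent(user_id, purpose):
    # Keep the row as an audit trail; only stamp the withdrawal time.
    conn.execute(
        "UPDATE consents SET withdrawn_at = ? "
        "WHERE user_id = ? AND purpose = ? AND withdrawn_at IS NULL",
        (now(), user_id, purpose),
    )


def has_consent(user_id, purpose, current_version):
    # Assumption: consent given for an older policy version no longer counts.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM consents "
        "WHERE user_id = ? AND purpose = ? AND version = ? "
        "AND withdrawn_at IS NULL",
        (user_id, purpose, current_version),
    ).fetchone()
    return count > 0


give_consent(42, "analytics", 1, "banner", "203.0.113.7")
print(has_consent(42, "analytics", 1))
withdraw_consent(42, "analytics")
print(has_consent(42, "analytics", 1))
```

Withdrawal updates the row instead of deleting it, so the record of when consent was given and withdrawn survives for accountability.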

Right to Erasure (GDPR Article 17)

User requests deletion:
1. Mark user as deleted (soft delete)
2. Queue erasure job

Erasure job:
3. Anonymize PII: name→"DELETED", email→salted hash (an unsalted hash can be matched against known addresses)
4. Delete associated records per retention policy
5. Purge from all caches
6. Cascade to partner systems (vendors, analytics)
7. Log completion (without the deleted PII)
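
The seven steps can be sketched as a request handler plus a background job. Everything here is a stand-in: the dictionaries replace a real database and cache, the deque replaces a job queue, and `notify_partner` is a hypothetical callback for downstream systems. Step 4 is left out because it depends on the retention policy in use.

```python
import hashlib
import secrets
from collections import deque

# In-memory stand-ins for a real database, cache, and job queue (all hypothetical).
users = {1: {"name": "Ada", "email": "ada@example.com", "deleted": False}}
cache = {"user:1": {"name": "Ada"}}
erasure_queue = deque()
audit_log = []


def request_erasure(user_id):
    """Steps 1-2: soft-delete immediately, then queue the heavy work."""
    users[user_id]["deleted"] = True
    erasure_queue.append(user_id)


def run_erasure_job(notify_partner):
    """Steps 3-7 for every queued user; `notify_partner` is caller-supplied."""
    while erasure_queue:
        user_id = erasure_queue.popleft()
        user = users[user_id]
        # Step 3: hash with a throwaway salt so the value cannot be
        # recomputed from a known email address.
        salt = secrets.token_bytes(16)
        user["email"] = hashlib.sha256(salt + user["email"].encode()).hexdigest()
        user["name"] = "DELETED"
        # Step 4 (deleting related records per retention policy) is omitted.
        # Step 5: purge cached copies.
        cache.pop(f"user:{user_id}", None)
        # Step 6: cascade to partner systems.
        notify_partner(user_id)
        # Step 7: log completion with the ID only, never the erased PII.
        audit_log.append({"event": "erasure_completed", "user_id": user_id})


notified = []
request_erasure(1)
run_erasure_job(notified.append)
print(users[1]["name"], audit_log)
```

Splitting the soft delete from the job means the user disappears from the product immediately, while the slower cascade can be retried if a partner system is unavailable.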

Data Anonymization

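from uuid import uuid4

# `db` and `analytics` are application-level handles defined elsewhere.
def anonymize_user(user_id):
  # SHA256(email) is pseudonymization, not anonymization: anyone holding a
  # known address can re-hash it and match the row. Salt the hash or
  # overwrite the column with a random value if matching must be impossible.
  # SHA256() as a SQL function is database-specific.
  db.execute("""
    UPDATE users SET
      email = SHA256(email),  -- Pseudonymization
      name = 'ANONYMIZED',
      phone = NULL,
      dob = NULL,
      ip_address = NULL
    WHERE id = ?
  """, (user_id,))  # parameters are passed as a sequence

  # Analytics: Replace user_id with random token
  analytics.replace_user_id(user_id, uuid4())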

Data Retention Policies

Active accounts: Keep during relationship
Inactive accounts: Delete after 2 years
Transaction records: Keep 7 years (typical tax/accounting requirement; varies by jurisdiction)
Logs: 30 days (rotate)
Backups: Apply deletion to backups within 30 days
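
A policy like this is easier to enforce when it lives in one table that a scheduled cleanup job consults. The sketch below assumes hypothetical record kinds (`active_account`, `inactive_account`, `transaction`, `log`) and approximates a year as 365 days; backups are not modeled.

```python
from datetime import datetime, timedelta, timezone

# Retention windows taken from the list above; None means "no fixed expiry".
# Years are approximated as 365 days for simplicity.
RETENTION = {
    "active_account": None,
    "inactive_account": timedelta(days=2 * 365),
    "transaction": timedelta(days=7 * 365),
    "log": timedelta(days=30),
}


def is_expired(kind: str, last_relevant_at: datetime, now: datetime) -> bool:
    """True when a record of this kind has outlived its retention window."""
    window = RETENTION[kind]
    return window is not None and now - last_relevant_at > window


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_expired("log", now - timedelta(days=31), now))
print(is_expired("transaction", now - timedelta(days=31), now))
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.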

Privacy-Preserving Analytics

Differential Privacy:
  Add calibrated noise to aggregate queries
  Bounds how much any single user's data can change a published result
  Used by Apple, Google Chrome
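
As a concrete instance of "calibrated noise", the Laplace mechanism adds noise with scale sensitivity/epsilon to a query result. The sketch below applies it to a simple count, where sensitivity is 1. The helper names are assumptions for this example, and Python's `random` module is used for readability only: production systems need a cryptographically secure source and protection against floating-point attacks, which this sketch does not provide.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential samples with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of items matching `predicate`.

    Adding or removing one user changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random()
ages = [23, 35, 41, 29, 52, 38, 61, 27]
print(private_count(ages, lambda a: a >= 30, epsilon=0.5, rng=rng))
```

A smaller `epsilon` means more noise and stronger privacy; each additional query against the same data spends more of the overall privacy budget.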

K-anonymity:
  Each record is indistinguishable from at least K-1 others on its quasi-identifiers (e.g. age band, ZIP prefix)
  Suppress or generalize rare combinations
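
A minimal sketch of the suppression approach: group records by their quasi-identifiers and drop any group smaller than K. The column names (`age_band`, `zip3`, `diagnosis`) and the `k_anonymize` helper are made up for illustration; real pipelines usually try generalizing values (wider age bands, shorter ZIP prefixes) before dropping rows.

```python
from collections import Counter


def k_anonymize(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier combination appears fewer than k times."""

    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]


records = [
    {"age_band": "30-39", "zip3": "941", "diagnosis": "A"},
    {"age_band": "30-39", "zip3": "941", "diagnosis": "B"},
    {"age_band": "30-39", "zip3": "941", "diagnosis": "A"},
    {"age_band": "80-89", "zip3": "100", "diagnosis": "C"},  # unique, so suppressed
]
released = k_anonymize(records, ["age_band", "zip3"], k=3)
print(len(released))
```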

Aggregation only:
  Never expose raw user data
  Only show counts when group > 100 users
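
The minimum-group-size rule can be applied as a last filter before any count leaves the analytics layer. The `safe_counts` helper and the sample data below are assumptions for this sketch; the threshold of 100 comes from the rule above.

```python
from collections import Counter

MIN_GROUP_SIZE = 100  # threshold from the rule above


def safe_counts(user_groups, min_size=MIN_GROUP_SIZE):
    """Count users per group, hiding any group that is not larger than min_size.

    `user_groups` maps user_id -> group label (e.g. country).
    """
    counts = Counter(user_groups.values())
    return {group: n for group, n in counts.items() if n > min_size}


user_groups = {f"u{i}": "US" for i in range(150)}
user_groups.update({f"v{i}": "IS" for i in range(3)})
print(safe_counts(user_groups))
```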

Privacy Impact Assessment

For new features, evaluate:
  What data is collected?
  Why is it necessary?
  How long is it retained?
  Who has access?
  What are the risks?
  What mitigations exist?

Conclusion

Privacy-first architecture requires consent management, automated erasure, data minimization, and anonymization built into core systems, not bolted on later.
