
Designing a Document Storage and Collaboration System

Build a real-time collaborative document editor like Google Docs, covering operational transformation (OT), CRDTs, conflict resolution, and sync.


srikanthtelkalapally888@gmail.com

Designing a Collaborative Document Editor

Real-time collaborative editors allow multiple users to edit the same document simultaneously.

The Core Challenge

User A (at pos 5): Insert 'X'
User B (at pos 5): Insert 'Y' (simultaneously)

Without coordination:
  Server sees: Insert X at 5, Insert Y at 5
  Each client applies its own insert first, then the other one at the same index
  Result: A ends with ...YX..., B with ...XY...; the copies diverge
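A quick way to see the divergence is to replay it with plain string inserts. The sketch below is illustrative Python; the sample text "Hello world" and the helper `insert` are assumptions, not part of any real editor.

```python
def insert(text: str, pos: int, ch: str) -> str:
    """Insert ch at 0-based index pos."""
    return text[:pos] + ch + text[pos:]

base = "Hello world"
op_a = (5, "X")  # User A: insert 'X' at position 5
op_b = (5, "Y")  # User B: insert 'Y' at position 5

# Each client applies its own op immediately, then the remote op as received.
client_a = insert(insert(base, *op_a), *op_b)
client_b = insert(insert(base, *op_b), *op_a)

print(client_a)  # HelloYX world
print(client_b)  # HelloXY world
```

Both clients applied the same two operations, yet they end with different text, which is exactly what OT and CRDTs exist to prevent.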

Operational Transformation (OT)

Transform each incoming operation against the concurrent operations already applied, so it still does what its author intended:

A's op: Insert('X', pos=5)
B's op: Insert('Y', pos=5)

Server receives A first and applies it.
Then it transforms B's op against A:
  → B's op becomes: Insert('Y', pos=6)
  Result: ...XY... on every client (consistent)

Used by: Google Docs, Etherpad
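The transform step can be sketched in Python for the insert/insert case only. This is a simplified illustration, not Google Docs' or Etherpad's actual algorithm; the `Insert` type, the `transform` function, and breaking ties by comparing site IDs (so A wins here, matching the server order in the example) are all assumptions. It checks the key OT property: applying A then B' yields the same text as applying B then A'.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    char: str
    site: str  # used only to break ties between inserts at the same position

def apply(text: str, op: Insert) -> str:
    return text[:op.pos] + op.char + text[op.pos:]

def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        # `against` landed at or before op's position, so shift op right by one.
        return Insert(op.pos + 1, op.char, op.site)
    return op

base = "Hello world"
a = Insert(5, "X", site="A")
b = Insert(5, "Y", site="B")

# Server order: A, then B transformed against A.
server = apply(apply(base, a), transform(b, a))
# Client B already applied its own op, so it applies A transformed against B.
client_b = apply(apply(base, b), transform(a, b))

print(server)    # HelloXY world
print(client_b)  # HelloXY world
```

A real OT engine also needs transforms for delete/insert and delete/delete pairs, which is where most of the complexity lives.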

CRDT (Conflict-free Replicated Data Types)

A data structure whose replicas merge automatically, with no central transform step:

Each character has a unique ID (site_id + local_clock)
Characters are identified and ordered by ID, not by numeric position

Insert 'X' → Create char{id: A:5, val: 'X', before: B:3, after: C:6}
Insert 'Y' → Create char{id: B:5, val: 'Y', before: B:3, after: C:6}

Merge: X and Y have the same neighbors (B:3, C:6), so they are ordered by comparing IDs (A:5 vs B:5) → deterministic order on all clients

Used by: Figma (CRDT-inspired, with a central server), Notion
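Here is a minimal sequence CRDT in Python. It swaps in an RGA-style design instead of the before/after pair shown above: each character records only its left neighbor, and concurrent inserts after the same neighbor are ordered by comparing (clock, site) IDs, highest first. Deletions are omitted and causal delivery of ops is assumed; `RGADoc`, `Char`, and `integrate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

CharId = tuple[int, str]  # (lamport_clock, site_id); compared clock first, then site

@dataclass
class Char:
    id: CharId
    val: str
    after: Optional[CharId]  # ID of the left neighbor at insert time (None = start)

class RGADoc:
    """Minimal RGA-style sequence CRDT (inserts only, causal delivery assumed)."""

    def __init__(self, site: str):
        self.site = site
        self.clock = 0
        self.chars: list[Char] = []

    def local_insert(self, index: int, val: str) -> Char:
        """Insert val at a visible index and return the op to broadcast."""
        self.clock += 1
        after = self.chars[index - 1].id if index > 0 else None
        ch = Char((self.clock, self.site), val, after)
        self.integrate(ch)
        return ch

    def integrate(self, ch: Char) -> None:
        """Apply a local or remote insert; the result does not depend on arrival order."""
        self.clock = max(self.clock, ch.id[0])
        # Start just after the left neighbor (or at the beginning).
        i = 0
        if ch.after is not None:
            i = next(k for k, c in enumerate(self.chars) if c.id == ch.after) + 1
        # Skip over concurrent inserts with a higher ID: they sort first.
        while i < len(self.chars) and self.chars[i].id > ch.id:
            i += 1
        self.chars.insert(i, ch)

    def text(self) -> str:
        return "".join(c.val for c in self.chars)

# Shared starting text "ab", typed by site "S" and delivered to both replicas.
seed = RGADoc("S")
ops = [seed.local_insert(0, "a"), seed.local_insert(1, "b")]
alice, bob = RGADoc("A"), RGADoc("B")
for op in ops:
    alice.integrate(op)
    bob.integrate(op)

# Concurrent inserts at the same index, then exchanged in opposite orders.
x = alice.local_insert(1, "X")
y = bob.local_insert(1, "Y")
alice.integrate(y)
bob.integrate(x)
print(alice.text(), bob.text())  # aYXb aYXb
```

Because `integrate` depends only on the op's ID and neighbor, not on when it arrives, both replicas converge without any server-side transformation.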

Architecture

Client A → WebSocket → Server
                          ↓
                   OT/CRDT Engine
                          ↓
               Broadcast to all clients
                          ↓
              Persist to Document Store
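The server side of this pipeline can be sketched as one in-memory object, assuming OT with insert-only ops. Each client tags its op with the document version it was based on; the server transforms it against every op committed since then, applies it, records it, and hands it to subscribers. `DocServer`, `receive`, and the subscriber callbacks are hypothetical stand-ins for the WebSocket layer and the Document Store, and the sketch assumes each client has at most one unacknowledged op in flight. Client-side transformation is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class Insert:
    pos: int
    char: str
    site: str

def transform(op: Insert, against: Insert) -> Insert:
    # Shift right if `against` was inserted before op (ties broken by site ID).
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + 1, op.char, op.site)
    return op

@dataclass
class DocServer:
    text: str = ""
    log: list[Insert] = field(default_factory=list)  # committed ops; stand-in for the store
    subscribers: list[Callable[[int, Insert], None]] = field(default_factory=list)

    def receive(self, op: Insert, base_version: int) -> int:
        """Handle an op that a client generated against `base_version`."""
        # OT engine: transform against every op committed since the client's base.
        for committed in self.log[base_version:]:
            op = transform(op, committed)
        self.text = self.text[:op.pos] + op.char + self.text[op.pos:]
        self.log.append(op)
        version = len(self.log)
        for send in self.subscribers:  # broadcast (a WebSocket send in practice)
            send(version, op)
        return version

server = DocServer(text="Hello world")
received = []
server.subscribers.append(lambda v, op: received.append((v, op.pos, op.char)))

# A and B both edited version 0 concurrently; A's op arrives first.
server.receive(Insert(5, "X", "A"), base_version=0)
server.receive(Insert(5, "Y", "B"), base_version=0)
print(server.text)  # HelloXY world
print(received)     # [(1, 5, 'X'), (2, 6, 'Y')]
```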

Document Storage

Document: { id, title, content, version }
Operations log: { doc_id, op, user_id, version, timestamp }

Reconstruct a document by replaying ops from version 0,
or, faster, store a snapshot every 1000 ops and replay only the ops after the latest snapshot
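The snapshot-plus-replay scheme can be sketched in Python with in-memory lists standing in for the operations log and snapshot storage. `Op`, `DocStore`, and `SNAPSHOT_EVERY` are hypothetical names, and ops are plain inserts. Loading a version starts from the nearest snapshot at or below it, so at most 999 ops are ever replayed.

```python
from dataclasses import dataclass, field

SNAPSHOT_EVERY = 1000  # the "every 1000 ops" policy

@dataclass
class Op:
    pos: int
    char: str

def apply(text: str, op: Op) -> str:
    return text[:op.pos] + op.char + text[op.pos:]

@dataclass
class DocStore:
    ops: list[Op] = field(default_factory=list)  # ops[i] produces version i + 1
    snapshots: dict[int, str] = field(default_factory=lambda: {0: ""})  # version -> content

    def append(self, op: Op) -> int:
        self.ops.append(op)
        version = len(self.ops)
        if version % SNAPSHOT_EVERY == 0:
            # Materialize the content at this version and keep it as a snapshot.
            self.snapshots[version] = self.load(version)
        return version

    def load(self, version: int) -> str:
        """Rebuild content at `version` from the nearest snapshot at or below it."""
        base = max(v for v in self.snapshots if v <= version)
        text = self.snapshots[base]
        for op in self.ops[base:version]:
            text = apply(text, op)
        return text

store = DocStore()
for i in range(2500):
    store.append(Op(i, "ab"[i % 2]))  # append alternating a/b at the end
print(sorted(store.snapshots))  # [0, 1000, 2000]
print(store.load(2500)[:6])     # ababab
```

The full op log is still kept, so any historical version (and the audit trail) remains available.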

Presence (Who's editing)

Each client sends its cursor position every 500 ms
    ↓
Server broadcasts cursor positions to all
    ↓
Other clients show colored cursors
    ↓
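Server stores presence in Redis: { doc_id: { user_a: {pos: 42}, user_b: {pos: 105} } }
TTL: 5 seconds per user entry, refreshed on each update, so stale cursors disappear
  (a TTL on the whole doc_id key would not expire a single idle user; use one key per doc/user pair, or hash-field expiry on Redis 7.4+)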
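Presence with expiry can be sketched in Python with an injectable clock, so the TTL behavior is testable without a live Redis. Each (doc, user) pair gets its own expiry, mirroring one Redis key per user written with `SET ... EX 5`; `PresenceStore` and the key layout are hypothetical.

```python
import time
from typing import Callable

CURSOR_TTL = 5.0  # seconds

class PresenceStore:
    """In-memory stand-in for Redis keys like presence:{doc_id}:{user_id} with a 5 s TTL."""

    def __init__(self, now: Callable[[], float] = time.monotonic):
        self.now = now
        # (doc_id, user_id) -> (cursor position, expiry timestamp)
        self.entries: dict[tuple[str, str], tuple[int, float]] = {}

    def update(self, doc_id: str, user_id: str, pos: int) -> None:
        # Each heartbeat overwrites the entry and pushes its expiry forward.
        self.entries[(doc_id, user_id)] = (pos, self.now() + CURSOR_TTL)

    def cursors(self, doc_id: str) -> dict[str, int]:
        """Return live cursors for a document, dropping entries whose TTL has lapsed."""
        t = self.now()
        live = {}
        for (doc, user), (pos, expires_at) in list(self.entries.items()):
            if expires_at <= t:
                del self.entries[(doc, user)]  # expired, like a Redis key after its TTL
            elif doc == doc_id:
                live[user] = pos
        return live

clock = [0.0]
store = PresenceStore(now=lambda: clock[0])
store.update("doc1", "user_a", 42)
store.update("doc1", "user_b", 105)
clock[0] = 3.0
store.update("doc1", "user_a", 43)  # user_a keeps sending heartbeats
clock[0] = 6.0                      # user_b has been silent for 6 s
print(store.cursors("doc1"))        # {'user_a': 43}
```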

Conclusion

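CRDTs are increasingly preferred over OT for their simpler merge model (no central transform step) and natural offline support, although OT still powers server-centric editors such as Google Docs. Store operation logs as an audit trail, and rebuild documents from the latest snapshot plus later ops for performance.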
