Designing a Workflow Orchestration System

Build a workflow engine for complex business processes, covering DAG execution, step dependencies, human approvals, retries, and long-running workflows.

srikanthtelkalapally888@gmail.com

A workflow orchestration system executes complex multi-step processes with dependencies, approvals, and error handling.

Use Cases

Order fulfillment: Validate → Reserve → Charge → Ship → Notify
HR onboarding: Create accounts → Send equipment → Schedule orientation
Loan processing: Application → Verification → Risk assessment → Approval
CI/CD pipeline: Build → Test → Deploy → Verify → Notify

Workflow Definition (DAG)

{
  "workflow_id": "order_fulfillment",
  "steps": [
    { "id": "validate",   "type": "service_call", "depends_on": [] },
    { "id": "reserve",    "type": "service_call", "depends_on": ["validate"] },
    { "id": "charge",     "type": "service_call", "depends_on": ["reserve"] },
    { "id": "ship",       "type": "service_call", "depends_on": ["charge"] },
    { "id": "notify",     "type": "notification", "depends_on": ["ship"] }
  ]
}
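Before an engine runs a definition like this, it usually checks that the dependencies form a valid DAG. The following is a minimal sketch using Python's standard `graphlib` module; the function name `execution_order` is hypothetical, and the dictionary mirrors the JSON definition above.

```python
from graphlib import CycleError, TopologicalSorter

workflow = {
    "workflow_id": "order_fulfillment",
    "steps": [
        {"id": "validate", "type": "service_call", "depends_on": []},
        {"id": "reserve", "type": "service_call", "depends_on": ["validate"]},
        {"id": "charge", "type": "service_call", "depends_on": ["reserve"]},
        {"id": "ship", "type": "service_call", "depends_on": ["charge"]},
        {"id": "notify", "type": "notification", "depends_on": ["ship"]},
    ],
}


def execution_order(definition):
    """Return step ids in an order that respects depends_on.

    Raises ValueError if a dependency is unknown or the graph has a cycle.
    """
    # Map each step id to the set of steps it depends on
    graph = {s["id"]: set(s["depends_on"]) for s in definition["steps"]}

    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception args
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc


print(execution_order(workflow))
# ['validate', 'reserve', 'charge', 'ship', 'notify']
```

Rejecting a bad definition at registration time is cheaper than discovering a cycle after some steps have already run.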

Execution Engine

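from concurrent.futures import FIRST_COMPLETED, wait

# get_steps, save_output, and executor are assumed helpers; executor.submit()
# is expected to return a concurrent.futures-style Future.
def execute_workflow(workflow_id, instance_id, input):
  steps = get_steps(workflow_id)
  completed = set()
  running = {}  # step id -> Future

  while len(completed) < len(steps):
    for step in steps:
      if step.id in completed or step.id in running:
        continue  # Already finished or in flight
      if all(dep in completed for dep in step.depends_on):  # Ready to run
        running[step.id] = executor.submit(step, input)

    if not running:
      raise RuntimeError("No runnable steps: cycle or unknown dependency")

    # Block until at least one step finishes instead of busy-polling
    wait(running.values(), return_when=FIRST_COMPLETED)
    for step_id, future in list(running.items()):
      if future.done():
        save_output(instance_id, step_id, future.result())  # Re-raises step errors
        completed.add(step_id)
        del running[step_id]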

Parallel Execution

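Steps A, B, and C do not depend on each other, so they run in parallel. Step D waits for all three.

Step A (10s)
Step B (15s)  → Run in parallel → Step D
Step C (5s)

Total: 15s, the duration of the slowest step (not 10 + 15 + 5 = 30s sequentially)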
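The 15s figure is the length of the longest dependency chain, often called the critical path. As a rough sketch, it can be computed from step durations and dependencies as below. The function name `makespan` is hypothetical, the graph is assumed to be acyclic, and Step D is given a duration of 0 only because the example above does not state one.

```python
def makespan(durations, depends_on):
    """Earliest finish time of the whole DAG when every ready step starts at once.

    Assumes the dependency graph is acyclic.
    """
    finish = {}  # step -> earliest finish time (memoized)

    def finish_time(step):
        if step not in finish:
            # A step starts when its slowest dependency has finished
            start = max((finish_time(d) for d in depends_on.get(step, [])), default=0)
            finish[step] = start + durations[step]
        return finish[step]

    return max(finish_time(step) for step in durations)


durations = {"A": 10, "B": 15, "C": 5, "D": 0}  # D's duration is an assumption
depends_on = {"D": ["A", "B", "C"]}

print(makespan(durations, depends_on))  # 15
print(sum(durations.values()))          # 30
```

The gap between the two numbers is the time saved by running independent steps concurrently.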

Human Approval Steps

Step: loan_approval
Type: human_task
Assignee: role:credit_manager
Timeout: 48 hours

Execution:
  1. Create task in task inbox
  2. Notify credit manager
  3. Wait (workflow suspended)
  4. Manager approves/rejects
  5. Workflow resumes
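The five execution steps above can be sketched as a small state machine. This is an in-memory illustration only: the `HumanTask` and `WorkflowInstance` classes are hypothetical, a real engine would persist this state so the workflow survives restarts, and rejecting a decision that arrives after the timeout is an assumption, since the behavior on timeout is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HumanTask:
    step_id: str
    assignee: str
    created_at: datetime
    timeout: timedelta
    decision: Optional[str] = None  # "approved" or "rejected" once decided

    def is_expired(self, now):
        return self.decision is None and now >= self.created_at + self.timeout


class WorkflowInstance:
    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.status = "running"
        self.pending_task = None
        self.notifications = []  # stand-in for a real notification channel

    def request_approval(self, step_id, assignee, timeout, now):
        # Steps 1-3: create the task, notify the assignee, suspend the workflow
        self.pending_task = HumanTask(step_id, assignee, now, timeout)
        self.notifications.append(f"{assignee}: please review {step_id}")
        self.status = "suspended"

    def complete_task(self, decision, now):
        # Steps 4-5: record the decision and resume the workflow
        task = self.pending_task
        if self.status != "suspended" or task is None:
            raise RuntimeError("no approval is pending")
        if decision not in ("approved", "rejected"):
            raise ValueError(f"invalid decision: {decision}")
        if task.is_expired(now):
            raise RuntimeError("approval task timed out")  # assumed policy
        task.decision = decision
        self.pending_task = None
        self.status = "running"
        return decision


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
wf = WorkflowInstance("loan-001")
wf.request_approval("loan_approval", "role:credit_manager", timedelta(hours=48), now)
print(wf.status)  # suspended
print(wf.complete_task("approved", now + timedelta(hours=5)))  # approved
print(wf.status)  # running
```

While suspended, the instance holds no thread or process; it only needs its stored state and an incoming decision to continue.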

Error Handling

Step fails:
  Retry: Up to 3 attempts with exponential backoff
  Timeout: 30 minutes per step
  On final failure: Execute compensation steps (undo completed work)
  Alert: Notify on-call
  Suspend: Hold the workflow for manual intervention
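A minimal sketch of the retry and compensation parts of this policy follows. The names `run_with_retry` and `run_saga` are hypothetical, running compensations in reverse order is an assumption borrowed from the common saga pattern, and step timeouts, alerting, and suspension are left out to keep the example short. The `sleep` parameter exists so the delay can be replaced in tests.

```python
import time


def run_with_retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action(), retrying on failure with delays of 1x, 2x, 4x... base_delay."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # Retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)


def run_saga(steps, sleep=time.sleep):
    """Run (name, action, compensation) tuples in order.

    If a step still fails after its retries, undo the steps that already
    succeeded, newest first, then re-raise the error.
    """
    done = []
    for name, action, compensate in steps:
        try:
            run_with_retry(action, sleep=sleep)
            done.append((name, compensate))
        except Exception:
            for _, undo in reversed(done):
                undo()
            raise


log = []


def fail_charge():
    raise RuntimeError("card declined")


steps = [
    ("reserve", lambda: log.append("reserve"), lambda: log.append("undo reserve")),
    ("charge", fail_charge, lambda: log.append("undo charge")),
]

try:
    run_saga(steps, sleep=lambda seconds: None)  # skip real delays in the demo
except RuntimeError:
    pass

print(log)  # ['reserve', 'undo reserve']
```

Only steps that completed are compensated, so the failed charge has no undo entry in the log.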

Conclusion

Workflow engines such as Temporal, AWS Step Functions, and Camunda handle the complexity of long-running, multi-step processes by providing durable state, retries, and support for human tasks.
