Designing a Workflow Orchestration System
Build a workflow engine for complex business processes, covering DAG execution, step dependencies, human approvals, retries, and long-running workflows.
srikanthtelkalapally888@gmail.com
A workflow orchestration system executes complex multi-step processes with dependencies, approvals, and error handling.
Use Cases
Order fulfillment: Validate → Reserve → Charge → Ship → Notify
HR onboarding: Create accounts → Send equipment → Schedule orientation
Loan processing: Application → Verification → Risk assessment → Approval
CI/CD pipeline: Build → Test → Deploy → Verify → Notify
Workflow Definition (DAG)
{
  "workflow_id": "order_fulfillment",
  "steps": [
    { "id": "validate", "type": "service_call", "depends_on": [] },
    { "id": "reserve", "type": "service_call", "depends_on": ["validate"] },
    { "id": "charge", "type": "service_call", "depends_on": ["reserve"] },
    { "id": "ship", "type": "service_call", "depends_on": ["charge"] },
    { "id": "notify", "type": "notification", "depends_on": ["ship"] }
  ]
}
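Before running a definition like the one above, an engine would usually check that every `depends_on` entry names a real step and that the graph has no cycles. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The helper name `execution_order` is hypothetical and not taken from any particular engine.

```python
import json
from graphlib import CycleError, TopologicalSorter

WORKFLOW_JSON = """
{
  "workflow_id": "order_fulfillment",
  "steps": [
    { "id": "validate", "type": "service_call", "depends_on": [] },
    { "id": "reserve", "type": "service_call", "depends_on": ["validate"] },
    { "id": "charge", "type": "service_call", "depends_on": ["reserve"] },
    { "id": "ship", "type": "service_call", "depends_on": ["charge"] },
    { "id": "notify", "type": "notification", "depends_on": ["ship"] }
  ]
}
"""


def execution_order(definition: dict) -> list[str]:
    """Return step ids in an order that respects depends_on, or raise ValueError."""
    steps = {step["id"]: step for step in definition["steps"]}
    # Reject references to steps that are not defined in this workflow
    for step in steps.values():
        for dep in step["depends_on"]:
            if dep not in steps:
                raise ValueError(f"step {step['id']!r} depends on unknown step {dep!r}")
    # graphlib expects a mapping of node -> set of predecessors
    graph = {step_id: set(step["depends_on"]) for step_id, step in steps.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(execution_order(json.loads(WORKFLOW_JSON)))
```

Validating at registration time, rather than at run time, means a malformed workflow is rejected before any step has side effects.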
Execution Engine
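from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# get_steps, run_step, and save_output are assumed helpers defined elsewhere.
def execute_workflow(workflow_id, instance_id, payload):
    steps = get_steps(workflow_id)  # each step exposes .id and .depends_on
    completed = set()
    running = {}  # future -> step id
    with ThreadPoolExecutor() as executor:
        while len(completed) < len(steps):
            for step in steps:
                if step.id in completed or step.id in running.values():
                    continue  # finished or already in flight
                if all(dep in completed for dep in step.depends_on):  # ready to run
                    running[executor.submit(run_step, step, payload)] = step.id
            if not running:
                raise RuntimeError("no runnable steps: cycle or unknown dependency")
            # Block until at least one step finishes instead of busy-polling
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                step_id = running.pop(future)
                save_output(instance_id, step_id, future.result())  # re-raises step errors
                completed.add(step_id)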
Parallel Execution
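Step A (10s), Step B (15s), Step C (5s) → run in parallel → Step D
Steps A, B, and C do not depend on one another, so the engine can start them together; Step D waits for all three.
Total before Step D: 15s (the slowest step), not 30s (10 + 15 + 5 sequentially).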
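The timing claim can be checked with a small thread-pool sketch. It assumes the three steps are independent and mostly waiting (I/O-bound), and it scales the durations down by 100× so it finishes quickly. `run_step` and `run_parallel` are hypothetical names used only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Durations from the example, scaled from seconds to hundredths of a second
DURATIONS = {"A": 0.10, "B": 0.15, "C": 0.05}


def run_step(name: str) -> str:
    time.sleep(DURATIONS[name])  # stand-in for real work such as a service call
    return name


def run_parallel(names):
    """Run independent steps concurrently; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = list(pool.map(run_step, names))  # results keep input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_parallel(["A", "B", "C"])
    # elapsed tracks the slowest step rather than the sum of all three
    print(results, f"{elapsed:.2f}s")
```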
Human Approval Steps
Step: loan_approval
Type: human_task
Assignee: role:credit_manager
Timeout: 48 hours
Execution:
1. Create task in task inbox
2. Notify credit manager
3. Wait (workflow suspended)
4. Manager approves/rejects
5. Workflow resumes
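The five steps above can be sketched without blocking a thread for 48 hours: the engine records a task with a deadline and derives the workflow's state from that record. This is a minimal in-memory illustration. `HumanTask`, `TaskInbox`, and `workflow_state` are hypothetical names, and a real engine would persist the task and send an actual notification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HumanTask:
    step_id: str
    assignee: str
    due_at: datetime
    decision: Optional[str] = None  # "approved" or "rejected" once decided


class TaskInbox:
    """In-memory stand-in for a durable task inbox (hypothetical)."""

    def __init__(self):
        self.tasks = {}          # (instance_id, step_id) -> HumanTask
        self.notifications = []  # assignees that would have been notified

    def create(self, instance_id, step_id, assignee, timeout, now):
        task = HumanTask(step_id, assignee, due_at=now + timeout)
        self.tasks[(instance_id, step_id)] = task  # 1. create task in inbox
        self.notifications.append(assignee)        # 2. notify (stubbed)
        return task

    def decide(self, instance_id, step_id, decision, now):
        task = self.tasks[(instance_id, step_id)]
        if decision not in ("approved", "rejected"):
            raise ValueError(f"unknown decision: {decision!r}")
        if task.decision is not None:
            raise ValueError("task already decided")
        if now > task.due_at:
            raise TimeoutError("approval window has expired")
        task.decision = decision                   # 4. manager approves/rejects
        return task


def workflow_state(task, now):
    """Derive the workflow's state from the task record."""
    if task.decision is not None:
        return f"resumed:{task.decision}"          # 5. workflow resumes
    return "timed_out" if now > task.due_at else "suspended"  # 3. wait


if __name__ == "__main__":
    inbox = TaskInbox()
    t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    task = inbox.create("loan-42", "loan_approval", "role:credit_manager",
                        timedelta(hours=48), now=t0)
    print(workflow_state(task, t0 + timedelta(hours=1)))
    inbox.decide("loan-42", "loan_approval", "approved", now=t0 + timedelta(hours=2))
    print(workflow_state(task, t0 + timedelta(hours=3)))
```

Passing `now` explicitly keeps the timeout logic testable without waiting in real time.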
Error Handling
When a step fails:
Retry: up to 3 attempts with exponential backoff
Timeout: 30 minutes per step
On failure: execute compensation steps to undo completed work
Alert: notify on-call
Suspend: pause the workflow for manual intervention
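The retry and compensation parts of this policy can be sketched as follows. This is a minimal illustration under stated assumptions: `run_with_retry` and `run_with_compensation` are hypothetical helpers, the base delay of one second is arbitrary, and per-step timeouts, alerting, and suspension are left out. Undoing completed steps in reverse order is the approach often described as the saga pattern.

```python
import time


def run_with_retry(action, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action(); after a failure wait base_delay * 2**(attempt - 1) seconds, then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def run_with_compensation(steps, sleep=time.sleep):
    """steps is a list of (name, action, compensate) tuples.

    If a step still fails after its retries, undo the already-completed
    steps in reverse order and re-raise.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            run_with_retry(action, sleep=sleep)
        except Exception as exc:
            for _, undo in reversed(completed):
                undo()
            raise RuntimeError(f"step {name!r} failed after retries") from exc
        completed.append((name, compensate))


if __name__ == "__main__":
    log = []

    def charge():
        log.append("charge")
        raise RuntimeError("card declined")  # simulated permanent failure

    steps = [
        ("reserve", lambda: log.append("reserve"), lambda: log.append("release")),
        ("charge", charge, lambda: log.append("refund")),
    ]
    try:
        # sleep is stubbed out so the demo does not actually wait
        run_with_compensation(steps, sleep=lambda seconds: None)
    except RuntimeError as err:
        print(err, log)
```

Only steps that completed are compensated, so the failed `charge` step's own compensation is never called in this sketch.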
Conclusion
Workflow engines such as Temporal, AWS Step Functions, and Camunda manage the complexity of long-running, multi-step processes by providing durable state, automatic retries, and support for human tasks.