Multi-step Processes
- Use Temporal or Step Functions for durable multi-step workflows that survive crashes and handle retries.
30-second elevator pitch: "Multi-step processes fail halfway - payment charged but inventory not reserved. I use workflow engines like Temporal or AWS Step Functions so each step is durable, failures trigger retries or compensation, and the system picks up exactly where it left off after a crash."
The Problem
Consider e-commerce order fulfillment: charge payment, reserve inventory, create shipping label, wait for pickup, send confirmation. Each step calls a different service. Any step can fail or time out. Your server might crash after charging payment but before reserving inventory. Now the customer has been charged but no item is reserved for them.
Manually adding retries, state checkpoints, and compensation logic to each step makes the system brittle. Workflow systems build retries, checkpointing, and compensation into the runtime, so the step code itself stays simple.
Two problems that use this pattern: Uber, Payment System.
What You Will Learn
Approaches
- Single-server orchestration (simple, no durability)
- Event sourcing (Kafka log, workers react to events)
- Durable execution (Temporal, Step Functions)
- Managed workflows (state machines, DAGs)
Deep Dives
- Workflow versioning when you add new steps
- Handling external events (user signs document in 5 days)
- Keeping workflow state size manageable
The Solution: From Simple to Durable
What interviewers want to hear: "For simple flows I start with single-server orchestration. When I need reliability, I use event sourcing or a workflow engine. Temporal is my default - it gives durable execution, automatic retries, and compensation without building it myself."
Single-server Orchestration
The simplest approach: one service calls each step in sequence. This is fine for low-stakes flows, but it fails when the server crashes mid-flow: progress lives only in memory, so nothing records which steps completed. It also cannot handle "wait for user to sign document in 5 days" without blocking for the whole wait.
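To make the failure mode concrete, here is a minimal sketch of single-server orchestration. The step functions, the `STEPS` list, and the `crash_after` parameter are hypothetical names invented for illustration; the crash is simulated with an exception.

```python
from typing import List, Optional

# Hypothetical step functions; each stands in for a call to a separate service.
def charge_payment(order_id: str) -> str:
    return f"charge-{order_id}"

def reserve_inventory(order_id: str) -> str:
    return f"reservation-{order_id}"

def create_shipping_label(order_id: str) -> str:
    return f"label-{order_id}"

STEPS = [
    ("charge_payment", charge_payment),
    ("reserve_inventory", reserve_inventory),
    ("create_shipping_label", create_shipping_label),
]

def fulfill_order(order_id: str, crash_after: Optional[str] = None) -> List[str]:
    """Run every step in order. Progress lives only in the local `completed` list."""
    completed: List[str] = []
    for name, step in STEPS:
        step(order_id)
        completed.append(name)
        if name == crash_after:
            # Simulated crash: `completed` is lost with the process, so a
            # restart cannot tell that the payment was already charged.
            raise RuntimeError(f"server crashed after {name}")
    return completed
```

Calling `fulfill_order("o1", crash_after="charge_payment")` raises after the payment step, and nothing outside the process knows which steps ran. That gap is what the next two approaches close.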
Event Sourcing
Store events in a durable log (Kafka). Workers consume events, perform work, emit new events. Payment worker sees "OrderPlaced", charges payment, emits "PaymentCharged". Inventory worker sees "PaymentCharged", reserves stock, emits "InventoryReserved".
- Fault tolerant: if a worker crashes, another picks up the event
- Scalable: add more workers
- Audit trail: full history of events
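The event chain above can be sketched with an in-memory log. This is an illustration only: `EventLog` is a hypothetical stand-in for a durable log such as a Kafka topic, and it dispatches to handlers synchronously, whereas real workers would poll the log independently and commit their offsets.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

class EventLog:
    """In-memory stand-in for a durable, append-only log (assumption: not Kafka)."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []  # full history, i.e. the audit trail
        self.handlers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[str], None]) -> None:
        self.handlers[event_type].append(handler)

    def emit(self, event_type: str, order_id: str) -> None:
        # Append first so the event is recorded before any worker reacts to it.
        self.events.append((event_type, order_id))
        for handler in self.handlers[event_type]:
            handler(order_id)

log = EventLog()

# Payment worker: reacts to OrderPlaced, then emits PaymentCharged.
log.subscribe("OrderPlaced", lambda order_id: log.emit("PaymentCharged", order_id))
# Inventory worker: reacts to PaymentCharged, then emits InventoryReserved.
log.subscribe("PaymentCharged", lambda order_id: log.emit("InventoryReserved", order_id))

log.emit("OrderPlaced", "order-1")
```

After the last line, `log.events` holds the three events in order. No worker knows about the others; each one only knows which event it consumes and which it emits.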
Workflow Engines
Temporal and similar engines provide durable execution. You write workflow code that looks like normal code, but the engine durably records the result of each completed step. After a crash, another worker resumes from the last recorded step instead of starting over. Failed activities are retried with backoff. Long waits (days) are handled without tying up a worker while the workflow is idle.
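The idea of resuming from recorded progress can be sketched without any engine. The following is a toy model and not Temporal's API: `DurableRun`, `make_activity`, and `order_workflow` are hypothetical names, and a JSON file stands in for the engine's durable history.

```python
import json
import os
from typing import Callable, Dict, List, Optional

class DurableRun:
    """Toy durable executor: persists each step result, skips recorded steps on rerun."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.history: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.history = json.load(f)  # recover progress from a previous run

    def step(self, name: str, fn: Callable[[], str]) -> str:
        if name in self.history:
            return self.history[name]  # already done: reuse the recorded result
        result = fn()
        self.history[name] = result
        with open(self.path, "w") as f:
            json.dump(self.history, f)  # checkpoint before moving on
        return result

def make_activity(name: str, calls: List[str]) -> Callable[[], str]:
    """Hypothetical activity that records each real execution in `calls`."""
    def activity() -> str:
        calls.append(name)
        return f"{name}:done"
    return activity

def order_workflow(run: DurableRun, calls: List[str], crash_before: Optional[str] = None) -> None:
    for name in ("charge_payment", "reserve_inventory", "create_label"):
        if name == crash_before:
            raise RuntimeError("worker crashed")  # simulated crash
        run.step(name, make_activity(name, calls))
```

If the first run crashes before `reserve_inventory` and a second `DurableRun` is created on the same file, the payment step is not executed again. A real engine adds much more (timers, retries, determinism checks on replay), so treat this only as a picture of the core mechanism.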
AWS Step Functions - Declarative state machines. Good for AWS-heavy environments. Less expressive than code-based engines.
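For comparison, a declarative definition might look like the sketch below, written as a Python dict and serialized to JSON in Amazon States Language. The state names, the account ID, and the Lambda ARNs are placeholders invented for illustration; only the field names (`StartAt`, `States`, `Retry`, `Catch`, and so on) come from the language itself.

```python
import json

# Placeholder ARN prefix; the account ID and function names are hypothetical.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:"

ORDER_STATE_MACHINE = {
    "Comment": "Order fulfillment sketch",
    "StartAt": "ChargePayment",
    "States": {
        "ChargePayment": {
            "Type": "Task",
            "Resource": ARN + "ChargePayment",
            # Retry failed attempts with exponential backoff.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "ReserveInventory",
        },
        "ReserveInventory": {
            "Type": "Task",
            "Resource": ARN + "ReserveInventory",
            # On any error, route to the compensating step.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "RefundPayment"}],
            "Next": "CreateShippingLabel",
        },
        "CreateShippingLabel": {
            "Type": "Task",
            "Resource": ARN + "CreateShippingLabel",
            "End": True,
        },
        "RefundPayment": {
            "Type": "Task",
            "Resource": ARN + "RefundPayment",
            "Next": "OrderFailed",
        },
        "OrderFailed": {"Type": "Fail", "Error": "OrderFailed"},
    },
}

definition_json = json.dumps(ORDER_STATE_MACHINE, indent=2)
```

Retries and the compensation path are configuration on each state, not code. That makes the flow easy to inspect, but branching logic that would be one `if` in a code-based engine becomes extra states.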
When to use: Payment flows, human-in-the-loop (Uber driver acceptance), any "if step X fails, undo step Y" scenario.
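The "if step X fails, undo step Y" case is usually handled with compensation: each step is paired with an undo action, and on failure the completed steps are undone in reverse order. Below is a minimal sketch of that idea; `run_saga` and the step names used with it are hypothetical, and a workflow engine would additionally persist progress so the undo itself survives a crash.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)

def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Tuple[str, Callable[[], None]]] = []
    log: List[str] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed {name}")
            # Undo only what actually completed, most recent first.
            for prev_name, undo in reversed(done):
                undo()
                log.append(f"undid {prev_name}")
            return False, log
        log.append(f"did {name}")
        done.append((name, compensate))
    return True, log

def out_of_stock() -> None:
    raise RuntimeError("out of stock")

ok, log = run_saga([
    ("charge_payment", lambda: None, lambda: None),    # compensation: refund
    ("reserve_inventory", out_of_stock, lambda: None),  # compensation: release stock
])
```

Here the inventory step fails, so the saga refunds the payment and reports failure. The failed step itself is not compensated because it never completed.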
When to Use in Interviews
Use workflows when you hear: "if step X fails we need to undo Y", "ensure all steps complete or none do", "user might take days to respond". Payment systems, Uber-style matching, document signing.
When NOT to use: Simple async (resize image, send email) - use a queue. Synchronous request-response. High-frequency, low-value operations.
Summary
- Simple flows: single-server orchestration
- Reliable flows: event sourcing or workflow engines
- Temporal: durable execution, retries, compensation
- Listen for: state machines, partial failures, long waits


