Resilient AI Workflows in Rails 8.1 with Active Job Continuations
Learn how to build resilient, multi-step AI workflows in Rails using ActiveJob::Continuable. This guide shows you how to create fault-tolerant jobs that automatically resume after interruptions, saving time and expensive AI processing calls.

AI features rarely fit into a single short operation. More often, you’re running a chain of steps: fetch data, call one or more providers, normalize results, write to the database, notify the user. And that’s where the usual pain starts: the job is long-running, workers get restarted due to deployments or infrastructure events, and the job can die halfway through.
If that job simply restarts from scratch, you lose time, money (especially on LLM calls), and predictability. What you really want is:
“Continue from where we left off.”
Rails 8.1 adds a native mechanism for exactly that: Active Job Continuations, implemented via ActiveJob::Continuable.

What are Active Job Continuations (in plain terms)?
ActiveJob::Continuable lets you break a single job into discrete steps and record progress after each step completes successfully. If the job is interrupted - due to a deployment, worker restart, or process crash - Rails will resume it on the next run from the first incomplete step.

In other words:
- It does not repeat steps that already finished.
- It continues from the first incomplete step.
Think of it as checkpoints for a long pipeline - built directly into Active Job.
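Here is a minimal sketch of the shape (the job and step names are made up; a full report pipeline follows later in this post):

class LongPipelineJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(source_id)
    step :fetch_data do
      # Do the fetch and persist the result. Once this block finishes,
      # the framework records the step as complete.
    end

    step :process_data do
      # If the worker is stopped here, the next run skips :fetch_data
      # and resumes from this step.
    end
  end
end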
Why this matters for AI pipelines
AI orchestration often includes steps that are:
- Expensive (LLM calls, search, large-file processing)
- Long-running (imports, embeddings, page-by-page processing)
- Operationally fragile (deployments that restart workers)
The Rails docs explicitly call out deploy scenarios - especially deploys with Kamal, which by default only gives job-running containers ~30 seconds to shut down - as a motivating case for continuations: without checkpoints, long jobs are more likely to get cut off mid-flight.
A practical example: generating an AI report
Imagine a task where you take a source document and produce a polished report. A typical pipeline looks like this:
- Extract text (PDF/HTML/DOCX - doesn’t matter)
- Ask a model for summary points
- Assemble the final format (Markdown/HTML/PDF)
- Mark as complete and deliver it to the user
Without continuations, a deployment during formatting can force the job to restart and redo the most expensive step (LLM summarization). With continuations, the job resumes and continues forward.
Implementation
class Ai::GenerateReportJob < ApplicationJob
  include ActiveJob::Continuable

  queue_as :slow

  retry_on Report::GenerationError, wait: 5.seconds, attempts: 3

  def perform(report_id)
    # This code runs on every resume, so it must be idempotent.
    report = Report.find(report_id)
    report.update!(status: :in_progress)

    step :extract_text do
      report.extract_text!
    end

    step :generate_summary do
      report.generate_summary!
    end

    step :format_report do
      report.format_report!
    end

    step :deliver_report do
      report.update!(status: :completed)
      report.deliver!
    end
  rescue StandardError
    # Final error handling.
    if report
      report.update!(status: :failed)
      report.broadcast_completed
    end
    raise
  end
end
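Enqueueing the job doesn’t change - it’s plain Active Job:

Ai::GenerateReportJob.perform_later(report.id)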
What’s worth noticing here
1) The steps read like a script
This is the big win: the orchestration is explicit, readable, and easy to maintain.
2) Idempotency is not “nice to have” - it’s required
The most important rule: the code inside each step must be safe to run more than once.
In practice that means:
- extract_text! / generate_summary! / format_report! should be able to recognize: “I already did this; don’t break anything if I run again.” (See the sketch after this list.)
- Persist step outputs (e.g., extracted_text, summary_json, formatted_body) and guard: “if present, return early.”
- Commit state at the end of a step so you don’t leave half-written artifacts that make retries unsafe.
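As a sketch of that guard pattern - assuming the Report model has extracted_text and summary_json columns, and using a hypothetical LlmClient for the provider call:

class Report < ApplicationRecord
  # Idempotent: if a summary is already persisted, re-running the step is a no-op.
  def generate_summary!
    return if summary_json.present?

    summary = LlmClient.summarize(extracted_text) # hypothetical LLM client
    update!(summary_json: summary)
  end
end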
3) Code outside steps runs every time
Rails calls this out directly: anything before the first step (and between steps) runs on every resume. So keep expensive one-time work inside steps.

4) Cheap resume instead of expensive restart
If the job fails after :generate_summary, the next run can skip extraction and summarization - so you avoid repeating expensive work.

5) Steps can be blocks or method references
Rails supports multiple styles for defining steps (blocks and method references).
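A sketch of both styles - the block form is the one used above; for the method form I’m assuming the step calls an instance method of the same name (the method accepts an optional step argument in case the API passes one; check the docs for the exact contract):

class Ai::GenerateReportJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(report_id)
    @report = Report.find(report_id)

    # Block form.
    step :extract_text do
      @report.extract_text!
    end

    # Method-reference form: no block, the step runs the method below.
    step :generate_summary
  end

  private

  def generate_summary(step = nil)
    @report.generate_summary!
  end
end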
When you need a cursor: when steps aren’t precise enough
Sometimes splitting into steps still isn’t granular enough.
Example: you have a single step that processes 10,000 records/pages/chunks. If the job is interrupted in the middle of that step, resuming from the start of the step may still be too costly or too slow.
This is exactly when you use a cursor: when even step-level checkpointing is not precise enough and you need progress tracking inside a step (e.g., “resume from record 7,421, not from the beginning of the step”). Rails supports this officially by yielding a step object with a cursor, and letting you advance it as you go.

Cursor example (conceptual)
The Rails documentation describes the pattern as:
- step.cursor tells you where to start,
- you process a batch/record,
- you update the cursor (e.g., step.advance!(...)),
- if the job is interrupted, Rails resumes the step and continues from the cursor.
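A sketch of that pattern, reusing the report example - Report#pages and Page#process! are assumptions for illustration, while step.cursor and step.advance! are the continuation helpers mentioned above:

class Ai::ProcessPagesJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(report_id)
    report = Report.find(report_id)

    step :process_pages do |step|
      # step.cursor is nil on the first run, or the last checkpointed position.
      report.pages.find_each(start: step.cursor) do |page|
        page.process!                # hypothetical per-page work
        step.advance! from: page.id  # checkpoint: a resume continues after this page
      end
    end
  end
end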
Practical recommendations
- Make step bodies idempotent. Each step should either “do the work and persist it” or “notice it’s already done and exit.”
- Keep steps neither tiny nor massive. Too small: noisy, harder to follow. Too large: less control and more expensive recovery.
- Use a cursor when step-level granularity isn’t enough. If a step loops through thousands of items, cursor-based progress prevents “restart the whole step” and gives you higher precision.
- Log step boundaries and cursor progress (see the sketch after this list). Step boundaries are natural observability points; cursors make it obvious where the job stopped.
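For example, a minimal logging setup with the standard Rails logger (the message format is just a suggestion, and the pages loop reuses the hypothetical cursor example above):

step :generate_summary do
  Rails.logger.info("report=#{report.id} step=generate_summary start")
  report.generate_summary!
  Rails.logger.info("report=#{report.id} step=generate_summary done")
end

step :process_pages do |step|
  report.pages.find_each(start: step.cursor) do |page|
    page.process!
    step.advance! from: page.id
    Rails.logger.info("report=#{report.id} step=process_pages cursor=#{step.cursor}")
  end
end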
Conclusion
ActiveJob::Continuable is a native way to make long-running AI jobs resilient: instead of being a single fragile attempt, a job becomes a checkpointed workflow that survives deployments and restarts.

If your pipeline includes even one expensive part (LLM calls, search requests, large-file processing), continuations typically pay off quickly - in cost savings, operational stability, and predictability.
Happy Coding!