Resilient AI Workflows in Rails 8.1 with Active Job Continuations
Learn how to build resilient, multi-step AI workflows in Rails using ActiveJob::Continuable. This guide shows you how to create fault-tolerant jobs that automatically resume after interruptions, saving time and expensive AI processing calls.

AI features rarely fit into a single short operation. More often, you’re running a chain of steps: fetch data, call one or more providers, normalize results, write to the database, notify the user. And that’s where the usual pain starts: the job is long-running, workers get restarted due to deployments or infrastructure events, and the job can die halfway through.
If that job simply restarts from scratch, you lose time, money (especially on LLM calls), and predictability. What you really want is:
“Continue from where we left off.”
Rails 8.1 adds a native mechanism for exactly that: Active Job Continuations, implemented via ActiveJob::Continuable.

What are Active Job Continuations (in plain terms)?
ActiveJob::Continuable lets you break a single job into discrete steps and record progress after each step completes successfully. If the job is interrupted - due to a deployment, worker restart, or process crash - Rails will resume it on the next run from the first incomplete step.

In other words:
- It does not repeat steps that already finished.
- It continues from the first incomplete step.
Think of it as checkpoints for a long pipeline - built directly into Active Job.
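Here is a minimal sketch of the shape (the job and step names are made up; a full report pipeline follows later in this post):

class LongPipelineJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(source_id)
    step :fetch_data do
      # Do the fetch and persist the result. Once this block finishes,
      # the framework records the step as complete.
    end

    step :process_data do
      # If the worker is stopped here, the next run skips :fetch_data
      # and resumes from this step.
    end
  end
end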
Why this matters for AI pipelines
AI orchestration often includes steps that are:
- Expensive (LLM calls, search, large-file processing)
- Long-running (imports, embeddings, page-by-page processing)
- Operationally fragile (deployments that restart workers)
The Rails docs explicitly call out deploy scenarios - especially deploys with Kamal, which by default only gives job-running containers ~30 seconds to shut down - as a motivating case for continuations: without checkpoints, long jobs are more likely to get cut off mid-flight.
A practical example: generating an AI report
Imagine a task where you take a source document and produce a polished report. A typical pipeline looks like this:
- Extract text (PDF/HTML/DOCX - doesn’t matter)
- Ask a model for summary points
- Assemble the final format (Markdown/HTML/PDF)
- Mark as complete and deliver it to the user
Without continuations, a deployment during formatting can force the job to restart and redo the most expensive step (LLM summarization). With continuations, the job resumes and continues forward.
Implementation
class Ai::GenerateReportJob < ApplicationJob
  include ActiveJob::Continuable

  queue_as :slow

  retry_on Report::GenerationError, wait: 5.seconds, attempts: 3

  def perform(report_id)
    # This code runs on every resume, so it must be idempotent.
    report = Report.find(report_id)
    report.update!(status: :in_progress)

    step :extract_text do
      report.extract_text!
    end

    step :generate_summary do
      report.generate_summary!
    end

    step :format_report do
      report.format_report!
    end

    step :deliver_report do
      report.update!(status: :completed)
      report.deliver!
    end
  rescue StandardError
    # Final error handling.
    if report
      report.update!(status: :failed)
      report.broadcast_completed
    end
    raise
  end
end
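Enqueueing the job doesn’t change - it’s plain Active Job:

Ai::GenerateReportJob.perform_later(report.id)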
What’s worth noticing here
1) The steps read like a script
This is the big win: the orchestration is explicit, readable, and easy to maintain.
2) Idempotency is not “nice to have” - it’s required
The most important rule: the code inside each step must be safe to run more than once.
In practice that means:
- extract_text! / generate_summary! / format_report! should be able to recognize: “I already did this; don’t break anything if I run again.” (See the sketch after this list.)
- Persist step outputs (e.g., extracted_text, summary_json, formatted_body) and guard: “if present, return early.”
- Commit state at the end of a step so you don’t leave half-written artifacts that make retries unsafe.
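As a sketch of that guard pattern - assuming the Report model has extracted_text and summary_json columns, and using a hypothetical LlmClient for the provider call:

class Report < ApplicationRecord
  # Idempotent: if a summary is already persisted, re-running the step is a no-op.
  def generate_summary!
    return if summary_json.present?

    summary = LlmClient.summarize(extracted_text) # hypothetical LLM client
    update!(summary_json: summary)
  end
end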
3) Code outside steps runs every time
Rails calls this out directly: anything before the first step (and between steps) runs on every resume. So keep expensive one-time work inside steps.

4) Cheap resume instead of expensive restart
If the job fails after :generate_summary, the next run can skip extraction and summarization - so you avoid repeating expensive work.

5) Steps can be blocks or method references
Rails supports multiple styles for defining steps (blocks and method references).
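A sketch of both styles - the block form is the one used above; for the method form I’m assuming the step calls an instance method of the same name (the method accepts an optional step argument in case the API passes one; check the docs for the exact contract):

class Ai::GenerateReportJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(report_id)
    @report = Report.find(report_id)

    # Block form.
    step :extract_text do
      @report.extract_text!
    end

    # Method-reference form: no block, the step runs the method below.
    step :generate_summary
  end

  private

  def generate_summary(step = nil)
    @report.generate_summary!
  end
end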
When you need a cursor: when steps aren’t precise enough
Sometimes splitting into steps still isn’t granular enough.
Example: you have a single step that processes 10,000 records/pages/chunks. If the job is interrupted in the middle of that step, resuming from the start of the step may still be too costly or too slow.
This is exactly when you use a cursor: when even step-level checkpointing is not precise enough and you need progress tracking inside a step (e.g., “resume from record 7,421, not from the beginning of the step”). Rails supports this officially by yielding a step object with a cursor, and letting you advance it as you go.

Cursor example (conceptual)
The Rails documentation describes the pattern as:
- step.cursor tells you where to start,
- you process a batch/record,
- you update the cursor (e.g., step.advance!(...)),
- if the job is interrupted, Rails resumes the step and continues from the cursor.
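A sketch of that pattern, reusing the report example - Report#pages and Page#process! are assumptions for illustration, while step.cursor and step.advance! are the continuation helpers mentioned above:

class Ai::ProcessPagesJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(report_id)
    report = Report.find(report_id)

    step :process_pages do |step|
      # step.cursor is nil on the first run, or the last checkpointed position.
      report.pages.find_each(start: step.cursor) do |page|
        page.process!                # hypothetical per-page work
        step.advance! from: page.id  # checkpoint: a resume continues after this page
      end
    end
  end
end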
Practical recommendations
- Make step bodies idempotent. Each step should either “do the work and persist it” or “notice it’s already done and exit.”
- Keep steps neither tiny nor massive. Too small: noisy, harder to follow. Too large: less control and more expensive recovery.
- Use a cursor when step-level granularity isn’t enough. If a step loops through thousands of items, cursor-based progress prevents “restart the whole step” and gives you higher precision.
- Log step boundaries and cursor progress (see the sketch after this list). Step boundaries are natural observability points; cursors make it obvious where the job stopped.
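For example, a minimal logging setup with the standard Rails logger (the message format is just a suggestion, and the pages loop reuses the hypothetical cursor example above):

step :generate_summary do
  Rails.logger.info("report=#{report.id} step=generate_summary start")
  report.generate_summary!
  Rails.logger.info("report=#{report.id} step=generate_summary done")
end

step :process_pages do |step|
  report.pages.find_each(start: step.cursor) do |page|
    page.process!
    step.advance! from: page.id
    Rails.logger.info("report=#{report.id} step=process_pages cursor=#{step.cursor}")
  end
end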
Conclusion
ActiveJob::Continuable is a native way to make long-running AI jobs resilient: instead of being a single fragile attempt, a job becomes a checkpointed workflow that survives deployments and restarts.

If your pipeline includes even one expensive part (LLM calls, search requests, large-file processing), continuations typically pay off quickly - in cost savings, operational stability, and predictability.
Happy Coding!