Building a Resilient AI Client in Ruby with Stoplight and RubyLLM
Calling external AI providers (like OpenAI, Gemini, or Claude) is a common task in modern web applications. However, these services can be slow, return errors, or become completely unavailable, which can lead to cascading failures that bring your entire application down. To prevent this, you can use the Circuit Breaker pattern, which isolates failing services and even allows a graceful failover to a backup provider. In Ruby, the stoplight gem provides a straightforward way to implement this pattern and build a resilient client; in this article we will use it together with ruby_llm to handle AI providers. To see this implementation in action and explore the details, check out the repository containing the code examples used in this article.

What is the Circuit Breaker Pattern?
The Circuit Breaker pattern monitors calls to a service. If failures reach a certain threshold, it "trips" (opens the circuit), preventing further calls to that service for a set "cool-off" period. This gives the failing service time to recover and prevents your application from wasting resources on requests that are likely to fail. A circuit breaker has three states:
- Closed (green light): The normal state. Requests pass through, and failures are recorded.
- Open (red light): After the failure threshold is met, the circuit opens. All requests are immediately rejected without attempting to call the service.
- Half-Open (yellow light): After the cool-off period, the circuit moves to this state. It allows a limited number of test requests to pass through. If they succeed, the circuit moves back to Closed; if they fail, it returns to Open.
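Before reaching for the gem, the state machine itself can be sketched in a few lines of plain Ruby. This toy class (the name ToyBreaker and its API are our own illustration, not part of stoplight) only exists to show the three transitions described above:

```ruby
# A toy circuit breaker illustrating the three states above.
# Not the stoplight gem's implementation -- just the idea in plain Ruby.
class ToyBreaker
  def initialize(threshold: 3, cool_off: 60, clock: -> { Time.now.to_i })
    @threshold = threshold # consecutive failures needed to trip the circuit
    @cool_off  = cool_off  # seconds to stay Open before allowing a probe
    @clock     = clock     # injectable clock, handy for testing
    @failures  = 0
    @opened_at = nil
  end

  # "green" (Closed), "red" (Open), or "yellow" (Half-Open)
  def color
    return "green" if @opened_at.nil?
    @clock.call - @opened_at >= @cool_off ? "yellow" : "red"
  end

  def run
    raise "circuit open" if color == "red" # reject immediately while Open
    begin
      result = yield
      @failures  = 0   # success (Closed, or a Half-Open probe): close the circuit
      @opened_at = nil
      result
    rescue StandardError
      @failures += 1
      @opened_at = @clock.call if @failures >= @threshold # trip (or re-trip)
      raise
    end
  end
end
```

A failing call increments the counter; once the threshold is hit the breaker turns red, rejects calls for the cool-off window, then lets one probe through in the yellow state to decide whether to close again.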
Basic Example
To use the ruby_llm gem, we first need to set the API keys in a config file. In a Rails app, you can also generate this file with rails generate ruby_llm:install.

```ruby
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']
  # You could add Anthropic, etc. here
end
```
Here's the simplest way to use stoplight after installing it with Bundler or manually:

```ruby
# Create a light
# It will "trip" after 3 failures in a row
light = Stoplight("my-ai-service", threshold: 3) do
  # Code that might fail, e.g., an API call
  MyAIProvider.new.call
end

# Run the code
begin
  light.run
rescue Stoplight::Error::RedLight
  # The circuit is open!
  puts "Service is unavailable, falling back."
end
```
This code creates a simple circuit. If MyAIProvider.new.call fails 3 times, the light turns "red" (Open) and will immediately raise a Stoplight::Error::RedLight on subsequent calls.

For a real application, you need to persist the state of your circuits. If you keep the state only in memory, an application restart would reset all circuits, potentially flooding a service that is still down. We configure stoplight to use Redis in production as a persistent data store.

```ruby
# config/initializers/stoplight.rb
Stoplight.configure do |config|
  # We can disable notifiers if we don't want external alerts
  config.error_notifier = ->(_) { }
  config.notifiers = []

  if Rails.env.test? || Rails.env.development?
    # Use in-memory data store for dev/test
    config.data_store = Stoplight::DataStore::Memory.new
  else
    # Use Redis for production
    config.data_store = Stoplight::DataStore::Redis.new(Redis.new, warn_on_clock_skew: false)
  end
end
```
By using Redis, the state of all circuit breakers (Closed, Open, or Half-Open) is maintained even if your application restarts.
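The difference is easy to picture with a toy failure counter (our own illustration, not stoplight's API): any instance reading a shared store agrees on the circuit's state, while a freshly restarted process with an in-memory store has forgotten that the circuit was ever open.

```ruby
# Illustration only: why circuit state belongs in a shared store.
# A Hash stands in for Redis; a fresh Hash stands in for a restarted
# process that kept its circuit state in memory.
shared_store = Hash.new(0) # e.g. Redis in production

def record_failure(store, name)
  store[name] += 1
end

def circuit_open?(store, name, threshold: 3)
  store[name] >= threshold
end

# Three failures recorded by one app instance...
3.times { record_failure(shared_store, "ai_models:gpt-4o") }

# ...are visible to every instance reading the shared store:
circuit_open?(shared_store, "ai_models:gpt-4o") # => true

# But a restarted process with a brand-new in-memory store has forgotten
# everything and would happily hammer the still-broken service:
circuit_open?(Hash.new(0), "ai_models:gpt-4o") # => false
```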
Implementing a Resilient AI Client with Failover
Let's build a client that uses stoplight for circuit breaking and ruby_llm to orchestrate calls and failover. Since ruby_llm wraps provider-specific errors into its own error class, our list of tracked errors is much cleaner. First, let's define our core settings for the circuit breakers.

```ruby
# These settings will be used to create our circuits
module AIProviderSettings
  # If 3 consecutive failures occur, trip the circuit
  CIRCUIT_THRESHOLD = 3

  # Stay open for 30 minutes before moving to Half-Open
  FAILURE_COOLDOWN_S = 1800 # 30 minutes

  # Only these errors will count as "failures"
  # ruby_llm conveniently wraps provider errors (like OpenAI::Error)
  # into a standard RubyLLM::Error.
  TRACKING_ERRORS = [
    RubyLLM::Error,
    Faraday::Error,
    Timeout::Error,
    Net::ReadTimeout,
    Net::OpenTimeout
  ]
end
```
Next, we'll define our model priority list. We'll try the most powerful model first, then a fast one, and finally a standard one. ruby_llm can work with any of them.

```ruby
# Our priority list for failover
# We'll try 'gpt-4o' first, then 'gemini-2.5-flash', then 'gpt-4o-mini'
MODEL_PRIORITY = ['gpt-4o', 'gemini-2.5-flash', 'gpt-4o-mini']
```
Now, let's create our failover method. This method takes a ruby_llm chat object (which holds the conversation history) and the new prompt. It loops through our priority list, using stoplight to check each model.

```ruby
def ask_with_failover(chat, prompt)
  last_error = nil

  MODEL_PRIORITY.each do |model_name|
    light = Stoplight("ai_models:#{model_name}",
      threshold: AIProviderSettings::CIRCUIT_THRESHOLD,
      cool_off_time: AIProviderSettings::FAILURE_COOLDOWN_S,
      tracked_errors: AIProviderSettings::TRACKING_ERRORS,
      data_store: ($STOPLIGHT_DATA_STORE || Stoplight::DataStore::Memory.new)
    )

    begin
      return light.run do
        chat.with_model(model_name)
        chat.ask(prompt)
      end
    rescue Stoplight::Error::RedLight => e
      last_error = e
      next
    rescue *AIProviderSettings::TRACKING_ERRORS => e
      last_error = e
      next
    end
  end

  raise StandardError, "All AI models failed. Last error: #{last_error&.message}"
end
```
How to use it:
```ruby
# Create a new chat session
# The default model here doesn't matter, as our method overrides it
chat = RubyLLM.chat

# First question: might use gpt-4o
response1 = ask_with_failover(chat, "What is the Circuit Breaker pattern?")
puts response1.content
puts "-> Used model: #{response1.model_id}"

# Second question: imagine gpt-4o is now down.
# The circuit will trip, and it will fall back to gemini-2.5-flash
# The chat history is preserved!
response2 = ask_with_failover(chat, "How does it relate to 'stoplight'?")
puts response2.content
puts "-> Used model: #{response2.model_id}"
```
You can extend this approach to other ruby_llm features, such as extracting information from documents, generating images, transcribing audio, and analyzing video.

Advantages of this Approach
- Resilience: Prevents cascading failures from a single failing model.
- Better User Experience: Your application can gracefully degrade from a premium model to a standard one instead of returning an error.
- Gives Services Time to Recover: By stopping requests to an open circuit, you give the failing model time to recover.
- Clean Abstraction: ruby_llm handles all the different SDKs, and stoplight handles all the stateful error logic.
- Preserves Context: By using chat.with_model, the failover is seamless, and the new model still gets the entire conversation history.
When to Use this Pattern
Use a Circuit Breaker when:
- You are calling any external, third-party API that you don't control (especially LLMs).
- You want to provide multiple levels of service (e.g., a premium model that falls back to a cheaper one).
- You need to prevent a single component's failure from taking down the entire system.
Conclusion
This implementation leverages the
stoplight gem for robust circuit breaking and ruby_llm for clean AI abstraction. By intelligently monitoring failures, tripping circuits, and providing a failover mechanism between different models, it transforms a fragile integration into a stable and fault-tolerant one.