Build an Autonomous SRE Agent with Stably
What if your on-call engineer was an AI agent? Wire up observability alerts to a coding agent, let Stably validate every fix, and close the loop—automatically.
At 3am, your pager goes off. Latency spike. Error rate climbing. You drag yourself out of bed, SSH into production, stare at logs, form a hypothesis, push a fix, pray it works.
What if that entire workflow could run autonomously—while you sleep?
The Dream: Closed-Loop Incident Response
The pieces are already here. You have observability platforms firing alerts. You have AI coding agents that can read code, understand context, and write fixes. What's missing is validation—a way for the agent to know if its fix actually works before declaring victory.
That's where Stably comes in.
Here's the architecture: take any alert from your observability stack, feed it to a coding agent (Claude Code, Cursor, Devin, or your own), and use Stably as the confidence layer. The agent doesn't just push a fix and hope. It spins up a test environment, runs regression tests, creates new tests for the specific failure, fixes any broken tests, and loops until it's as confident as possible that the issue is resolved.
The Loop: Debug → Test → Fix → Repeat
1. Ingest the Alert
Your observability platform (Datadog, PagerDuty, Sentry, whatever) fires a webhook. The payload contains everything the agent needs: error message, stack trace, affected endpoint, timestamp, severity.
// Example: incoming alert payload
const alert = {
  error: "TypeError: Cannot read property 'user' of undefined",
  endpoint: "/api/checkout",
  stackTrace: "...",
  severity: "critical"
};
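Receiving that payload is a small amount of glue. Here's a minimal sketch assuming an Express server; handleIncident is a hypothetical hand-off point to your coding agent, not a Stably API:

import express from 'express';

const app = express();

// Hypothetical hand-off to your coding agent (not part of Stably)
async function handleIncident(alert: {
  error: string;
  endpoint: string;
  stackTrace: string;
  severity: string;
}) {
  // kick off the investigation (see step 2)
}

// Observability platforms POST alert payloads here
app.post('/webhooks/alerts', express.json(), (req, res) => {
  const { error, endpoint, stackTrace, severity } = req.body;
  res.sendStatus(202); // acknowledge fast; the agent works asynchronously
  handleIncident({ error, endpoint, stackTrace, severity });
});

app.listen(3000);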
2. Let the Agent Investigate
Your coding agent (we'll just say "the agent," whether that's Claude Code, Cursor, Devin, or your own) reads the alert, pulls the relevant code, and forms a hypothesis. It might grep through logs, read recent commits, or trace the call stack. The key insight: the agent doesn't need to be right the first time. It just needs to iterate.
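One way the hand-off might look: turn the alert into a focused prompt and invoke whatever agent you use. Both buildPrompt and runAgent below are illustrative placeholders, not a real CLI or API:

// Hypothetical: build an investigation prompt from the alert payload
function buildPrompt(alert: {
  error: string;
  endpoint: string;
  stackTrace: string;
  severity: string;
}): string {
  return [
    `Production alert (${alert.severity}) on ${alert.endpoint}: ${alert.error}`,
    `Stack trace:\n${alert.stackTrace}`,
    'Investigate the relevant code and recent commits, form a hypothesis,',
    'and propose a fix. Do not declare the incident resolved until',
    'npx stably test passes.',
  ].join('\n\n');
}

// Placeholder: wire this to your agent of choice (CLI call, API request, etc.)
async function runAgent(prompt: string): Promise<void> {
  console.log(prompt);
}

// usage: await runAgent(buildPrompt(alert));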
3. Validate with Stably
Here's where it gets interesting. The agent runs npx stably test to execute your existing test suite against the proposed fix:
npx stably test
This does two things:
- Runs all Playwright tests against your application
- Reports results to Stably for AI-powered analysis
If tests pass, great. If tests fail, the agent now has concrete feedback—not just a vague alert, but specific assertions that broke.
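In script form, the pass/fail signal is just the exit code. A sketch, assuming stably test exits nonzero on failure the way most test runners do (worth confirming in your setup):

import { spawnSync } from 'node:child_process';

// Run the suite; inherited stdio puts failure details in the agent's context
const result = spawnSync('npx', ['stably', 'test'], { stdio: 'inherit' });

if (result.status === 0) {
  console.log('Fix validated: full suite passes.');
} else {
  console.log('Concrete failures to feed back to the agent. Iterate.');
}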
4. Auto-Fix Broken Tests
Sometimes the fix is correct, but the tests are outdated. Stably handles this with npx stably fix:
npx stably fix
This command:
- Analyzes failing tests using AI
- Proposes fixes for broken locators, outdated assertions, or flaky selectors
- Works automatically in CI without needing manual run IDs
The agent can accept these fixes, re-run tests, and continue iterating.
5. Generate New Tests for the Bug
If the original bug wasn't covered by existing tests (and it often isn't—that's why it shipped), the agent creates a new test. Using Stably's SDK, this is just Playwright with AI superpowers:
import { test, expect } from '@stablyai/playwright-test';

test('checkout handles missing user gracefully', async ({ page }) => {
  // Navigate to checkout without auth
  await page.goto('/api/checkout');

  // AI assertion: describe what "correct" looks like
  await expect(page).toMatchScreenshotPrompt(
    'Should display login prompt, not crash'
  );
});
The test gets added to the suite. Future regressions are caught automatically.
6. Loop Until Confident
The agent keeps iterating:
- Push fix to preview environment
- Run npx stably test
- If tests fail, run npx stably fix or refine the code fix
- Repeat until all tests pass
Only when the full test suite passes—including the new regression test—does the agent consider the incident resolved.
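Put together, the whole loop fits in a page. A sketch under the same exit-code assumption as above; deployPreview and refineCodeFix are hypothetical stand-ins for your deploy pipeline and your agent:

import { spawnSync } from 'node:child_process';

const MAX_ATTEMPTS = 5; // past this, page a human

function run(cmd: string, args: string[]): boolean {
  return spawnSync(cmd, args, { stdio: 'inherit' }).status === 0;
}

// Hypothetical steps: wire these to your deploy pipeline and agent
function deployPreview(): void { /* push the current fix to a preview env */ }
function refineCodeFix(): void { /* hand control back to the agent */ }

for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
  deployPreview();

  if (run('npx', ['stably', 'test'])) {
    // All green, including the new regression test: incident resolved
    console.log(`Resolved on attempt ${attempt}`);
    break;
  }

  // Try AI-powered test repair first; if that's not it, refine the code
  if (!run('npx', ['stably', 'fix'])) {
    refineCodeFix();
  }
}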
Why This Works
Deterministic Validation
The agent isn't guessing. Every iteration produces concrete pass/fail signals. stably test gives you the same results every time, so the agent can reason about cause and effect.
AI-Native Feedback Loop
Stably's auto-fix isn't just string replacement. It uses AI to understand why a test failed and propose semantically correct fixes. This means the agent can trust the suggestions and iterate faster.
Full Playwright Compatibility
There's no vendor lock-in. Every test is a standard Playwright test. You can run npx playwright test directly if you want. The agent works with your existing test infrastructure, not against it.
Parallel Execution at Scale
When the agent needs to validate a fix, it can spin up 100+ parallel workers on Stably Cloud. What would take 20 minutes locally runs in under a minute. Faster feedback means faster resolution.
The Practical Setup
Here's how to wire this up today:
1. Configure Stably in your repo:
npx stably init
2. Add the Stably Reporter to playwright.config.ts:
reporter: [['@stablyai/playwright/reporter']],
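In context, that entry slots into your existing Playwright config. A minimal sketch assuming otherwise default settings; keep whatever options you already have:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  // Stably reporter uploads results for AI-powered analysis
  reporter: [['@stablyai/playwright/reporter']],
});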
3. Set environment variables:
export STABLY_API_KEY=your_api_key
export STABLY_PROJECT_ID=your_project_id
4. Give your agent the commands:
- npx stably test: run tests, report results
- npx stably fix: AI-powered test repair
That's it. Your agent now has a validation layer. It can propose fixes, test them, and iterate—all without human intervention.
The Future of On-Call
We're not replacing SREs. We're giving them superpowers. The agent handles the 3am grunt work: investigating, hypothesizing, testing, iterating. By morning, you wake up to a resolved incident and a pull request ready for review.
The pager still goes off. But now something else answers.
Get Started
Ready to build your own SRE agent? Start with the Stably CLI Quickstart and wire up your first automated validation loop. The infrastructure is ready—you just need to connect the pieces.