Your team ships fast.
Your end-to-end tests break faster.
On large suites, 60–80% of failures come from outdated tests and flakes, not real bugs. Engineers burn hours each week nursing tests back to green. QA leads spend more time triaging noise than preventing regressions.
Most “AI test healing” tools only patch selectors. They can’t reliably distinguish test drift from real bugs, don’t see your repo or observability stack, and usually don’t run inside your VPC.
You add end-to-end tests to increase confidence.
Fast forward a few quarters, and they’re mostly a source of noise.
What was supposed to protect velocity quietly becomes a tax on velocity.
That’s exactly the layer you want an agent to take over—if it’s safe and reliable.
"Auto-heal" is easy to say and hard to do. Most tools fail because they can't:
Stably solves all four by running as a CLI inside your CI pipeline, with access to the same code, artifacts, and tools your engineers already use.
At Stably, we built real auto-heal: an agent that runs in your CI, uses full context from your code and infra, and actually updates test code—not just selectors.
At a high level, Stably does three jobs for your Playwright suite:
You wire it into CI alongside your existing npx playwright test runs and invoke it on a run ID. Everything happens inside your infra.
You don't replace anything. You layer Stably on top:
Run npx playwright test as usual.
Invoke Stably (npx stably heal <run-id>) as a post-step.
From there, it talks to:
all under your current IAM, networking, and audit policies.
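To make the layering concrete, here is a minimal sketch of the two-step pipeline. It is an illustration under assumptions, not Stably's implementation: the only things taken from the text are the two commands, while the `Step` type, `planPipeline`, `runPipeline`, the run ID format, and the injected `exec` function are all hypothetical. The one design point worth noticing is that the heal step is marked to run even after the test step exits non-zero, since a failed run is exactly when you want it.

```typescript
// Hypothetical sketch: names and structure are illustrative, not Stably's API.
// The only commands taken from the text are `npx playwright test` and
// `npx stably heal <run-id>`.

type Step = {
  name: string;
  command: string[];
  // Run this step even when an earlier step exited non-zero.
  alwaysRun: boolean;
};

// exitCode is null when the step was skipped.
type StepResult = { name: string; exitCode: number | null };

function planPipeline(runId: string): Step[] {
  // Assumption: run IDs are simple tokens. Reject anything else so the ID
  // cannot smuggle extra arguments into the command line.
  if (!/^[A-Za-z0-9._-]+$/.test(runId)) {
    throw new Error(`Unexpected run ID: ${runId}`);
  }
  return [
    { name: "e2e", command: ["npx", "playwright", "test"], alwaysRun: false },
    // The heal step has to run precisely when the test step has failed.
    { name: "heal", command: ["npx", "stably", "heal", runId], alwaysRun: true },
  ];
}

function runPipeline(
  steps: Step[],
  exec: (command: string[]) => number,
): StepResult[] {
  let failed = false;
  const results: StepResult[] = [];
  for (const step of steps) {
    if (failed && !step.alwaysRun) {
      results.push({ name: step.name, exitCode: null });
      continue;
    }
    const exitCode = exec(step.command);
    if (exitCode !== 0) failed = true;
    results.push({ name: step.name, exitCode });
  }
  return results;
}
```

In a real pipeline, `exec` would shell out, for example through a thin wrapper around Node's `child_process.spawnSync` that returns the process status. How the run ID is produced is not covered here, so the sketch simply takes it as a parameter.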
Outcome: you get a typed list of issues—flakes, drift, and real bugs—instead of a single red blob.
Stably fingerprints each failure using:
Then it tags each failure as:
Example:
A checkout test starts failing overnight.
Stably sees that:
It tags this as test drift, not a flaky test and not a backend outage.
You see that classification immediately, without having to reverse-engineer the failure yourself.
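A toy version of that tagging step might look like the sketch below. It is not Stably's actual classifier: the signal names (`passedOnRetry`, `serverErrorStatus`, `locatorMissing`, `uiChangedInRecentCommit`) and the rule order are assumptions chosen to illustrate the three tags, and a real system would weigh far more evidence.

```typescript
// Hypothetical sketch: a rule-based stand-in for failure classification.
// Signal names and rule order are assumptions made for illustration only.

type FailureTag = "flake" | "test-drift" | "real-bug";

type FailureSignals = {
  // The same test went green when rerun on the same commit.
  passedOnRetry: boolean;
  // HTTP status of a server error seen during the test, if any.
  serverErrorStatus?: number;
  // The test could not find an element it expected.
  locatorMissing: boolean;
  // A recent commit changed the part of the UI the test touches.
  uiChangedInRecentCommit: boolean;
};

function classifyFailure(signals: FailureSignals): FailureTag {
  // A failure that disappears on rerun is treated as a flake.
  if (signals.passedOnRetry) return "flake";

  // A consistent 5xx is product behavior, not a test problem.
  if (signals.serverErrorStatus !== undefined && signals.serverErrorStatus >= 500) {
    return "real-bug";
  }

  // The UI moved on and the test did not: the test is out of date.
  if (signals.locatorMissing && signals.uiChangedInRecentCommit) {
    return "test-drift";
  }

  // When in doubt, escalate rather than rewrite the test.
  return "real-bug";
}

// A checkout-style failure: consistent, no server error, UI recently changed.
const checkoutTag = classifyFailure({
  passedOnRetry: false,
  locatorMissing: true,
  uiChangedInRecentCommit: true,
});
```

The conservative default matters more than any individual rule: if nothing clearly points to a flake or to drift, the sketch treats the failure as a real bug so it reaches a human.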
Outcome: you can let it touch your test code and still sleep at night.
Stably's auto-heal uses the same engine as our Test Creation agent—the system that already learns your app by exploring real flows. Over time, it sees more of your product than any single engineer or QA on the team.
That lets it confidently update tests, even for complex UIs:
You get small, reviewable diffs that feel like a senior engineer kept your suite up to date in the background.
Outcome: real bugs don't get "auto-healed away." They get promoted.
When Stably identifies a real bug, it:
You can plug that into Jira, Slack, PagerDuty—whatever you already use.
Example:
A login test starts failing with a consistent 500 after a specific deploy. Stably spots the server error in traces, links it to a new auth-service commit, and files it as a real regression, instead of trying to "fix" the test.
The autonomous part is simple: the suite knows when to heal itself—and when to call a human.
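One way to picture that split is as a routing function from tag to action. The sketch below is purely illustrative: the `Action` shapes and `routeFailure` are hypothetical names, and the handling of flakes (reporting them without changing the test) is an assumption, since the text only describes what happens to drift and real bugs.

```typescript
// Hypothetical sketch: routing a classified failure to a next step.
// Type and function names are illustrative, not part of any real API.

type Tag = "flake" | "test-drift" | "real-bug";

type Action =
  | { kind: "propose-test-diff"; test: string }
  | { kind: "escalate"; test: string; summary: string }
  | { kind: "report-flake"; test: string };

function routeFailure(test: string, tag: Tag, summary: string): Action {
  switch (tag) {
    case "test-drift":
      // Drift is healed with a small, reviewable change to the test.
      return { kind: "propose-test-diff", test };
    case "real-bug":
      // Real bugs are never "fixed" in the test; they go to a human.
      return { kind: "escalate", test, summary };
    case "flake":
      // Assumption: flakes are reported and the test is left unchanged.
      return { kind: "report-flake", test };
  }
}

const loginAction = routeFailure(
  "login.spec.ts",
  "real-bug",
  "Consistent HTTP 500 after a specific deploy",
);
```

The escalation payload is where integrations such as an issue tracker or a chat channel would attach; only the `real-bug` branch carries a summary because it is the only one meant for a human to act on.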
Teams using Stably auto-heal have cut end-to-end test maintenance by 50–75% while keeping suites 9%+ green—often enough to defer QA/SDET hires and turn CI back into a real quality gate.
From early deployments on Playwright-based E2E suites:
Fintech (~2,000 tests)
Enterprise SaaS (multi-team monorepo)
Teams describe the shift like this:
In other words: the suite starts to feel like an asset again.
Auto-heal is one part of Stably's AI testing platform.
The same foundation also powers:
So your tests don't just stay green; they're also easy to create and cheap to run, even as your product and coverage grow.
Stably layers on top of your existing Playwright setup—no rip-and-replace. Teams typically see impact within a single sprint.
Real auto-heal lets your end-to-end tests move at the same pace as your product—without turning your engineers into test janitors.