Meet Ops. The quiet engine. Runs the recurring reports, watches the dashboards, files the things that have to be filed, on schedule, without a nudge.
Nightly health-check done. 3 services at 99.98% uptime, 4 auth tokens rotated, backups verified at 2.3TB. One prod DB resize flagged, outside the runbook, so it paged you.
The nightly checks ran. Everything is green. Nobody had to remember.
Ops runs the recurring checks every night while you sleep, the ones that only get noticed when they're skipped. By the time you're up, the report is filed and the all-clear is in your channel.
- API latency p95 · 142ms · ok
- Queue depth · 0 stuck · ok
- Cert expiry sweep · none under 30d · ok
- Disk + error budget · within limits · ok
Tokens rotated, backups verified. Only the steps you scoped.
Ops runs the maintenance you put in the runbook: rotate the tokens on the cadence you set, take the backup, then actually restore-test it so a green checkmark means something. The actions it can take are the actions you signed off on, and nothing else.
Snapshot taken 01:40, restored to staging, row counts matched.
Disk is at 71% and climbing. The resize isn't in the scoped list, so Ops won't run it. It drafted the change and is holding for your call.
The renewals that always sneak up are already on your radar.
Ops watches the vendor contracts and sends the reminder before the auto-renew clock runs out, with the amount, the date, and the cancel-by window attached. The thing that lapses because nobody was tracking it stops lapsing.
The weekly status, compiled. Nobody had to assemble it.
The report that gets skipped the week things are busy, the week it matters most, writes itself. Ops pulls the uptime, the deploys, the open incidents, and the cost trend into one status and posts it on time, every week.
- Uptime 99.98% · 0 SEV-1, 1 SEV-3 closed
- 14 deploys · 0 rollbacks
- Backups 7/7 verified · 4 tokens rotated
- Cloud spend $18.4K · flat wk/wk
Every line sourced. Click a number, see where it came from.
In an incident, Ops runs the allowed steps and pages a person for the call.
Ops doesn't guess its way through production. When an alert fires, it runs the runbook steps you've allowed, gathers the context from your past incidents, and pages a human for the decision. The judgment, and the call, stay with a person.
- Pulled the relevant logs + traces · done
- Drained the bad node from the pool · done · scoped
- Roll back the 02:58 deploy? · needs a human
The rollback is outside what Ops runs on its own. It's teed up with the diff and the blast radius. You make the call.
Against the standalone AI SRE tools, honestly.
Cleric, Resolve.ai, and incident.io are strong at autonomous incident investigation, and on raw root-cause speed they set the bar. The market's own conclusion is that the safe pattern is graded autonomy: scoped permissions, approval before anything touches production, and a human on the call. That is where Ops is built to live, with one difference: it's an employee on your team, not a separate console.
Honest read: for deep autonomous root-cause on a sprawling microservice estate, a dedicated AI SRE built only for that will go deeper, and that's a fair reason to run one. Ops wins when you want the recurring ops work owned end to end, the production guardrails on by default, and one teammate who reads your runbook and your past incidents from a brain that's yours, not a vendor's.
Stays with a person:
- Incidents
- Vendor escalations
- Runbook decisions

Handed to Ops:
- Recurring reports
- Backups, rotations, checks
- Filing and reminders
Only touches the actions you've scoped. Anything outside the runbook waits for a human.
How it earns trust.
Nobody gets the keys on day one. Not even the AI.
Watches and drafts. It learns your domain from the brain and drafts everything for your approval. You see exactly what it would do.
Acts, you approve. It proposes real actions. You approve, edit, or kill, and every edit teaches it. Approval rates climb as it dials in.
Routine on autopilot. You hand over the low-risk, repetitive work. The consequential calls still wait for you, by design.
The hand-off.
How Ops pings a human when it's your call.
The honest answers.
No dodging, no contact-sales-to-find-out.
What happens in an incident?
Is Ops available now?
Can it touch production?
Ops is on the way.
AI Employees are sold separately. Waitlist folks get first dibs when the roster opens.



