Winsen
The roster
AI Ops · On the bench

Meet Ops. The quiet engine. Runs the recurring reports, watches the dashboards, files the things that have to be filed, on schedule, without a nudge.

See Ops work
Ops
AI Ops · Operations
All green
↳ getting ready

Nightly health-check done. 3 services at 99.98% uptime, 4 auth tokens rotated, backups verified at 2.3TB. One prod DB resize flagged, outside the runbook, so it paged you.

99.98% · Uptime
4 · Tokens rotated
2.3TB · Backups
TL;DR: Ops runs the recurring work nobody owns, on schedule, so it stops slipping when someone's out.
On schedule

The nightly checks ran. Everything is green. Nobody had to remember.

Ops runs the recurring checks every night while you sleep, the ones that only get noticed when they're skipped. By the time you're up, the report is filed and the all-clear is in your channel.

Nightly health-check
02:00 · done 02:11
  • API latency p95 · 142ms · ok
  • Queue depth · 0 stuck · ok
  • Cert expiry sweep · none under 30d · ok
  • Disk + error budget · within limits · ok
All checks green · posted to #ops · filed
Inside the runbook

Tokens rotated, backups verified. Only the steps you scoped.

Ops runs the maintenance you put in the runbook: rotate the tokens on the cadence you set, take the backup, then actually restore-test it so a green checkmark means something. The actions it can take are the actions you signed off on, and nothing else.

Token rotation · weekly
4 rotated · old keys revoked · scoped to the 4 in your runbook
done
Backup verified · 2.3TB · restore test passed

Snapshot taken 01:40, restored to staging, row counts matched.

Wants to resize the prod database
⚑ outside the runbook · waiting for a human

Disk is at 71% and climbing. This action isn't in the scoped list, so Ops won't run it. It drafted the change and is holding for your call.

Send it · Edit
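The scoping rule in this section can be pictured as a simple allowlist gate. What follows is a minimal sketch in Python, not a description of how Ops is actually built: the action names, the `SCOPED_ACTIONS` set, and the `gate` function are all hypothetical and exist only to illustrate the idea.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical runbook scope: the only actions allowed to run unattended.
SCOPED_ACTIONS = frozenset({"rotate_tokens", "verify_backup"})


@dataclass
class Decision:
    action: str
    status: str  # "run" or "hold"
    reason: str


def gate(action: str, scoped: frozenset[str] = SCOPED_ACTIONS) -> Decision:
    """Run an action only if it is in the scoped list; otherwise hold it for a human."""
    if action in scoped:
        return Decision(action, "run", "in the runbook scope")
    return Decision(action, "hold", "outside the runbook, waiting for a human")


if __name__ == "__main__":
    for name in ("rotate_tokens", "verify_backup", "resize_prod_db"):
        d = gate(name)
        print(f"{d.action}: {d.status} ({d.reason})")
```

The design choice worth noting is the default: anything not named in the scope is held, so a new or unexpected action can never run by omission.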
Before it lapses

The renewals that always sneak up are already on your radar.

Ops watches the vendor contracts and sends the reminder before the auto-renew clock runs out, with the amount, the date, and the cancel-by window attached. The thing that lapses because nobody was tracking it stops lapsing.

Datadog · renews Jun 14
cancel-by Jun 11 · 3 days
$2,400/mo
Vercel · renews Jun 15
seats reconciled · 2 unused flagged
$960/mo
2 renewals next week · reminder drafted for you · ⌘⏎ send
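The cancel-by arithmetic behind a reminder like this can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ops's implementation: the function names, the `lead_days` parameter, the year, and the reading of "3 days" on the Datadog card as a notice window are all assumptions made for the example.

```python
from datetime import date, timedelta


def cancel_by(renews_on: date, notice_days: int) -> date:
    """Last day to cancel before the auto-renew, given a notice window in days."""
    return renews_on - timedelta(days=notice_days)


def due_reminders(contracts: list, today: date, lead_days: int = 7) -> list:
    """Return contracts whose cancel-by date falls within lead_days of today."""
    due = []
    for c in contracts:
        deadline = cancel_by(c["renews_on"], c["notice_days"])
        days_left = (deadline - today).days
        if 0 <= days_left <= lead_days:
            due.append({**c, "cancel_by": deadline, "days_left": days_left})
    return due


# Illustrative data: the year and the notice window are assumed, not sourced.
contracts = [
    {"vendor": "Datadog", "renews_on": date(2025, 6, 14),
     "notice_days": 3, "monthly_usd": 2400},
]

if __name__ == "__main__":
    for r in due_reminders(contracts, today=date(2025, 6, 6)):
        print(f"{r['vendor']}: cancel by {r['cancel_by']:%b %d}, "
              f"{r['days_left']} days left, ${r['monthly_usd']:,}/mo")
```

Keeping the deadline calculation separate from the reminder filter means the same cancel-by date can feed both the alert and the weekly status.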
Every Friday

The weekly status, compiled. Nobody had to assemble it.

The report that gets skipped the week things are busy, the week it matters most, writes itself. Ops pulls the uptime, the deploys, the open incidents, and the cost trend into one status and posts it on time, every week.

Weekly ops status · wk 23
  • Uptime 99.98% · 0 SEV-1, 1 SEV-3 closed
  • 14 deploys · 0 rollbacks
  • Backups 7/7 verified · 4 tokens rotated
  • Cloud spend $18.4K · flat wk/wk
Posted to #leadership · Fri 09:00 · on time

Every line sourced. Click a number, see where it came from.

When it breaks at 3am

In an incident, Ops runs the allowed steps and pages a person for the call.

Ops doesn't guess its way through production. When an alert fires, it runs the runbook steps you've allowed, gathers the context from your past incidents, and pages a human for the decision. The judgment, and the call, stay with a person.

SEV-2 · checkout error rate spiking
03:14 · matches incident #218 (Mar)
Allowed runbook steps · run
  • Pulled the relevant logs + traces · done
  • Drained the bad node from the pool · done · scoped
  • Roll back the 02:58 deploy? · needs a human
Paged on-call · context attached
likely cause + the rollback decision, ready for your yes

The rollback is outside what Ops runs on its own. It's teed up with the diff and the blast radius. You make the call.

Send it · Edit
The benchmark

Against the standalone AI SRE tools, honestly.

Cleric, Resolve.ai, and incident.io are strong at autonomous incident investigation, and on raw root-cause speed they are the bar. The market's own conclusion is that the safe pattern is graded autonomy: scoped permissions, approval before anything touches production, a human on the call. That is where Ops is built to live, plus it's an employee on your team, not a separate console.

Knows your runbook + company · Grounded in your docs and past incidents
  • Ops: Reads your runbook and past incidents from the brain.
  • The standard: Learn a graph from telemetry and your stack, runbook-free.
Approval-first / scoped to production · What it can touch without a human
  • Ops: Only scoped actions run; everything else waits.
  • The standard: Range from read-only to auto-remediation; it varies by tool.
Part of a team · Works alongside your other employees
  • Ops: One of a roster; hands off to the rest.
  • The standard: A standalone SRE tool or a console, not a colleague.
Owns the data · Where your operational knowledge lives
  • Ops: The brain is yours: portable and sourced.
  • The standard: Operational memory is vendor-held.
Deep autonomous root-cause · Where the dedicated tools lead
  • Ops: Runs the allowed steps, then pages a human.
  • The standard: Purpose-built for deep autonomous root-cause.
Hire vs build · How you bring it on
  • Ops: Hire an employee; it onboards on your runbook.
  • The standard: Buy and integrate a tool or platform.

Honest read: for deep autonomous root-cause on a sprawling microservice estate, a dedicated AI SRE built only for that will go deeper, and that's a fair reason to run one. Ops wins when you want the recurring ops work owned end to end, the production guardrails on by default, and one teammate who reads your runbook and your past incidents from a brain that's yours, not a vendor's.

Operations keeps
  • Incidents
  • Vendor escalations
  • Runbook decisions
Ops takes
  • Recurring reports
  • Backups, rotations, checks
  • Filing and reminders
The line Ops won't cross

Only touches the actions you've scoped. Anything outside the runbook waits for a human.

How it earns trust.

Nobody gets the keys on day one. Not even the AI.

Week 1 · Shadow

Watches and drafts. It learns your domain from the brain and drafts everything for your approval. You see exactly what it would do.

Week 2-4 · Supervised

Acts, you approve. It proposes real actions. You approve, edit, or kill, and every edit teaches it. Approval rates climb as it dials in.

Ongoing · Trusted

Routine on autopilot. You hand over the low-risk, repetitive work. The consequential calls still wait for you, by design.
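One way to picture the three phases is as a policy that maps a phase and an action's risk to how the action is handled. The sketch below is a hypothetical illustration in Python: the `Phase` enum, the `handling` function, and the three outcome labels are invented for the example and are not taken from the product.

```python
from enum import Enum


class Phase(Enum):
    SHADOW = "shadow"          # Week 1: watches and drafts
    SUPERVISED = "supervised"  # Week 2-4: acts, you approve
    TRUSTED = "trusted"        # Ongoing: routine on autopilot


def handling(phase: Phase, low_risk: bool) -> str:
    """Decide how an action is handled in a given trust phase."""
    if phase is Phase.SHADOW:
        return "draft_only"        # nothing runs; you see what it would do
    if phase is Phase.SUPERVISED:
        return "needs_approval"    # every real action waits for a yes
    # Trusted: only low-risk, repetitive work runs unattended.
    return "auto_run" if low_risk else "needs_approval"


if __name__ == "__main__":
    for phase in Phase:
        for low_risk in (True, False):
            print(f"{phase.value:<10} low_risk={low_risk}: {handling(phase, low_risk)}")
```

In this sketch a consequential action never reaches `auto_run` in any phase, which mirrors the point that those calls still wait for a person.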

Learned from
your runbook · your past incidents · your on-call patterns
Tools
Notion · Slack · PagerDuty · AWS · Ramp · Linear

The hand-off.

How Ops pings a human when it's your call.

Ops: nightly checks are green, backups verified, two vendor renewals come up next week. Anything you want me to hold?
FAQ

The honest answers.

No dodging, no contact-sales-to-find-out.

What happens in an incident?
It runs the runbook steps it's allowed to and pages a human for the call. The judgment stays with a person.
Is Ops available now?
On the bench. Waitlist teams get first access when the role ships.
Can it touch production?
Only the actions you've explicitly scoped. Everything else waits.

Ops is on the way.

AI Employees are sold separately. Waitlist folks get first dibs when the roster opens.

Don't take our word for it

Work is better with Winsen.

Ask your favorite AI for a summary of Winsen. It opens with the question ready, so you get an honest read in one click.

Powered by winsen.ai/llms.txt