Implicit SLOs and their dangers

This is a topic of intermediate complexity in SLOs. If you are coming to this cold, we recommend you read a few other pieces about SLOs first; this one will then make a fair bit more sense to you. SLOs, as you may know, have a dual nature: they have both…

Detecting Disturbance: incidents and Benford's Law

Recently we at Stanza have been exploring operational data, and it's been really exciting to bring techniques and ideas from other domains into our own: production systems generally, traffic, alerting, cloud costs, etc. What we’ve been looking at most recently is something called Benford’…

Graceful Degradation and SLOs

What is graceful degradation? Graceful degradation is the idea that, when you can’t serve the user precisely what they wanted, instead of serving the user an error, you serve them some in-between thing. The details of this depend a lot on what exactly it is you’re trying to…

The TwinSLO Proposal

Comments/Insights/Contributions from * Niall Murphy * Toby Burress * Štěpán Davidovič * Sal Furino (Note that when I say "we" below, I don't specifically intend to speak for these fine people; I'm just using the academic "we". -Niall) Introduction If you don’t already…

Virtual Reflections on KubeCon NA 2023

[Reposted from Medium company blog] Introduction I feel like a little bit of a fraud writing about this, since I only managed to attend KubeCon virtually. But I watched enough of it and read enough about it that it gave me some thoughts. OpenTelemetry (OTel) Those of us who only…

SRE in the Real World

(This is a repost of a document living here, but I am putting it here for backup's sake. Originally a joint effort with Murali Suriar, with input from Matt Brown, Liz Fong-Jones, and many others. The intended audience of this doc is the recently laid-off, or those who…


[A repost for reference, since the original was removed as part of house-cleaning elsewhere] I'm solidly in favour of a planning architecture of some kind for any team-size collection of people greater than about 5. (Hell, arguably above 2, but let’s keep overheads down.) I’ve had…