On OKRs

[A repost for reference, since the original was removed as part of house-cleaning elsewhere]

I'm solidly in favour of a planning architecture of some kind for any team-size collection of people greater than about 5. (Hell, arguably above 2, but let’s keep overheads down.)

I’ve had the opportunity to try OKRs across a number of teams and companies, and I’ve found them useful overall. Those of us who’ve been in large companies or planning-focused small companies will have encountered a number of approaches to planning before. OKRs take their place amongst this number, I think: not clearly the best (but absolutely not the worst), great for certain kinds of things and less so for others.

Let me be more precise.

There are a couple of things I like about them:

  • OKRs are easy to get started with and don’t require huge expertise. The kind of gatekeeping behaviour that surrounds them (there is some, of course) is not as impenetrable as some other industry practices, IMHO. It’s easy to sling some stuff together and get started. No necessity to get yourself consulted-up to get going. The focus is supposed to be on your team’s plans, not the planning infrastructure.
  • They make it easy to remember what’s important. They are intended to be a high-level steering mechanism, and as such, they surface concerns about direction, mutual co-operation, etc etc, which generally don't get raised in the context of an individual team. It’s a hugely useful spur to thinking like an owner. I found it naturally moved the direction of my gaze up from my shoelaces (the minutiae) to something a bit higher-level (the business), which in turn helped to focus effort on thinking about what goals should be and how to phrase them to make them SMART (measurable, etc etc). Insular, siloed team thinking is the default, and this is one of the more effective ways to try to short-circuit that pattern.
  • Tooling isn’t hugely important, but good effort gets good results. The internal Google tooling around OKRs made it trivially easy to look at other teams' stuff and figure out if there was overlap/clash/areas of mutual support/etc, which was great for cross-company visibility. However, you didn’t need it, and a bunch of spreadsheets or items in Azure DevOps is also a perfectly tractable approach. Whatever works for you.

Expecting to fail is hugely valuable. Common practice at the time inside Google (there are some disturbing assertions this is no longer the case) was that you should expect to score 0.7 out of 1.0 on your OKRs "on average"; there's a small worked example of that grading arithmetic just after the list below. Whether or not the team ended up hitting that, setting the expectation helped to do a number of valuable things:

  • Frame ambition. When I first joined Amazon, the message I got was "Expect to fail, because if you're doing your job right, you're doing new and/or difficult things no-one else is doing, and you'll probably fail plenty of times. That's ok." It's actually very hard to hear that message properly, particularly if you are an insecure junior engineer. Every emotional instinct was to shield myself and stay safe, and I don’t imagine I was the only one. But the explicit permission and expectation of failure did in fact (eventually!) provide sufficient cover to start being ambitious about things.

    The 70% message that Google pushed was a pretty good analogue of the Amazonian one, though less explicit: it was an acknowledgement that some error was allowed, which created permission for trying stuff out.
  • Forced success is early death. The converse culture, where everything must be a total success continually, is cultural death. In these kinds of cultural environments, incentives switch to hiding things and lying, since it becomes more problematic to admit failure. Not only that, but if you’re the only one who experiments and everyone else stays safe, well gosh, you sure look out of whack, and so will all your metrics! Planning and related presentations turn into a game of picking off the outliers: those who distinguished themselves by trying hard and failing.

    It’s hard to construct a genuine set of ambitions in that kind of environment, so innovation gets bred out.
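
To make the 0.7 expectation concrete, here's a minimal sketch of the grading arithmetic, assuming the common convention of scoring each key result on a 0.0–1.0 scale and averaging up to the objective. The key results and grades below are invented purely for illustration.

```python
# Hypothetical end-of-quarter grading for one objective. Each key result (KR)
# is graded 0.0-1.0; the objective's score is the simple average of its KRs.
key_results = {
    "Reduce p99 latency from 800ms to 400ms": 0.9,            # got to ~440ms
    "Migrate 100% of batch jobs to the new scheduler": 0.6,   # got to ~60%
    "Publish and socialise the new on-call playbook": 0.5,    # drafted, not shipped
}

objective_score = sum(key_results.values()) / len(key_results)
print(f"Objective score: {objective_score:.2f}")  # ~0.67, i.e. roughly the 0.7 target

# Scoring 1.0 every quarter suggests sandbagged goals; scoring well below
# 0.7 every quarter suggests over-reach (or a genuinely bad run of quarters).
```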

There are a couple of things I didn't like about OKRs, however:

  • Ritual drains meaning. As with every serious planning architecture, there's a lot of ritual around OKR planning sessions. With a quarterly cadence and mid-quarter check-in, there aren’t actually that many weeks which are free of contact with OKRs. It can become a ritual, and ritual can drain meaning. Engineers in general preferred to opt out of ritual-like behaviour, and therefore out of a number of the serious discussions about prioritisation, and so on. You could encourage people, but forcing them ran the risk of turning them off the whole thing permanently. However, it was definitely possible to have a whole team excited by OKRs, and that was wonderful when it happened.
  • The famous "50% project time". OKRs were an okay-ish fit for how SRE did project work, which in general suffered from interrupt-driven/production-fire problems. Though many column inches have been spilled on the nature of SRE project work and why the 50% boundary exists, etc, my lived experience was that mostly, SRE project work was kinda like batch scheduling: it'd get done, eventually. Your big hope was that it would still be relevant by the time it was done. A lot of the time it was. It’s possible another framework, other than OKRs, or something on a shorter timeframe, would help more with that.
  • Policy versus implementation. There were arguments that never converged about what level of goal was most appropriate for a team, and the interplay between the usefulness of a high-level goal and specifying its implementation as concretely as possible. This ties into questions about how to do OKRs correctly.

    Let me start with an example. "Keep the site up", or "Make X more reliable", versus "Enumerate the top 5 sources of known outages in the trailing quarter and eliminate their root cause(s)/contributing factors". In theory, the Objective is "keep the site up", and the KR is "enumerate the outages". However, lots of things could fit under such an objective, and my personal experience was that there was no fully developed consensus best practice about what went at O-level and what went at KR-level. There were also questions about O-grouping and what fitted under what objective, as well as prioritisation. (There's a minimal sketch of one possible structure just after this list.)

    Prioritisation was another issue, of course; P0s got done unless there was a force majeure situation, and P1s generally (well over half the time) got done. But most teams didn’t get to all of their P2s, and only those P3s with serious personal investment from an engineer typically saw effort — and even that usually meant a P0 or P1 was getting starved.
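
To make the O-level versus KR-level distinction concrete, here's a minimal sketch of one possible structure for the reliability example above. The field names, the second key result, and the priority tag are my own illustration, not a prescribed format.

```python
# One possible way to lay out the example above: the Objective stays
# directional and qualitative, the Key Results carry the measurable detail.
# All field names and targets here are illustrative, not a standard schema.
okr = {
    "objective": "Keep the site up",
    "key_results": [
        {
            "kr": ("Enumerate the top 5 sources of known outages in the "
                   "trailing quarter and eliminate their root causes"),
            "measure": "outage sources eliminated",
            "target": 5,
        },
        {
            # Hypothetical second KR, to show several KRs hanging off one objective.
            "kr": "Reduce user-visible downtime quarter over quarter",
            "measure": "minutes of user-visible downtime",
            "target": "-25%",
        },
    ],
    "priority": "P1",  # per the text: P0/P1 mostly got done, P2/P3 mostly didn't
}

print(f"{okr['objective']}: {len(okr['key_results'])} key results, {okr['priority']}")
```

It's kept as a plain literal on purpose: as noted under tooling above, a spreadsheet or a handful of work items is a perfectly tractable home for the same information.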

Overall, I thought OKRs, though responsible, as I say, for a certain amount of ritual and non-productive argument, were a pretty decent way of actually steering a rather complicated ship. They scaled well, set good expectations, and helped remind you what was important.

But — and it is a big but — that partially relied on a set of cultural behaviours which were not 100% to do with OKRs (the habit of ambition, the inclination to forgiveness, and cross-team accommodation for sure), and if those behaviours are absent or change, the usefulness of OKRs can absolutely decrease.

For example, if in fact, as this tweet suggests, OKR completion rates are now part of performance ratings, well, that is a very different framing. The theory says you shouldn’t do it, because then the incentives become about hiding, safety, and so on. It comes across like attempting to increase "legibility" at the expense of effectiveness.

I think of it this way: as they say in another galaxy, far far away, the more you tighten your grip, the more [the objectives] will slip through your fingers. Ultimately, great things are done by:

  • Giving people freedom and autonomy
  • Encouraging them to reflect, and take feedback…
  • …in a safe environment.

Steer, not control.