Wishing for a Better Outcome Isn't a Mechanism. The COE Approach to Continual Improvement and Avoiding Blame.

Focus on what you can control, and move past your narrow focus on outcomes. When mistakes happen, the right question is: Why did your system allow it? How could you have influenced that result?
Welcome to the Scarlet Ink newsletter. I'm Dave Anderson, an ex-Amazon Tech Director and GM. Each week I write a newsletter article on tech industry careers and specific leadership advice. Free members can read part of each article, while paid members can read the full article. For some, part of the article is plenty! But if you'd like to read more, I'd love you to consider becoming a paid member!

When you run software, one of the major things you need to deal with is operational events. That means some service went down, or your website started loading super slowly for no clear reason, or no one can download their data.

Through the year, these things happen seemingly at random. At 2am on a Thursday, your mobile app will suddenly not allow anyone to log in. Some engineer needs to go figure out what went wrong, and get that fixed.

This type of outage feels random to the team it hits. However, it’s not actually “random” because computers don’t tend to break randomly.

Every holiday season (around Thanksgiving), Amazon organizations would put strict limits on teams pushing new code. The idea was to keep things stable while customers were the most active spending their money.

What really struck me was how clearly our systems improved when we stopped changing things. Not just our team’s systems, but systems that depended on us. And the systems that depended on the systems that depended on the systems (you get the point). The entire interconnected network of Amazon technical products was better because no one messed with what worked.

What does that mean to us in this article? It means that the vast majority of major outages weren’t due to heavy load, or the inability of Amazon systems to scale. They were due to manual human error. A human did something wrong.

Particularly with Amazon’s reputation as a bruising place to work, you’d think that these poor humans were getting emotionally beaten up. Perhaps their careers were in danger due to the mistakes they were making.

While this surely happened in isolated cases, Amazon is a place which strongly believes in mechanisms. In other words, you can fix software. You can fix a process. You can’t fix humans into being perfect.

Here’s an anecdote which really stuck with me. After a fairly major incident, our SVP got some leaders in our organization together to discuss the event.
That discussion demonstrates an incredibly important understanding for work and for life. Humans make errors, and we will always make errors. We can try harder, and really concentrate, and we’re still going to make errors.

Yet, when yet another error shows up, our natural reaction is to figure out how to avoid making errors. Instead, we should be working to improve our outcomes regardless of human imperfections...
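
To make the difference between a mechanism and a good intention concrete, here is a minimal sketch of a holiday code freeze enforced by software instead of by asking engineers to remember it. Everything in it is an assumption for illustration: the freeze dates, the `can_deploy` function, and the idea of an approved exception are hypothetical, not a description of Amazon's actual tooling.

```python
from datetime import date

# Hypothetical freeze window (illustrative dates, not a real policy).
FREEZE_WINDOWS = [(date(2024, 11, 25), date(2024, 12, 2))]


def in_freeze(day, windows=FREEZE_WINDOWS):
    """Return True if the given day falls inside any freeze window (inclusive)."""
    return any(start <= day <= end for start, end in windows)


def can_deploy(day, has_approved_exception=False, windows=FREEZE_WINDOWS):
    """Decide whether a deployment may proceed on the given day.

    Returns a (allowed, reason) tuple so the caller can show why
    a deployment was blocked.
    """
    if in_freeze(day, windows) and not has_approved_exception:
        return False, "blocked: change freeze in effect"
    return True, "ok"


if __name__ == "__main__":
    # A deploy script would call this check before pushing anything.
    print(can_deploy(date(2024, 11, 28)))
    print(can_deploy(date(2024, 11, 28), has_approved_exception=True))
```

The point of the sketch is the design choice, not the code itself: the check runs every time, for every engineer, including the tired one at 2am. Nobody has to remember the freeze for it to work.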