By Jim Grey (about)
I often quip that when you plan and manage a sprint, as a leader you have to channel your inner Donald Rumsfeld.
You remember ol’ Don, don’t you? He was twice the Secretary of Defense in the United States, first in the Gerald Ford administration and later in the George W. Bush administration.
Rumsfeld is famous for having said, “There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns; the ones we don’t know we don’t know.” Here, listen:
He said this in answer to a reporter’s question about a lack of evidence linking Iraq’s government with supplying weapons of mass destruction to terrorists. It sounded so absurd at the time that the media had a field day ridiculing him.
But over the years, we’ve come to realize how much sense his statement makes. In anything you set out to do, you know what you know, you might know or be able to guess some things that you don’t know, and you certainly don’t know many things that you don’t know.
In agile software delivery — heck, in life:
- You plan for the known knowns.
- You make contingency plans for the known unknowns.
- When unknown unknowns happen, all you can do is manage through them.
On a team I once managed, we wanted to build a feature to email end users the day after they abandoned a transaction on our site. We hoped to reengage them to finish their transaction.
We wrote and groomed the stories. Most of them had a clear implementation — plenty of known knowns. They were easy to estimate.
A few stories had known unknowns. One was, “It’s been a while since we’ve been in that repo, and it was written in our wild-west days. I might find some cruft in there that I’d need to straighten out to be able to finish this story.”
Another known unknown was, “I’d have to do a deep dive into that module to be sure, but if does this certain thing in this certain way, I’d have to do considerable extra work to stitch in the code I need to write there.”
In each case we discussed whether that work, were it necessary, would add meaningful complexity to the story and inflate its estimate. Sometimes it was yes, sometimes it was no. When it was yes, sometimes we wrote a dive story to find out and other times we thought the impact would be small enough that we could just accept the risk.
As we planned sprints with those stories, we believed we had accounted for everything we could think of, and were confident in our sprint plans. But two things went wrong we didn’t foresee — unknown unknowns bit us hard.
One story involved writing a job that would check overnight for abandoned transactions. The engineer who picked up that story encountered five or six serious unexpected difficulties with writing the job and with getting the job runner to pick it up properly. A story that we thought he would wrap up in two days spilled well into the next sprint.
Then it turned out that our email service was never built to handle the volume we threw at it, and it failed immediately in Production. One engineer devoted all of his time, and another part of his time, to solving the problem. A lot of our tester’s time went to troubleshooting with the engineers. The situation was surprisingly thorny. The engineers implemented several successive fixes over several days before finally getting past the problem.
Thanks to these unknown unknowns, we missed a couple sprint plans by a mile. Here’s how we managed through them: I worked with the team to prioritize stories and figure out which ones we were likely not to deliver. The engineers watched for stories ready to be tested, and helped our tester by picking some of them up themselves to keep the revised plan on track. Finally, I reset expectations with my management about what we could actually deliver, and how that would affect delivery of other work that depended on it.
You could argue that these should not have been unknown unknowns. Writing a job should have been cut and dried. We should have thought about volume in our planning, and tested for it.
Hindsight is always 20/20. Retrospectives are where you explore that hindsight and make improvements for the future. After you experience unknown unknowns, improvements generally take two forms:
They become known knowns, or maybe known unknowns. We learned a lot about the complexities of jobs in our world. We knew we needed to write a few more in upcoming sprints, so we broke them down into several smaller stories that we could test incrementally. Also, we now expected that there might be challenges in writing jobs we hadn’t encountered yet, and to lay in contingency plans for them.
You can lay in plans to neutralize them as unknowns in the future. We found out that other teams had been having trouble with the job runner. We met with the team that owned that code to figure out how to make the job runner more reliable. They put tech-debt stories in their backlog to handle it. We also researched how we might make the email service more robust and investigated third-party email services. We ended up going with a third-party service.