For a little over a year now, we’ve been working closely with the fine people at Uptime Labs, and it’s difficult to contain our enthusiasm about it.
We’ve worked together on the Incident Fest series this past summer and the upcoming Fix-mas Countdown.
We are bullish on what they’re building (as well as how they’re building it) because:
- The need they are addressing is undeniably obvious. Incidents are handled more effectively by people with expertise in handling incidents. Their platform is the first of its kind.
- The drills they’ve designed aren’t run-of-the-mill simulations. They are constructed as staged world scenarios, which provide the most realistic experiences for responders. The approach has been rigorously tested and refined over decades as an empirically grounded and scientifically valid method of engaging cognitive work.
Some disclosure: My colleague Beth Adele Long and I are advisors to Uptime Labs.
Effective incident response requires genuine skill.

Anyone who has worked with software understands that responding to incidents effectively requires real skill and expertise. It seems clear that the difference between someone new to incident response and someone who has “seen a lot of weird shit in their time” is substantial.
Building skill requires practice. To be sure, people gain valuable experience from every incident they respond to. However, effective practice means practicing consistently, at predictable and convenient times.
Unfortunately, incidents don’t typically occur at times that are convenient for us. Even if they did, incidents are actually quite rare, especially the ones that challenge responders.
What Uptime Labs has built is an incident response training platform that gives people a way to practice the myriad of skills necessary to develop expertise in responding to incidents. Unlike actual incidents, you don’t have to wait around for one to happen; you choose the time when you’re able to do the drill.
Uptime Labs’ drills are not just any “garden-variety” simulations or table-top exercises; they are staged world scenarios.
What is a staged world scenario, and what makes it different from other types of simulation?
Staged world scenarios come from the field of Cognitive Systems Engineering, where they were developed in the early 1980s as a way to elicit expertise from nuclear control room operators.
In a nutshell, there are a few things that set them apart from other types of simulation designs:
- The drills are designed to evoke the same cognitive work participants do in real-world incidents. This means that those responding to the incidents in the drills experience the same difficulties and complicating factors often seen in real-world incidents. The most important quality of a drill’s design is that participants experience the challenges of real-world incident response. As long as this criterion is satisfied, people doing the drill will accept it as genuinely valid; the fictional company’s technical architecture doesn’t need to resemble the real-world architecture at the participants’ companies.
The primary way drills can include the necessary cognitive fidelity for responders to “temporarily suspend disbelief” is to include real-world complexities and dilemmas in their design.
One example is what’s known as the “garden path” problem, in which people become fixated on an initial hypothesis, even in the face of new information. Another challenging factor is a double-bind situation, where two or more potential directions or actions to take all have downsides.
I’ll say more about these elements of difficulty in a separate post, but the gist is that because their drills are constructed as staged world scenarios, they can target particular cognitive work to be developed, just like a specific group of muscles.
- Drills do NOT have “solutions.”
They are not puzzles that can be won or lost. Unlike simulations that mirror “Choose-Your-Own-Adventure” books with a finite number of predetermined paths that can be taken, drills are open-ended. Real-world incidents involve ill-structured problems, where there is no single ‘correct’ solution but rather multiple viable approaches.
- It’s challenging to design scenarios and stage situations that are recognized as genuinely valid by experts.
The innovation here is just as much about the Uptime Labs team’s skill in designing scenarios as it is about the tech they’ve built.
This method (staged world simulations) originated as a way to elicit expertise from nuclear control room operators. Staged world simulations have been used in healthcare, aviation, transportation, spaceflight operations, and emergency management. Uptime Labs is the first to do it in software!
At Adaptive Capacity Labs, we are very enthusiastic about all of the above because Uptime Labs is a rare case of product development built on solid scientific foundations, rather than on tenuous (and sometimes haphazard) grounds.
We believe the future will be better when more of the former happens and less of the latter.
