John Allspaw

What’s an incident?

A note about terms: Use of terms like anomaly, event, incident, and accident tends to evoke strident debates about their exact meanings. They are used inconsistently in tech and elsewhere. Frustration with their variable interpretation has led some to try to give them crisp definitions.

Despite these efforts, none of these terms have fixed meanings. The situation is made even more difficult when word choice has significant consequences. For example, some tech firms have formal processes for handling an incident that do not apply to an event or an anomaly.  

(We have witnessed extensive discussions during event response about whether that event meets the organization’s threshold criteria for an incident! Declaring an incident would bring additional resources to bear, generate auditable documentary trails, and involve substantial future work.)

The situation cannot be resolved by fiat. Instead, we need to pay attention to how these terms are used in context and especially to the consequences of the choice of term.

In this chapter, we use the term incident as a pointer to a set of activities, bounded in time, that are related to an undesirable system behavior.

The Career, Accomplishments, and Impact of Richard I. Cook: A Life in Many Acts

Multiple professional and research communities feel a profound loss at the death of Richard I. Cook. Richard died peacefully at home on August 31, 2022, in the loving care of his wife Karen and his family. Dr. Richard Cook was a polymath who excelled in multiple careers, usually simultaneously. A physician and anesthesiologist, he was

What makes public posts about incidents different from analysis write-ups

We have written before that documents written about an incident can take many forms and structures, depending on the author(s), purpose, and target audience. The goal of this post is to describe what makes public-facing articles that companies publish about incidents different from internal write-ups representing an effective incident analysis, and a rationale for why

The Multiple Audiences and Purposes of Post-Incident Reviews

The conventional rationale for undertaking some form of post-incident review (regardless of what you call this process) is to “learn from failure.” Stated without further specifics or context, this is, for the most part, a banal platitude aimed at providing at least a bit of comfort that someone is doing something in the wake of these surprising
