Incidents are unplanned investments

We help you get the greatest return on them

You're not learning from incidents,

you're making simplistic fixes.
We help you understand the difference.

Adaptive Capacity Labs
You’re having incidents because you’re successful.
The good news: incidents are inevitable by-products of the complexity that comes with growing a successful business.
The bad news: these surprises are disruptive, painful, and costly. You’re losing money and time dealing with them, and the response is mostly localized, short-term “fixes”. You are concerned that the organization doesn’t seem to be learning much from the incident experience.

We bring research-driven methods and approaches to drive effective incident analysis in software-reliant organizations. 

We can help you:

  • understand what your incidents are trying to tell you, and
  • learn from these painful experiences.

Incidents are unplanned investments, and they are also opportunities. Your challenge is to maximize the ROI on the sunk cost. To do that, the organization has to invest in really exploring and understanding these events, and share that understanding broadly and over time.

What We Do

LFI Assessment

We use research methods to assess: how well your organization learns from incidents, what your teams actually learn, and how that insight influences budgets, training, hiring, roadmaps, etc.

We deliver a full report of these findings that identifies opportunities and a clear set of recommendations.

Incident Analysis Training

We can give you the skills to get deeper insight into your incidents, and in less time. 

This project bootstraps the development of effective incident analysis expertise in your company, taught and coached by pioneers of event reconstruction and software accident investigation.

Aftermath Projects

We perform independent incident analysis on short notice. This work is for organizations that have experienced a high-profile event and are under intense pressure from stakeholders to produce a thorough analysis in a paradoxically short period of time.

Who We Are

We bring research-driven methods and approaches to drive effective incident analysis in software-reliant organizations.

Our work goes beyond typical template-driven “postmortem” analyses.

We have over four decades of experience with incident analysis and organizational learning from events in complex systems. We’ve worked with organizations in tech, medicine, aerospace, finance, and manufacturing. We study decision making, problem detection and identification, and diagnosis and response coordination — all under “normal” conditions of increasing pressure, complexity, ambiguity, uncertainty, and high consequences of failure.

The most valuable part of our work is when our clients learn how to direct this deep-level analysis themselves and build an internal community of incident analysts.

We’ve worked on incidents you know and ones you’ll never hear about. Just to be clear: people don’t call us up to discuss the weather and how well things are going. They call us because of trouble, sometimes scary trouble. 

We are experts in trouble.

Our Team

John Allspaw
John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John’s publications include the books The Art of Capacity Planning (2009) and Web Operations (2010), as well as the foreword to “The DevOps Handbook.” His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation,” helped start the DevOps movement.
 
John served as CTO at Etsy, and holds an MSc in Human Factors and Systems Safety from Lund University.

Beth Adele Long
Beth Adele Long is a writer and software engineer with over twenty years of experience building, maintaining, and repairing web systems (mostly repairing). While at New Relic, she led the collaboration with the SNAFUcatchers consortium.

With Dr. Richard Cook, Beth co-authored “Building and revising adaptive capacity sharing for technical incident response: A case of resilience engineering,” the first academic paper on Resilience Engineering in the software domain.

Dr. David Alderson

David L. Alderson is currently a Visiting Scientist while on sabbatical from the Naval Postgraduate School, where he is Professor in the Operations Research Department and serves as Founding Director of the Center for Infrastructure Defense.

Over the last 25 years, Dr. Alderson’s research has focused on the function and operation of critical infrastructures, with particular emphasis on how to invest limited resources to ensure efficient and resilient performance in the face of accidents, failures, natural disasters, or deliberate attacks. His research explores tradeoffs between efficiency, complexity, and fragility in a wide variety of public and private cyber-physical systems.

Dr. David Woods

Dr. David Woods founded Resilience Engineering as an approach to safety in complex systems in 2000-2003 as part of the response to several NASA accidents.

David is currently a professor in the Department of Integrated Systems Engineering at the Ohio State University. For almost 40 years he has done pioneering research on the interaction between humans and technology in risk-critical activities. His books on safety and resilience engineering include Resilience Engineering: Concepts and Precepts (2006), Behind Human Error (2010), and Resilience Engineering in Practice (2011). (publications)

Dr. Richard Cook

Dr. Richard Cook co-founded Adaptive Capacity Labs with John Allspaw and David Woods. He died in August 2022.

He was a research scientist, physician, and pioneer in Resilience Engineering for safety in complex risk-critical worlds, and author of the seminal paper “How Complex Systems Fail” (video) as well as Behind Human Error (2010).

Richard was emeritus professor of healthcare systems safety at Sweden’s KTH. (publications)

Our Work

“Working with John and Richard was an eye-opening experience in many ways. Their framework to approaching incidents is one that will encourage you to ask more meaningful questions after an incident occurs, forget everything you thought you knew about incident analysis, and reveal areas the organization should direct more attention to in ways you didn’t know possible. Working with them is a first step in the journey to achieving a Learning Organization – a first step any business with software (hint: all businesses) should take.”

“This was a transformative experience. I feel privileged to have had the opportunity to receive this training and I am positive there are no comparable alternatives available in the market that comes close to what is offered here by ACL.”

Our clients range in size from 150 to over 40,000 employees around the globe, and provide goods and services via B2B and B2C. They represent a wide range of categories and markets, including:

  • Online Travel
  • Food Delivery
  • Enterprise Collaboration SaaS
  • Telecommunications
  • Streaming Media
  • Network Infrastructure Services
  • Healthcare Tech SaaS
  • Public Cloud Infrastructure
  • Government Research
  • FinTech/Trading Exchange
  • Construction Project SaaS
  • Recruiting/Employment SaaS

Here are three examples of engagements we have done. Since we agree to terms of non-disclosure with all of our clients, names of clients and sensitive details of the projects are withheld in these descriptions. 

Mid-size E-commerce Company

This client engaged us for an assessment of their organizational learning from incidents and to provide recommendations for improvement. They expressed concern about how insights were being shared and used to inform operational decisions across the engineering organization (in terms of roadmap changes, prioritization of existing work, etc.). More on this case→

Subdivision of SaaS Company

This client engaged us for an assessment of their organizational learning from incidents. They were confident about their current post-incident review practices, but expressed concern about how insights were being shared and used to inform operational decisions across the engineering organization (in terms of roadmap changes, prioritization, etc.). More on this case→

Financial Tech Company

This client reported varied success with learning from incidents in the past and believed that better post-incident analysis and debriefing facilitation could ease their challenges in responding to incidents. They suspected too much attention was given to localized “fixes” and not enough to how the teams coordinated and responded to incidents as they arose. More on this case→

Work With Us

We are currently accepting new projects for 2021!
Please tell us more about your organization and how we can help!