Case: Financial Technology Company
This client reported mixed success with learning from incidents in the past and believed that better post-incident analysis and debriefing facilitation would help with the challenges they faced in responding to incidents. They suspected that too much attention was given to localized “fixes” and not enough to how teams coordinated and responded to incidents as they arose.
They struggled to make meaningful use of post-incident reviews. The group meetings reportedly had a “due diligence” or “checklist” character, which resulted in poor attendance and very little insight.
As part of an initial assessment, our findings included:
- Previous success with post-incident debriefings ended when a key internal advocate and skilled facilitator-analyst left the company. Veteran staff remembered how valuable well-executed facilitation and analysis had been; new hires had no such memory. This gap in perception was growing when they engaged us.
- Engineering leadership was frustrated that something that had appeared to work well (post-incident review) no longer did, and their attention was on managing external stakeholder expectations when incidents arose. The result was a vicious cycle of reacting to incidents rather than building the expertise to extract value from them and fuel proactive work.
Recommendations
- Identify a small group of enthusiastic engineers for training and coaching in Incident Analysis, led by ACL staff. The training covers fundamentals of event reconstruction, interviewing techniques using the Critical Decision Method, and accident investigation concepts.
- Set up an initial cadence of shadowing and follow-up evaluation for this group, with ACL staff giving coaching and feedback on interviewing, preparation, and analysis.
Markers of progress
- Post-incident review meetings are now “standing room only” events and are eagerly anticipated by both engineering and customer support staff.
- Engineers report that they attend these events and take notes because they “can learn things there that they can’t learn elsewhere.”
- Engineers who have been trained in incident analysis are now seen as critical resources. Group debriefings and other post-incident meetings are now scheduled around their availability.
- Engineers insist that post-incident debriefings be separated from the creation of follow-up action items, which now takes place in a separate meeting days later to allow for “soak” time. Note: this was unanticipated, but it is a strong indication that they’re taking deep learning seriously. Limited feedback so far suggests that “remediation” or follow-up items are seen as more valuable as a result.