
Building a Problem Prevention Culture with AWS Well-Architected

Why Prevention Beats Firefighting

Firefighting is often celebrated in engineering teams.
The engineers who dive into production issues and fix critical outages get the spotlight — while those who quietly prevent problems in the first place often go unnoticed.

In this post, we explore how to build a problem prevention culture through the AWS Well-Architected Framework, and why that mindset is essential for long-term value in cloud engineering.

As Mark McCann puts it:

“Sometimes the people who do the problem prevention don’t get rewarded — but the firefighters after the fact do.”

The truth is, great engineering teams make their work invisible. Their systems are resilient, their observability is strong, and their outcomes are predictable.
That’s the foundation of a well-architected organisation.


From Firefighting to Foresight

Early in your engineering career, it’s easy to equate success with shipping fast. But as Michael O’Reilly explains, experience changes your perspective:

“Over time you realise everything breaks. You’ve got to expect failure — and design for it.”

Prevention is about engineering foresight: investing upfront in telemetry, observability, and resilience so that problems are spotted before they become incidents.
It’s a cultural and technical discipline that allows teams to stay focused on creating new value instead of getting bogged down in triage and firefighting.



Why We Aligned Around the Well-Architected Framework

The idea of “good architecture” used to be subjective. Every architect had their own definition.
To avoid that ambiguity, our teams aligned around the AWS Well-Architected Framework, which defines six pillars of good architecture:

  • Operational Excellence
  • Security
  • Reliability
  • Performance Efficiency
  • Cost Optimisation
  • Sustainability

As Dave Anderson recalls:

“We wanted a definition of good architecture that no one could debate. The Well-Architected Framework gave us that.”

This alignment simplified conversations, removed opinion, and provided a structured way to assess the quality of our systems — across hundreds of teams.


Operationalising Well-Architected with SCORP

The challenge was scale.
Traditional Well-Architected Reviews could take days — too heavy for teams working in a fast-moving environment.
To make it practical, we built our own lightweight approach called SCORP:

Security, Cost, Operational Resilience, Reliability, Performance.

Each team runs short SCORP sessions, using simple dashboards to track metrics across these dimensions.
The goal isn’t perfection — it’s continuous learning.

“Keep dashboards low fidelity,” Dave notes. “They don’t have to be perfect. Spend your time on the issues, not the visuals.”

The SCORP approach creates a safe space for teams to share insights, track regressions, and learn from one another.
Over time, it fosters positive peer pressure and shared ownership of quality.
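
To make the idea of a low-fidelity dashboard concrete, here is a minimal Python sketch of how a team might record SCORP scores per session and flag regressions between sessions. The 1 to 5 scoring scale, the dimension keys, the team name, and the function names are all assumptions made for illustration; they are not taken from the published SCORP templates.

```python
from dataclasses import dataclass

# Hypothetical dimension keys and a hypothetical 1 (poor) to 5 (good) scale.
DIMENSIONS = ("security", "cost", "operational", "reliability", "performance")


@dataclass(frozen=True)
class ScorpSnapshot:
    team: str
    session: str   # label for the review session, e.g. "Q1"
    scores: dict   # dimension name -> integer score from 1 to 5


def regressions(previous, current):
    """Return {dimension: (old, new)} for every score that dropped."""
    return {
        dim: (previous.scores[dim], current.scores[dim])
        for dim in DIMENSIONS
        if current.scores[dim] < previous.scores[dim]
    }


def render(snapshot):
    """Deliberately plain text dashboard: one bar per dimension."""
    lines = [f"{snapshot.team} ({snapshot.session})"]
    for dim in DIMENSIONS:
        score = snapshot.scores[dim]
        bar = "#" * score + "." * (5 - score)
        lines.append(f"  {dim:<12} {bar} {score}/5")
    return "\n".join(lines)


# Illustrative data only.
q1 = ScorpSnapshot("payments", "Q1", {
    "security": 4, "cost": 4, "operational": 3, "reliability": 3, "performance": 4,
})
q2 = ScorpSnapshot("payments", "Q2", {
    "security": 4, "cost": 3, "operational": 4, "reliability": 3, "performance": 4,
})

print(render(q2))
print("Regressions since last session:", regressions(q1, q2))
```

The output is intentionally crude. The point of the session is the conversation about why a score dropped, not the rendering.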


Psychological Safety and Peer Learning

A problem prevention culture only works when teams feel safe to raise issues.
That means creating an environment where questions are encouraged, metrics are transparent, and mistakes are treated as learning opportunities.

As Michael explains:

“It’s really important that you foster an open culture — it’s okay to have something not in good shape. That’s an opportunity to improve.”

SCORP sessions aren’t audits. They’re conversations — grounded in real telemetry and engineering empathy.


From Culture to Capability

Over time, SCORP evolved into a strategic sensing engine.
By reviewing data across multiple teams, leaders could identify systemic issues — such as scaling challenges, security tool gaps, or resilience weaknesses — and invest in cross-team improvements.

Mark summarises it well:

“This is what a true problem prevention culture looks like. Teams are constantly refining, fixing, and improving across every pillar.”

That mindset scales. It drives reliability, reduces incidents, and frees engineers to focus on innovation.
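
As a rough sketch of the cross-team sensing described above, the following Python snippet rolls up per-team scores and reports dimensions where a large share of teams are weak. The threshold, the share cut-off, the team names, and the `systemic_issues` helper are hypothetical choices for illustration, not part of the SCORP process itself.

```python
from collections import Counter


def systemic_issues(team_scores, threshold=2, min_share=0.5):
    """Return dimensions where at least `min_share` of teams score
    at or below `threshold` (hypothetical 1 to 5 scale)."""
    weak = Counter()
    for scores in team_scores.values():
        for dim, score in scores.items():
            if score <= threshold:
                weak[dim] += 1
    total = len(team_scores)
    return sorted(dim for dim, count in weak.items() if count / total >= min_share)


# Illustrative data only.
team_scores = {
    "payments": {"security": 4, "cost": 2, "reliability": 2},
    "search":   {"security": 5, "cost": 2, "reliability": 4},
    "identity": {"security": 2, "cost": 1, "reliability": 2},
}

print(systemic_issues(team_scores))
```

A weakness that shows up in one team is a local fix. One that shows up across most teams is a candidate for shared investment, which is the distinction this rollup is meant to surface.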


Looking Ahead

In The Value Flywheel Effect, we predicted that problem prevention would still matter in 2040 — and it’s proving true.
As AI becomes part of the modern software stack, teams that already have strong prevention foundations will be best positioned to harness automation safely.

“You can’t shift left without problem prevention,” Mark reminds us. “They go hand in hand.”

Whether you’re embracing generative AI or scaling multi-cloud operations, prevention remains the key to sustainable velocity.


Start Your Own SCORP Journey

The SCORP process and templates are open-sourced at https://theserverlessedge.com/scorp-process-cycle/
We encourage teams to adapt them, experiment, and build their own rituals of continuous improvement.

If you’re serious about engineering excellence — stop firefighting.
Start preventing.

