We talk through our Operational Excellence examples from our well architected experiences. This post is the first in a series of conversations on the well architected framework and its pillars.
Operational Excellence examples and Well Architected
We’ve written about well architected and the well architected pillars of SCORP or SCORPS. There are now six well architected pillars. Well architected is really interesting because AWS, Google and Azure each have their own version. They’re all quite similar. We have had great success from working through these pillars.
So we figured we’d hit each pillar and have a quick chat about them, starting with operational excellence. Is there anything else you would like to say at a high level about well architected?
It’s something we have found to be incredibly useful. It gives a frame of reference and a structure for asking better questions of our teams, systems, structures, processes and practices. It has been hugely useful for trying to evolve engineering practices and companies. It’s hardened and approved, and it’s been battle tested in thousands of companies, which gives it a lot of credibility. It’s not just Dave, Mark and Mike’s opinions. It’s good practice that has been proven to work.
That’s a major strength, isn’t it? I like the ubiquity. Whether you’re an architect, an engineer or a manager in one organisation, when you go to another, it’ll make sense.
Operational Excellence should be part of continuous architecture
Well architected is not a compliance exercise you deliver once a year. It should be part of continuous architecture. The reason why I always encourage people to get certification is not for a bit of paper or a free water bottle. It’s because you have to learn well architected as part of certification. So starting with operational excellence, the AWS pillar breaks down into three areas. Each area has five or six questions. The three areas in the operational excellence pillar are prepare, operate and evolve.
Operational Excellence Pillar – prepare, operate and evolve
Operational excellence means a lot of things to a lot of people, but let’s chat about prepare. What have you found to be in the prepare part of this?
It’s great to go into new areas and teams and ask these questions:
- Do you know who your users are?
- What is the purpose of your team?
- Do you know what your highest priority is?
Some are very simple, basic questions.
Are you set up to meet the challenges that you’re faced with, the business requirements that you’re going to pursue or the needs you’re trying to meet?
Asking simple questions can be revealing
So simple questions like ‘how do you determine what your priorities are?’ can be very revealing. If you are in a safe space with the whole team involved, you can get a really good conversation. You might hear: ‘We know our priorities for this week and for next week, but we’re not quite sure what we’re doing for the month after.’ It’s a good conversation for teasing out whether you are aligned with the strategic direction. Do you have a prioritisation framework or are you making it up ‘on the hoof’?
This pillar needs the whole team involved in the conversation. Some questions require management to be involved, some require the tech lead or the engineer to understand the big picture and operations. We talk about consistency. In this section there are recommendations for playbooks, runbooks and standards for preparing your operation: prepare for failure, because everything fails all the time.
Operational Excellence: Prepare
You have got to prepare for moving into post implementation, handing off to a different team or bringing on new engineers. Do you have the runbooks for the operations in a particular workload? Do you have the playbooks that are linked to observability in your dashboard, so that when things go wrong, there’s a solid set of instructions to deal with that problem and people don’t have to go in and unpack what you’ve built? So there’s a lot of good, solid foundational guidance. From an architecture perspective (we’re all architects), it’s table stakes for consistency across teams.
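To make the idea of playbooks linked to observability concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Playbook` structure, the alert name `orders-api-5xx-rate`, the owner and the steps are hypothetical and not taken from any AWS tooling. It shows a lookup from an alert to written instructions, plus a check for alerts that nobody has written a playbook for yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Playbook:
    """A written response procedure for one kind of alert (hypothetical structure)."""
    title: str
    owner: str
    steps: List[str] = field(default_factory=list)


# Hypothetical alert names mapped to playbooks. In practice the keys would
# match the alarm names shown on the team's dashboard.
PLAYBOOKS: Dict[str, Playbook] = {
    "orders-api-5xx-rate": Playbook(
        title="Orders API returning 5xx errors",
        owner="orders-team",
        steps=[
            "Check whether the latest deployment lines up with the spike.",
            "Roll back the deployment if it does.",
            "Escalate to the on-call lead if the error rate has not recovered.",
        ],
    ),
}


def playbook_for(alert_name: str) -> Optional[Playbook]:
    """Return the playbook for an alert, or None if nobody has written one."""
    return PLAYBOOKS.get(alert_name)


def alerts_without_playbooks(alert_names: List[str]) -> List[str]:
    """List alerts that have no playbook: a gap to close before hand off."""
    return [name for name in alert_names if name not in PLAYBOOKS]
```

Running `alerts_without_playbooks` against the list of alarms on a dashboard is one possible way to turn the ‘do you have the playbooks?’ question into a routine check rather than a one-off conversation.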
‘Prepare’ looks at tribal knowledge, like when you ask a question and the response is ‘Fred says’. In other words: ‘I don’t know why we do that, but Fred says we do that’. Or the response is: ‘ask my manager’. But what happens when your manager isn’t there? We need leadership and empowerment within the team, and the knowledge written down for everyone. So ‘Prepare’ checks team culture.
It also checks simple stuff like: do you have enough people to meet the challenges? Do you have assigned owners who are going to be responsible for processes, practices and operations? If you can get these foundations in place early, you can evolve, move down through the lifecycle and start applying the other well architected pillars. Your chance of success greatly improves because your operational excellence pillar has set the foundation.
Operational Excellence: Operate
The next area is operate. So you start with prepare and then move to operate. I like operate because there’s a lot of observability. I like thinking of a workload as an asset, how to understand the health of that asset and how to monitor it to make sure it’s working well.
It’s about getting the team ready for production. A particular bugbear of mine is when teams aren’t thinking about how to validate in production and how to spot regression. What are the key performance indicators of the workload? When things go wrong, are they able to spot it, and have they thought about how to remediate or correct those sorts of things?
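As a rough illustration of spotting regression against a key performance indicator, the Python sketch below compares recent samples of a KPI with a baseline. The function name `has_regressed`, the 10% default tolerance and the sample values in the usage note are assumptions made for this example, not figures from the well architected guidance. It also assumes the KPI values are positive.

```python
from statistics import mean
from typing import List


def has_regressed(baseline: List[float], current: List[float],
                  tolerance: float = 0.10,
                  higher_is_better: bool = True) -> bool:
    """Return True if the current KPI average is worse than the baseline
    average by more than the given tolerance (a fraction, e.g. 0.10)."""
    if not baseline or not current:
        raise ValueError("baseline and current each need at least one sample")
    base_avg = mean(baseline)
    cur_avg = mean(current)
    if higher_is_better:
        # e.g. checkout success rate: worse means lower
        return cur_avg < base_avg * (1 - tolerance)
    # e.g. latency in milliseconds: worse means higher
    return cur_avg > base_avg * (1 + tolerance)
```

For a latency KPI you might call `has_regressed([200, 210, 190], [250, 260], higher_is_better=False)` after a release. A real setup would more likely use the alarms in your monitoring platform, but the question is the same: do you know the baseline, and would you notice if you drifted from it?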
Things do go wrong
You go back to prepare again. There’s always something that is going to go wrong, something you haven’t predicted or an alternate path that has been missed. So when those things happen, have you got the correct procedures for learning what that defect teaches, so you can bake it in and toughen up your operation going forward? It’s a holistic way of thinking, and you need those mechanisms to show you how your workload is performing by product.
It’s critical to have those information radiators and dashboards available, and not just for the team. If you have proper observability you can show the C suite the team working on a particular capability, feature or value stream and how it relates to our vision and strategy. That’s proper operational observability across everything, including not only the health of your workload but the health of your team. DORA key metrics should be part of how you operate with a sustainable pace for the team.
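The four DORA key metrics are deployment frequency, lead time for changes, change failure rate and time to restore service. As a hedged sketch of how a team might derive them for a dashboard, the Python below works from a list of deployment records. The `Deployment` record and its field names are hypothetical. In practice the data would come from your delivery pipeline and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failure is fixed


def dora_summary(deployments: List[Deployment], period_days: int) -> dict:
    """Summarise the four DORA key metrics over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deployments if d.failed]
    restore_times = [d.restored_at - d.deployed_at
                     for d in failures if d.restored_at is not None]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at
                                   for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

The medians are `timedelta` values, so they can be shown directly as hours and minutes. Treat the numbers as conversation starters about sustainable pace rather than targets to hit.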
Operational Excellence: Evolve
The last one is evolve. You go through prepare, operate and then evolve. And it’s quite simply about how you evolve operations, which doesn’t mean cutting costs and reducing the budget!
It’s what Mike said earlier. It’s about having a continuous improvement mindset with feedback loops in place. We’re big into mapping, and evolution is a cornerstone of Wardley mapping. If you don’t take these signals from your systems and your workloads on board and use them to evolve, improve and get better, then there’s no point having observability and dashboards.
That’s the key point. We’ve written about the SCORPS process as a driver of continuous improvement. Your operations are going to generate a lot of data and useful information that you, as an engineer, manager or architect, can use to evolve your current setup. You should always be looking to learn.
There is always room for operational excellence improvement
The operational excellence pillar sets us up nicely because once you think through evolve and operations, you’re evolving the other pillars of cost, security, reliability, performance and sustainability. You can always save more money and make the thing faster, more reliable and more secure. People think operations are done because it’s rolling and it’s fine. But there are always things you can improve.
You set up for success and you put the foundational building blocks in place to increase your chances of a successful development cycle.
So that’s the operational excellence pillar from well architected. That’s the craic. We’ll be talking some more about the pillars. There are posts on this on TheServerlessEdge.com, on Twitter @ServerlessEdge, LinkedIn and Medium. So thanks very much.
Transcribed by https://otter.ai