Mark, Mike and Dave do a walkthrough of the AWS Security pillar, which forms part of the Well Architected Framework. This is the second part of a series of talks. The team share their insights into how to manage this vital but often complicated aspect of modern architecture.
We’re continuing our series on each of the pillars of the well architected framework. We talked about the operational excellence pillar last time. We’re going to talk about security this time, which is our favourite well architected pillar.
It goes without saying that a prerequisite is threat modelling. If you’re going to talk about security, threat modelling is your number one way to understand where you are. As a pillar it is quite interesting. There are 10 questions and a couple of different sections. We will fly through them.
AWS Security Pillar – how do you securely operate your workload?
The well architected security pillar is aimed at how secure your organisation is. It goes into things like how you’re managing accounts, whether your Control Tower is hooked up and whether you’re using GuardDuty. It promotes teams’ awareness of security across the organisation and how it plays into things. The types of things I engage with when we’re looking at a workload are blast radius: if something goes down, how are we going to recover it? Is there a case there for failover, or resiliency? It is broad, but there are things you can zoom in and focus on.
How granular your account posture is, is a big one. In the past, you would have one big account that had everything, and the blast radius was huge. With modern techniques, capabilities and improvements, you can be fine grained and have more accounts. Single sign-on also helps manage that burden. And AWS Organizations, Control Tower and CloudTrail are mature capabilities that help you get a good initial posture.
Rule 1: Tightly Manage and Automate
One thing I like about well architected is the nice flow to the questions and sections. The first question, ‘how do you securely operate your workload?’, straight away gets into identity and access management: your inventory of people and machines and how you manage that. How do you manage blast radius, permissions, and the process of adding and removing people, accounts, machine accounts and different resources? In a modern cloud environment, rule number one is that it is tightly managed and automated.
That, in particular, is quite important, and it’s quite complex. Normally it ties back into the enterprise or a broader policy, and it gets teams asking: what are the authorisation controls for this component? If a user was to leave the system, how do we handle that in an effective, secure way? And if someone was leaving the organisation, how do we make sure their access is revoked? It forces you to have those conversations, which is positive.
Least privilege principle
I think the Least Privilege principle comes to the fore especially for serverless workloads. As you ephemerally spin stuff up and down, you can be tempted to give star-star to everything and open up the world, meaning your blast radius is massive and you’ve got a big security hole. So you need to be aware of the Least Privilege principle and grant only the minimal permissions needed to be functional. You have got to build that into your automation; otherwise it becomes an unmanageable burden in an ephemeral sort of workspace.
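To make the contrast concrete, here is a minimal sketch of what generating a least-privilege policy in your automation might look like. The table ARN, actions and function name are hypothetical; the point is scoping to the exact actions and resource a function needs, rather than star-star.

```python
import json

def least_privilege_policy(table_arn: str) -> dict:
    """Build a narrowly scoped IAM policy document for one DynamoDB
    table, instead of the tempting "Action": "*", "Resource": "*"."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the actions this workload actually needs.
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                # Only the one resource, never "*".
                "Resource": table_arn,
            }
        ],
    }

policy = least_privilege_policy(
    "arn:aws:dynamodb:eu-west-1:123456789012:table/orders"
)
print(json.dumps(policy, indent=2))
```

Because this is generated per resource as part of provisioning, the scoping stays manageable even as ephemeral resources come and go.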
Detective Controls and Left of Attack
The next area in the AWS Security Pillar is one of my favourites: detective controls, how you detect and control security events. I always love the way security people talk about ‘left of attack’: all the things that happen before the attack. There is the time when the attack happens and that’s panic stations. But there’s usually a whole bunch of stuff before that, that you can act on. And that could be two years prior. So there’s a whole mindset around detecting weird activity when people are probing your system, before the actual attack. That’s the hunter side of cybersecurity when people try to find breaches.
The tech uses machine learning to look for anomalies in your traffic or for things that look out of whack, and it raises events for you to take a look at. But in general, you’ll use everyday detection controls by making sure your observability is good, so when something happens, you are alerted or your attention will be drawn if someone’s trying something. It always ties back into good observability. I’m guilty of thinking from an AppSec perspective. If there are flows through your app where you wouldn’t expect to see traffic, someone gets notified and it’s traceable. There are a lot of cool things happening, as you describe it Dave: ‘left of attack’, where your security org is probably more active than your typical app developer.
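That AppSec idea of flagging traffic where you wouldn’t expect it can be sketched very simply. This is an illustrative toy, not a real detection product: the expected paths and log entries are made up, and a real system would feed alerts into your alerting pipeline rather than print them.

```python
# Paths our app legitimately serves; anything else is worth a look.
EXPECTED_PATHS = {"/login", "/orders", "/health"}

def detect_probes(access_log: list) -> list:
    """Return an alert record for any request outside the expected
    flows -- e.g. someone probing for /.env or /wp-admin, the kind of
    'left of attack' activity that precedes a real attempt."""
    return [entry for entry in access_log if entry["path"] not in EXPECTED_PATHS]

log = [
    {"ip": "203.0.113.7", "path": "/login"},
    {"ip": "198.51.100.9", "path": "/.env"},
    {"ip": "198.51.100.9", "path": "/wp-admin"},
]
alerts = detect_probes(log)
for alert in alerts:
    print("suspicious request:", alert["ip"], alert["path"])
```

Even this crude allow-list view shows why observability matters: if requests aren’t logged and traceable, there is nothing to detect against.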
Managing emerging threats
It’s about keeping abreast of the latest developments and responding to new and emerging threat vectors, like ‘Log4j’. How do you respond to that new information to the left of your detection? Game days and security chaos engineering are ways of building good detective control capabilities and sharpening the software. ‘What happens if?’ scenarios really help. Is your observability where it should be? Do you have the right logging, monitoring, alerting and alarming to rapidly detect and remediate these events?
The Log4j one is a cracker, because we use those events as they happen to find a better way to look through our Bill of Materials and assess whether we’re affected. How long did it actually take us to detect and correct? Are we vulnerable or not? We find things we could tighten up. That’s the type of conversation you can have in that section.
I could tell many stories about ‘detective controls’! The next section in the AWS Security Pillar is ‘infrastructure protection’, but we’ll just skip over that because it’s network and compute protection, which should be fairly well understood. The next one is data protection. There’s stuff here about encryption at rest and in transit. But the interesting one is: how do you classify your data? That’s a tough question. Does your organisation understand your data classification?
We have mentioned code as a liability. Your data can also be a liability that you need to manage appropriately. I’m sure you’ve heard that ‘data is the new oil’. Well, oil, if you don’t store it correctly, is toxic, damaging and flammable, and has all sorts of impacts. If you don’t understand your data and haven’t classified it correctly, you won’t know what you have. One of the first things you can do is get a good handle on the data you have. Is it valuable? Is it needed? Are you getting business value? If you’re not, get rid of it. Make sure you have your retention, deletion and archiving set up properly.
Understanding data classification
Most organisations have a good data classification document or something that describes data classification as it pertains to the industry or the organisation. I think the challenge you’ve got is getting engineering teams to understand it.
Previously we’ve woven data classification into the threat model exercise, so the first section is: what sort of data are we dealing with? And typically we’ll put a link on the threat modelling template to the data classification standard, to force the facilitator to have a look at it. Then we can see if we’re dealing with sensitive information that we should be taking extra precautions with, and designing controls into the workflow.
And if we’re dealing with restricted information, do we have proper encryption capabilities? Are we moving data in an encrypted fashion, and are we storing data in an encrypted fashion? Are we tagging it? Mark, you mentioned ‘Least Privilege’, so can we track who’s actually looked at that data? Can we track where that data was moved to? Have we got ‘Least Privilege’ access controls on that data? It’s a very good one in terms of making sure you’re architecting for the data classification.
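One way to make classification actionable in code is to map each classification tag to the controls it implies. The levels and controls below are hypothetical placeholders; your organisation’s data classification standard defines the real tiers.

```python
# Hypothetical classification tiers and the controls they imply.
CONTROLS = {
    "public":     {"encrypt": False, "audit_reads": False},
    "internal":   {"encrypt": True,  "audit_reads": False},
    "restricted": {"encrypt": True,  "audit_reads": True},
}

def controls_for(record: dict) -> dict:
    """Look up controls by the record's classification tag,
    defaulting to the strictest tier when a record is untagged."""
    return CONTROLS.get(record.get("classification"), CONTROLS["restricted"])

print(controls_for({"classification": "internal"}))
print(controls_for({}))  # untagged data falls back to restricted controls
```

Defaulting untagged data to the strictest tier is the safe design choice: it nudges teams to classify their data rather than quietly inheriting weak controls.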
From a well architected guardrails point of view, automate some of the guardrails in your provisioning infrastructure, so that encryption at rest, encryption in transit, tagging and other basic security capabilities are turned on, and no resource can be created that doesn’t adhere to these basic good practices.
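A minimal sketch of such a guardrail check, run against a resource definition before provisioning. The resource shape, field names and required tags are assumptions for illustration; in practice this would live in your pipeline or be enforced by managed policy tooling.

```python
# Tags every resource must carry before it can be provisioned
# (hypothetical set for this sketch).
REQUIRED_TAGS = {"owner", "data-classification"}

def check_guardrails(resource: dict) -> list:
    """Return a list of guardrail violations for a resource
    definition; an empty list means it may be provisioned."""
    violations = []
    if not resource.get("encryption_at_rest"):
        violations.append("encryption at rest must be enabled")
    if not resource.get("encryption_in_transit"):
        violations.append("encryption in transit must be enabled")
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append(f"missing required tags: {sorted(missing)}")
    return violations

bucket = {
    "type": "storage_bucket",
    "encryption_at_rest": True,
    "encryption_in_transit": False,
    "tags": {"owner": "payments-team"},
}
violations = check_guardrails(bucket)
for v in violations:
    print("BLOCKED:", v)
```

Because the check runs at provisioning time, non-compliant resources never exist, which is far cheaper than detecting and remediating them afterwards.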
The last section in the AWS Security Pillar is ‘incident response’. It’s fairly self explanatory. How do you respond and recover from incidents? You want to be well drilled with as much automation as possible. Sounds straightforward. But it’s complicated.
It ties back to the operational excellence pillar. You’re anticipating these events ahead of time, and if you’re anticipating them, you have associated runbooks or playbooks to facilitate squads in particular circumstances. Exactly.
There’s a lot around education as well, and making sure that everybody in the organisation understands what to do in the event of an incident. You don’t want a junior developer noticing something and not feeling confident or capable enough to raise their hand and say something is not right here. You want a psychologically safe environment for everybody to raise an incident or query something that’s not quite right.
In the AWS security pillar, there’s a nice arc that starts with people and ends with people. It goes through all the technical stuff in the middle. But security is a ‘people’ responsibility.
So that’s the craic. Thanks very much for listening. Next time we’re going to do the ‘reliability pillar’. Look up the blog on TheServerlessEdge.com and @ServerlessEdge on Twitter. Thank you very much. Bye!
Transcribed by https://otter.ai