AWS Well-Architected Framework: The Security Pillar

treasaanderson

2 years ago

The security pillar encompasses protecting information, systems, and assets while delivering business value through risk assessments and mitigation strategies. We look at best-practice guidance for architecting secure systems on AWS.

AWS Security Pillar which is next in the series of talks on the Well Architected Framework

We’re continuing our series on each of the pillars of the well-architected framework. We talked about the operational excellence pillar last time. This time, we will talk about security. A prerequisite is threat modeling. If you’re going to talk about security, threat modeling is your number one way to understand where you are. As a pillar, it is interesting. There are ten questions and a couple of different sections.

AWS Security Pillar – how do you securely operate your workload?

The well-architected security pillar is about how secure your organization is. It goes into things like how you manage accounts, whether you have set up AWS Control Tower, and whether you are using Amazon GuardDuty. It promotes the team’s awareness of security and how it plays into things. One of the things we engage with when looking at a workload is blast radius: if something goes down, how will we recover it? Or is there a case there for failover? Or resiliency? It is broad, but there are things you can zoom in and focus on.

How granular your account posture is matters a lot. Previously, you would have one big account with everything, and the blast radius was huge. With modern techniques, capabilities, and improvements, you can be fine-grained and have more accounts. Single sign-on also helps manage that burden. And AWS Organizations, Control Tower, and CloudTrail are mature capabilities that help you get an excellent initial posture.

Rule 1: Tightly Manage and Automate

One thing we like about well-architected is the nice flow to the questions and sessions. The first question, ‘How do you securely operate your workload?’, immediately gets into identity and access management, your inventory of people and machines, and how you manage that. How do you manage blast radius and permissions? How do you add and remove people, accounts, machine accounts, and different resources? In a modern cloud environment, rule number one is that all of this is tightly managed and automated.

That, in particular, is quite important. It’s pretty complex. Usually, it ties back into the enterprise or a broader policy and gets teams asking what authorization controls are for this component. Or, if this user were to leave the system, how do we do that in a practical, secure way? And if someone leaves the organization, how do we ensure we revoke their access? It forces you to have those conversations, which is positive.

Least privilege principle

The Least Privilege principle comes to the fore, especially for serverless workloads. As you ephemerally spin stuff up and down, you may be tempted to give star-star (every action on every resource) to everything and open up the world, which means your blast radius is massive, and you’ve got a big security hole. So you need to be aware of the Least Privilege principle and grant each component only the minimal permissions it needs to be functional. You must automate that and build it as part of your automation. Otherwise, it becomes an unmanageable burden in an ephemeral sort of workspace.
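To make the star-star problem concrete, here is a minimal sketch of a check you could run in a pipeline to flag policies that allow every action on every resource. It works on plain IAM-style JSON policy documents; the function names, the account number, and the `orders` table are hypothetical and only there for illustration.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM policies accept either a single value or a list in these fields."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return the Allow statements that grant every action on every resource."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions and "*" in resources:
            flagged.append(stmt)
    return flagged


# "Star-star": every action on every resource.
too_broad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

# Scoped: read access to one (hypothetical) DynamoDB table only.
scoped = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/orders",
        }
    ],
}

print(len(find_wildcard_statements(too_broad)))  # one statement flagged
print(len(find_wildcard_statements(scoped)))     # nothing flagged
```

This only catches the literal star-star case. A fuller check would also look at service-wide wildcards such as `s3:*`, which this sketch deliberately leaves out.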

Detective Controls and Left of Attack

The next area in the AWS Security Pillar is one of our favorites: detective controls, how you detect and control security events. We love how security people talk about the ‘left of attack’: everything that happens before the attack. There is a time when the attack happens, and that’s panic stations. But there’s usually a whole bunch of stuff before that you can act on. And that could be two years prior. So, there’s a whole mindset around detecting weird activity when people are probing your system before the attack. That’s the hunter side of cybersecurity when people try to find breaches.

Tooling in this space uses machine learning to look for anomalies in your traffic or things that look out of whack, and it raises events for you to look at. But in general, you’ll use everyday detection controls by ensuring your observability is good, so when something happens, you are alerted if someone’s trying something. It always ties back to good observability. We are sometimes guilty of thinking from an AppSec perspective. If there are flows through your app where you wouldn’t expect to see traffic, someone gets notified, and it’s traceable. Many cool things are happening ‘left of attack,’ where your security org is more active than your typical app developer.
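As a rough illustration of the ‘something looks out of whack’ idea, here is a small sketch that flags a traffic reading when it sits far from the recent baseline. It is a plain z-score check, a simple statistical stand-in and not the machine learning that managed services use. The function name, the threshold of 3, and the sample numbers are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations from the baseline."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


# Hypothetical requests-per-minute on a flow that is normally quiet and steady.
requests_per_minute = [100, 104, 98, 101, 97, 102]

print(is_anomalous(requests_per_minute, 103))  # within the normal range
print(is_anomalous(requests_per_minute, 450))  # a spike worth an alert
```

In practice, the alert would go to whoever owns the flow, with enough context to trace it.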

Managing emerging threats

It’s about keeping abreast of the latest developments and responding to new emerging threat vectors, like ‘Log4j’. How do you react to that new information to the left of your detection? Game days and security chaos engineering are ways of building good detective control capabilities and sharpening the software. ‘What happens if?’ scenarios help. Is your observability where it should be? Do you have the correct logging, monitoring, alerting, and alarming for rapidly detecting and remediating these events?

The Log4j one is a cracker because we use those events as they happen to find a better way to look through our Bill of Materials and assess whether we’re affected. Or how long did that take us to correct or detect? Are we vulnerable or not? We find things we could tighten up. That’s the type of conversation that you can have in that section.
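The ‘are we affected?’ question can be sketched as a lookup over a bill of materials. The example below assumes a very simple, hypothetical SBOM format (a list of service, package name, and version) and plain dotted numeric versions. The first fixed version is a parameter that you would take from the relevant advisory; the value used here is only illustrative.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.14.1' into (2, 14, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def affected_components(sbom: list[dict], package: str, first_fixed: str) -> list[dict]:
    """Return SBOM entries for `package` whose version is older than `first_fixed`."""
    fixed = parse_version(first_fixed)
    return [
        entry
        for entry in sbom
        if entry["name"] == package and parse_version(entry["version"]) < fixed
    ]


# Hypothetical bill of materials across two services.
sbom = [
    {"service": "billing", "name": "log4j-core", "version": "2.14.1"},
    {"service": "search", "name": "log4j-core", "version": "2.17.1"},
    {"service": "billing", "name": "jackson-databind", "version": "2.13.0"},
]

for entry in affected_components(sbom, "log4j-core", first_fixed="2.15.0"):
    print(f"{entry['service']} uses {entry['name']} {entry['version']}")
```

How quickly you can answer this across every service is a useful measure to take away from a game day.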

Data Protection

We could tell many stories about ‘detective controls’! The next section in the AWS Security Pillar is ‘infrastructure protection.’ We’ll skip over that because it’s network and compute protection, which your teams will understand. The next one is data protection. There’s stuff here about encryption at rest and in transit, and so on. But the interesting one is how you classify your data. The tricky part can be whether your organization understands its data classification.

We have mentioned that code is a liability in other articles. Your data can also be a liability that you need to manage appropriately. You’ve heard that ‘data is the new oil.’ Well, if you don’t store it correctly, oil is toxic, damaging, and flammable, and it has all sorts of impacts. You will only know what you have if you understand your data and have classified it correctly. One of the first things you can do is get a good handle on the data you have. Is it valuable? Is it needed? Are you getting business value from it? And if you’re not, get rid of it. Ensure you correctly set up your retention, deletion, and archiving.

Understanding data classification

Most organizations have a good data classification document or something that describes data classification for the industry or the organization. The challenge you’ve got is getting engineering teams to understand it.

Previously, we’ve woven data classification into the threat model exercise, so the first section is about what sort of data we are dealing with. Typically, we’ll link the threat modeling template to the data classification standard to force the facilitator to look at it. Then, we can see if we’re dealing with sensitive information that we should be taking extra precautions with and designing controls in the workflow.

And do we have proper encryption capabilities if we’re dealing with restricted information? Are we moving data in an encrypted fashion and storing data in an encrypted manner? Are we tagging it? We mentioned ‘Least Privilege,’ so can we track who’s looked at that data? Can we track where someone moved that data to? Have we got ‘Least Privilege’ access controls on that data? Those questions are a good way of ensuring that you are architecting for the data classification.
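One way to make those questions repeatable is to write down which controls each classification level requires and compare that with what a data store actually has. The sketch below does that. The level names, control names, and mapping are hypothetical; your own data classification standard decides the real ones.

```python
# Hypothetical mapping from classification level to required controls.
REQUIRED_CONTROLS: dict[str, set[str]] = {
    "public": set(),
    "internal": {"encrypted_at_rest", "encrypted_in_transit"},
    "restricted": {
        "encrypted_at_rest",
        "encrypted_in_transit",
        "tagged",
        "access_logged",
        "least_privilege_access",
    },
}


def missing_controls(classification: str, controls_in_place: set[str]) -> set[str]:
    """Return the controls the classification requires that are not yet in place."""
    return REQUIRED_CONTROLS[classification] - controls_in_place


# A hypothetical data store holding restricted data, reviewed in a threat model.
in_place = {"encrypted_at_rest", "encrypted_in_transit", "tagged"}

print(sorted(missing_controls("restricted", in_place)))
# → ['access_logged', 'least_privilege_access']
```

The gaps that come out of this become the controls you design into the workflow.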

From a well-architected guardrails point of view, automate some guardrails in your provisioning infrastructure to cover items like encryption at rest, encryption in transit, tagging, and other basic security capabilities so that your team cannot create a resource that doesn’t adhere to these basic good practices.
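Here is a minimal sketch of what such a guardrail could look like in a provisioning step: the resource definition is checked first, and nothing gets created if it fails. The resource format, field names, required tags, and the `provision` function are all hypothetical. In a real setup, this check would live in your infrastructure-as-code tooling or policy engine.

```python
REQUIRED_TAGS = {"owner", "data-classification"}  # hypothetical tag policy


def guardrail_violations(resource: dict) -> list[str]:
    """List the basic good practices that this resource definition breaks."""
    violations = []
    if not resource.get("encrypted_at_rest", False):
        violations.append("encryption at rest is not enabled")
    if not resource.get("encrypted_in_transit", False):
        violations.append("encryption in transit is not enforced")
    missing_tags = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing_tags:
        violations.append("missing tags: " + ", ".join(sorted(missing_tags)))
    return violations


def provision(resource: dict) -> str:
    """Refuse to create anything that does not pass the guardrails."""
    violations = guardrail_violations(resource)
    if violations:
        raise ValueError(f"{resource.get('name', 'resource')} rejected: " + "; ".join(violations))
    return f"{resource['name']} provisioned"  # stand-in for the real provisioning call


compliant = {
    "name": "orders-table",
    "encrypted_at_rest": True,
    "encrypted_in_transit": True,
    "tags": {"owner": "team-a", "data-classification": "restricted"},
}
non_compliant = {
    "name": "scratch-bucket",
    "encrypted_at_rest": True,
    "tags": {"owner": "team-a"},
}

print(provision(compliant))
print(guardrail_violations(non_compliant))
```

The point of the design is that the safe path is the only path: a team cannot forget encryption or tagging because the request never gets through.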

Incident Response

The last section in the AWS Security Pillar is ‘Incident Response.’ It’s self-explanatory. How do you respond to and recover from incidents? You want to be well-drilled with as much automation as possible. Sounds straightforward. But it’s complicated.

It ties back to the operational excellence pillar. You’re anticipating these events ahead of time. If you anticipate them, you have associated runbooks or playbooks to facilitate squads in particular circumstances. There’s a lot around education and ensuring that everybody in the organization understands what you do in the event of an incident. You don’t want a junior developer noticing something, not feeling confident or capable of raising their hand, and saying something is wrong here. You want a psychologically safe environment for everybody to submit an incident or a query about something that’s not quite right.

The AWS security pillar has a nice arc that starts with people and ends with people. It goes through all the technical stuff in the middle. But security is a ‘people’ responsibility.

So that’s the craic. Thanks very much for listening. Next time, we’re going to do the ‘reliability pillar.’ Look up the blog on TheServerlessEdge.com and @ServerlessEdge on Twitter.