Most cloud security blog posts, news articles, and other guidance found online address risks such as governance, contract language, forensics, and other higher-level topics. There isn’t a lot of tactical information that gives actionable advice on what you should be doing today to mitigate cloud-specific risks. Here at Stratum Security, most of our customers have some part of their infrastructure or corporate applications hosted with some sort of cloud or outsourced hosting provider. Additionally, our ThreatSim SaaS is hosted entirely within Amazon Web Services (AWS) Elastic Compute Cloud (EC2). In this post I’ll list some of the things we’re doing to protect our own data in EC2.
I won’t spend too much time discussing the legal and contractual issues with the cloud. They are not too different from non-cloud outsourced hosting risks, and there are plenty of resources and commentary available online that address them. It comes down to this: the stuff you care about, your data and the availability of that data, lives and sleeps in someone else’s house. Yes, you own the data but it’s not 100% under your organization’s control. The biggest technical difference of having your data live in someone else’s data center or cloud is that you lose control over the hardware, and in some cases software, that stores, processes, and transmits your data.
If you are hosted on a cloud platform, you may share certain hardware components with other customers (e.g. the hypervisor). You need to understand what you can protect, where you lose visibility, and where you need/can apply extra security sauce.
Let me preface these recommendations with the following caveats:
- This will be focused on EC2, an Infrastructure as a Service (IaaS) provider.
- We’re an information security company with a niche SaaS solution. Our data may be more sensitive than your data.
- Our recommendations are focused on security first, not necessarily performance. There is no one-size-fits-all.
As such, not all of the recommendations below are suited for every organization – but there are some controls that everyone can and should implement.
Below is a sampling of security controls we’ve implemented in our cloud. While they are specific to Amazon EC2, most will apply to other IaaS services:
1. Use a Virtual Private Cloud (VPC)
EC2’s “public” instances are essentially all on one big 10.x.x.x network. When you launch a new instance, it is stood up on some random 10.x.x.x IP address. By default EC2 security groups prevent other instances from talking to your machine unless you start opening things up. We launch all of our instances inside our own VPC so that all of our instances are on a specific network range (a /16, or “class B”) that we defined. We have two different subnets within our VPC:
- DMZ – Contains boxes that have something open to the Internet (e.g. web servers, mail servers, jump box, etc.)
- Private – Servers that store sensitive information and don’t have a need to be exposed at all to the Internet such as our MySQL servers.
The machines in the DMZ each have an elastic IP associated with them; the machines on the private network do not.
The use of a VPC has several advantages:
- You don’t need to keep track of a big list of transient EC2 10.x.x.x IP addresses that may change as you start/stop instances.
- Security groups within VPCs give you the ability to do egress (outbound) filtering; something that non-VPC security groups do not support.
- Organization. If your DMZ is the 10.0.1.0/24 and your private range is 10.0.2.0/24 it’s easy to tell what is what.
- It’s easier to configure IP range security settings. For example, if you need to allow syslog (514/udp) from your DMZ to your central logging machine, you can easily permit 10.0.1.0/24 rather than have to allow specific public cloud IPs.
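To illustrate the organizational point, here is a minimal sketch (not our actual tooling) that uses the two example ranges above to tell which zone an address belongs to. The `SUBNETS` table and the `zone_of` helper are hypothetical names introduced for this example.

```python
import ipaddress

# Example ranges from the text above; substitute your own VPC subnets.
SUBNETS = {
    "dmz": ipaddress.ip_network("10.0.1.0/24"),
    "private": ipaddress.ip_network("10.0.2.0/24"),
}


def zone_of(address):
    """Return the name of the subnet containing address, or None if no subnet matches."""
    ip = ipaddress.ip_address(address)
    for name, network in SUBNETS.items():
        if ip in network:
            return name
    return None
```

With a layout like this, a source address seen in a log line can be labeled `dmz` or `private` at a glance, and a rule such as "allow 514/udp from 10.0.1.0/24" covers every DMZ machine without listing individual IPs.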
Amazon Virtual Private Cloud Documentation
2. Create a DMZ and a Private Network Within Your VPC
The DMZ model has been in use forever for a simple reason: if your servers that are exposed to the dirty Internet are ever compromised, it limits where the attacker can go next to (usually) one or two ports (e.g. SQL). This is a relatively “low cost” control for most applications and a long-accepted architecture.
3. Create Security Groups for Each Instance
Security groups are essentially simple firewall rules configured within Amazon’s hypervisor. Outside a VPC, security groups are assigned to the instance at launch and the assignment cannot be changed later (inside a VPC you can change an instance’s groups after launch). That said, we created specific security groups for every instance in our environment with the exception of our landing web servers. Our landing servers are replicas of each other so it’s not a big deal to have a “one-size-fits-all” security group there. But our jump box, database servers, mail servers, etc. all have security groups named after the specific instances (e.g. mysql01, mailer02, etc.). This gives us the flexibility to apply very granular rules to every instance without having to worry about unintended consequences of changing a security group that is shared by several instances.
Amazon Security Groups Documentation
4. Use A Jump Box to Access Your VPC
There’s likely no legitimate reason that every single instance needs SSH exposed to the Internet. If you have 10 instances, that’s 10 perimeters to worry about. Instead, we use a jump box: essentially a choke point that all administrative SSH access must use to get into the VPC. Once a user is on the jump box, they can SSH into everything else within the VPC using SSH keys. The jump box is configured so that it is only able to SSH to specific IP addresses within our VPC. All user actions are logged and only specific users have sudo rights. Furthermore, instances within the VPC are not allowed to SSH to each other; all SSH access must go through the jump box. This way, if something in the DMZ is compromised, the attacker won’t be able to hop to other devices within the DMZ subnet.
Since the security of this server is so important to the security of our environment we pile on several layers of security here. First, in order to log in to this instance the user must be coming from a specific IP address. Second, the user must have a valid SSH key. Third, all users must use two-factor authentication (more on this next). We also do a lot of system level hardening that I won’t go into detail on here.
5. Use Two Factor Authentication on Your Jump Box
Just to be clear: passwords suck, are awful, and should never be the single thing between the Internet and critical data. Passwords of every length and complexity are lost, stolen, forgotten, guessed, key logged, reused, and cracked. There’s no reason that critical data should be protected with ONLY a password. Facebook, Gmail, Dropbox, and even our own ThreatSim service all support some form of two-factor authentication. Why would you not protect administrative access to your environment with at least the same effort as your Facebook account?
We use Duo Security for our two-factor solution. It’s free if you have 10 users or fewer and works great. One of Duo’s founders is Dug Song (of dsniff fame) who has a high degree of security street cred. For our setup, Duo has a Linux SSH module that sends a push notification to my phone right after I successfully authenticate with my SSH key. To approve the authentication request I press “Approve” in the Duo app and my terminal passes me to my shell. Duo is available for iOS, Android, Blackberry, etc. If you lose your laptop with your SSH keys, the attacker would also need your phone in order to authenticate via SSH.
You can also use Google Authenticator with a PAM module if you want to go that route. You don’t need to buy SecurIDs for everyone. There are a lot of easy ways to do this.
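For a sense of what the Google Authenticator route does under the hood, the sketch below computes the time-based one-time passwords (TOTP, RFC 6238) that such apps display. It is an illustration using only the Python standard library, not the PAM module’s code; the function names `totp` and `verify` and the `window` tolerance are assumptions made for this example.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, for_time=None, step=30, digits=6):
    """Compute an RFC 6238 one-time password (HMAC-SHA1) for a base32 secret."""
    if for_time is None:
        for_time = time.time()
    key = base64.b32decode(secret_b32, casefold=True)
    # The moving factor is the number of whole `step`-second intervals since the epoch.
    counter = struct.pack(">Q", int(for_time) // step)
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret_b32, submitted, for_time=None, window=1, step=30):
    """Accept a code from the current interval or `window` intervals either side."""
    if for_time is None:
        for_time = time.time()
    return any(
        hmac.compare_digest(totp(secret_b32, for_time + drift * step, step), submitted)
        for drift in range(-window, window + 1)
    )
```

The server and the phone share only the secret and a roughly synchronized clock, which is why a stolen SSH key alone is not enough to log in.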
6. Restrict SSH Access To Your Jump Box With Security Groups
Lots of developer folks love to have SSH open to the entire Internet for emergency or remote support. The problem is that right at this very moment, as you are reading this, there is likely a non-public exploit for SSH being used by bad guys. By allowing the entire Internet access to your SSH port there is nothing stopping an attacker from exploiting your machine. At least if you only allow very specific IP addresses in your security groups you’d be protected. Or, if an attacker gets ahold of your SSH key (hacked/lost/stolen laptop, etc.) at least your instance (e.g. jump box) would be protected by an IP restriction.
Are you always on the go or have an often-changing dynamic IP? Try to get a static IP from your provider. Or, update your security group every time your IP changes. Yes, it’s a little bit of a pain but not as big of a pain as having your entire EC2 environment compromised.
7. Use Outbound Security Groups
This one isn’t always easy to implement but bear with me. There is no reason why highly critical servers within your VPC should be able to initiate an FTP connection to any server on the Internet. There isn’t. For example, would you consider it acceptable for your database server containing all of your customer data to make an outbound FTP connection to a server in China? What about IP ranges in Russia where the Russian Business Network is hosted? It’s better to use egress rules and only allow specific ports and protocols out to specific hosts. Here’s why: if an attacker ever does make it all the way into your database server within the VPC, why make it easy for him to transfer a dump of your database back to his server? Yes, you will need to allow your devices out to specific IPs on the Internet for patches, updates, NTP, etc. But that’s a short list that is worth making and maintaining.
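The “short list” idea can be modeled as a default-deny allowlist. The following is a rough sketch of the decision an egress rule set makes, not how security groups are implemented; `EgressRule`, `egress_permitted`, and the addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
import ipaddress
from typing import NamedTuple


class EgressRule(NamedTuple):
    protocol: str  # "tcp" or "udp"
    port: int      # destination port
    cidr: str      # destination range the rule applies to


# Hypothetical allowlist: one NTP server and one range hosting a patch mirror.
ALLOWED_EGRESS = [
    EgressRule("udp", 123, "198.51.100.10/32"),
    EgressRule("tcp", 443, "203.0.113.0/24"),
]


def egress_permitted(protocol, port, destination, rules=ALLOWED_EGRESS):
    """Default deny: outbound traffic passes only if some rule matches all three fields."""
    dest = ipaddress.ip_address(destination)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and dest in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )
```

Under this model an outbound FTP connection (21/tcp) from the database server to an arbitrary host fails simply because nothing on the list matches it.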
8. Enable EC2’s Two-Factor Authentication for EC2 Console Access
Earlier I talked about using two-factor on SSH and a bunch of other good stuff. None of it matters if someone can just guess your Amazon password (yes, the same one you use to order a ton of tube socks with free shipping) and log in to your AWS account. Amazon supports hardware tokens as well as Google Authenticator, so that if your AWS password is compromised, the bad guys would still need your hardware token or phone to access your account. There is no excuse not to use two-factor on your AWS login.
9. Use a Host-Based Intrusion Detection System (HIDS)
One thing you lose when hosting your app on EC2 is visibility into the network. Amazon doesn’t give you a span port to run your IDS to watch for bad things happening over the wire. That said, we still want to have visibility into anomalies and incidents within our environment. Enter OSSEC. OSSEC is a free HIDS that monitors system logs and events for activity that matches a known signature. The events are sent in real-time to a central OSSEC server, which emails them to our operations personnel for review. For example, we know when the MD5 checksums of critical system binaries change, when new users are added, or when our syslog shows something strange going on (e.g. segfaults, daemons starting/stopping, hardware problems).
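The checksum monitoring mentioned above boils down to comparing current file hashes against a trusted baseline. Here is a minimal sketch of that idea; it is not OSSEC’s implementation, it uses SHA-256 rather than the MD5 mentioned above, and the helper names are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Record a baseline of {path: hash}, taken while the system is known to be good."""
    return {str(path): sha256_of(path) for path in paths}


def changed_files(baseline, paths):
    """Return the paths whose current hash no longer matches the baseline."""
    changed = []
    for path in paths:
        key = str(path)
        # A file that has disappeared counts as changed if it was in the baseline.
        current = sha256_of(path) if Path(path).is_file() else None
        if baseline.get(key) != current:
            changed.append(key)
    return changed
```

The baseline has to live somewhere the monitored machine cannot rewrite it, which is one reason a central server is part of the design.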
10. Use A Central Log Server
We tested several open source and commercial log collection solutions and ended up pressing the “Easy Button” and going with Splunk. We considered using Logstash but it was more complex than Splunk and required us to monitor and manage several different processes (Redis, Elasticsearch, Java, etc.). I won’t waste blog space extolling the virtues of Splunk, but it’s great for collecting and correlating log events across our entire infrastructure. We use it for troubleshooting, security event investigations, etc. From a security standpoint, if a machine gets compromised, you can obviously no longer trust it. At least if it was sending logs in real-time to a centralized server you can go to Splunk and figure out what is going on.
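As one way an application can ship its events off-box in real time, the sketch below points Python’s standard `logging` module at a remote syslog listener over UDP. It assumes the central server accepts syslog on 514/udp; the `build_central_logger` helper and the message format are illustrative choices, not part of Splunk.

```python
import logging
import logging.handlers


def build_central_logger(host, port=514, name="app"):
    """Return a logger that forwards records to a remote syslog listener over UDP.

    `host` must be resolvable when this is called; UDP delivery is best effort.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Because each record leaves the machine as it is written, an attacker who later wipes the local logs cannot remove what the central server has already received.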
11. Encrypt Sensitive Disk Volumes
This is a big one. Most organizations now use full-disk hard drive encryption on laptops. Why? Because laptops (and the hard drives they contain) are not physically in your direct control and may be stolen or seized without your knowledge since they aren’t locked in the data center down the hall. Virtual disk volumes in the cloud are arguably more difficult to protect because they are virtual – they can live in many places at the same time and be cloned within seconds. It is for this reason that every organization that stores data in the cloud should consider using disk encryption. Several solutions exist including native OS disk crypto as well as SaaS disk encryption providers.
The first thing to do is to think about what information needs to be encrypted. Sometimes it is obvious, like MySQL database files, customer documents, etc. Depending on the nature of your application it may include things like email logs that contain email addresses, Splunk logs, HIDS logs, and application logs. All of these may contain fragments of sensitive information that require protection.
For Ubuntu, this is a great article that contains step-by-step directions on how to set up disk encryption in EC2.
One seemingly obvious mistake that many people make is storing the key in the cloud along with the encrypted data. This is like locking your door and then taping the key to the lock. The whole point is to make it so that if an attacker gets ahold of your EBS volume, they can’t mount or read the drive. Since you can’t store the key in the cloud, you can’t add your encrypted volume to /etc/fstab where it would be automatically mounted at boot. And since Amazon doesn’t allow console access, you will have to mount the drive manually after boot. This obviously impacts the resilience of your application, as an unscheduled (or scheduled) reboot requires human intervention to enter the key and mount the volume.
Disk encryption and key management are complex and require a great deal of planning that deserves its own blog post. Other issues not discussed here are more complex distributed file systems and the impact of encryption on performance.
12. Alert On Application Errors
If you are running an application that is exposed to the Internet you should be logging and alerting on application errors. One of the challenges with application security is that applications hide malice: if someone is attacking your application in a subtle way (flipping URL parameters, for example), it may not match a signature that your IDS/IPS/HIDS would catch. Or if an attacker makes your application do something out of the ordinary, the application may not generate an exception that is useful or obvious. If a user changes accountID=123 to accountID=124, is that malicious? Will the application tell you that someone is poking around? Since we’re a security company, we worked closely with our developers to build in application errors that tell us when someone is up to no good.
We use the open source Errbit that lets us know when something odd happens with our application. Errbit isn’t a security tool by design but provides valuable and actionable security information. All Errbit errors are forwarded in real-time to our operations folks who can evaluate if it was just a benign, anomalous error or something more sinister. For example, if a user attempts to access an application resource that they do not have access to, we receive notification. We have it tuned so that we only see alerts where someone intentionally tampers with our application.
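The accountID example can be sketched as an authorization check that raises a distinct, alertable error instead of failing quietly. This is a simplified illustration in Python rather than our application code or the Errbit API; `ACCOUNT_OWNERS`, `AuthorizationError`, and `load_account` are hypothetical names.

```python
import logging

security_log = logging.getLogger("security")

# Hypothetical ownership table standing in for a real database lookup.
ACCOUNT_OWNERS = {123: "alice", 124: "bob"}


class AuthorizationError(Exception):
    """Raised when a user requests a resource that belongs to someone else."""


def load_account(current_user, account_id, alert=security_log.warning):
    """Return the account only if current_user owns it; otherwise alert and refuse."""
    owner = ACCOUNT_OWNERS.get(account_id)
    if owner != current_user:
        # A mismatch is rarely an accident, so it goes to people, not just a log file.
        alert(f"authorization failure: user={current_user} requested accountID={account_id}")
        raise AuthorizationError(f"{current_user} may not access account {account_id}")
    return {"accountID": account_id, "owner": owner}
```

In a real deployment the `alert` callable would hand the event to whatever error tracker or notification channel the operations team watches.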
13. Internal and External Vulnerability Scanning
We perform regular internal and external scanning of our infrastructure using Nessus. For the internal scanning we provide our scanner with an SSH key for authenticated (aka credentialed) scans, which allows us to ensure that all devices have up-to-date patches and are configured properly. For external scanning, not only do we scan exposed services for vulnerabilities but we also ensure that our security groups are configured as expected and there aren’t any surprises.
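The “no surprises” check can be approximated with a simple TCP connect test that compares what is reachable against what you expect to be open. This is only a sketch of the idea and does far less than a scanner like Nessus; the function names are made up for this example.

```python
import socket


def open_tcp_ports(host, ports, timeout=1.0):
    """Return the subset of ports on host that accept a TCP connection."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def surprises(host, expected_open, ports_to_check, timeout=1.0):
    """Return (unexpectedly_open, unexpectedly_closed) for the ports checked."""
    checked = set(ports_to_check)
    observed = open_tcp_ports(host, checked, timeout)
    expected = set(expected_open) & checked
    return observed - expected, expected - observed
```

Run from outside the VPC against an elastic IP, an empty first set suggests the security groups are not exposing anything you did not intend.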