The Aptible Update Webinar Series is a quarterly presentation that covers recent features and changes to the Enclave container orchestration platform and Gridiron security management tool.
We hosted our Q4 Update Webinar on October 25, 2017. In it, we covered:
- Enclave. New ways to make Enclave infrastructure easier to audit, including Managed HIDS, SSH Session Logging, Activity Reporting, and much more.
- Gridiron. Making security and compliance audit/certification preparation easier, Customer & Vendor Management, and much more.
We provided a brief recap of this webinar on our blog.
Frank Macreery: Alright everyone. Well, I think we’ve gotten the latecomers in, too, so we’re going to get started here. Welcome everyone, and thanks for joining us today. For those of you who are new, this webinar is part of the Aptible Update Webinar Series, a quarterly presentation that covers recent features and changes to the Enclave deployment platform and the Gridiron security management product. [00:00:30] To describe those a little more, our first product is Enclave. It’s an AWS-based Docker container orchestration platform for deploying apps and databases into secure, isolated environments. Our second product is called Gridiron, and it’s a security management tool to manage security activities and generate compliance deliverables. Many of you are already familiar with these products, but what you’re going to hear about today are some of the new improvements to both of them from the lead architects [00:01:00] of the two products themselves, Thomas Orozco and Skylar Anderson, respectively. Without further ado, here’s the agenda for today’s webinar. First, we’re going to cover Enclave features and updates, followed by a Q&A for Enclave. Then we’re going to go over some Gridiron features and updates, followed by a Gridiron Q&A. Thomas Orozco is going to host the Enclave portion and [00:01:30] Skylar Anderson is going to host the Gridiron portion. If you have any questions along the way, feel free to post them in the Q&A panel in the Zoom controls. You can also use chat, but Q&A works a little better. We’ll be monitoring the Q&A section throughout, and anything that we can answer live, we will. If there are too many questions, we’ll answer them either via text in the Q&A response box [00:02:00] or in a follow-up afterwards. Afterwards, you’ll be able to find the presentation video and slides on Aptible.com/resources. 
One important announcement that I want to make before handing this over to Thomas concerns Aptible the organization. In September, [00:02:30] we officially became ISO 27001 certified. ISO 27001 is an international, cross-industry security framework that specifies requirements for managing security across an organization. One way ISO 27001 differs from other frameworks, say HITRUST or HIPAA, is that it is cross-industry. One way it differs from other cross-industry frameworks like [00:03:00] SOC 2 is that it is focused on the entire security process of an organization, whereas SOC 2 is more closely focused on the technical safeguards that an organization implements. What this means for you is that you can use Aptible’s ISO 27001 certification to prove to your customers, your auditors, and any regulators that Aptible, your hosting platform, has met strict standards for data security. [00:03:30] If you have any questions about ISO 27001, or are interested in pursuing ISO 27001 certification yourself, please reach out to us at contact.aptible.com. We have a lot of familiarity with the process at this point. We actually used our own product, Gridiron, to manage our certification process. We’re happy to help you with any certification questions that come up along the way. Of course, you can read [00:04:00] more about this on our website at go.aptible.com/ISO27001. So without further ado, I’m going to hand this over to Thomas Orozco to discuss some of the recent improvements in Enclave.
Thomas Orozco: Thank you, Frank. All right, so let’s jump in and look at what’s new on Enclave this quarter. As those of you who have attended this webinar in the past know, our goal with Enclave is to be the best place for you to manage [00:04:30] your regulated or sensitive projects. This quarter, we did two things to help with this. First, we’re making Enclave easier to use: we invested a lot in making it more predictable and better documented. Second, we’re making Enclave easier to audit. We’re working towards it being a fully audit-ready platform, so that as long as you deploy on Enclave, you can pass audits easily, whether they come from customers or from regulators. So, we’ll start with easier to use. [00:05:00] There are really two sections to look at here: a set of new features I’d like to introduce, and then a set of improvements. As far as new features are concerned, we have three. We launched a new documentation website. We now have self-service environment creation via the Dashboard. And finally, the CLI has endpoint management, which is really about catching up with the Dashboard, where this has been available for a while. We also have a number of improvements. Restoring backups now supports restoring across environments. Maintenance [00:05:30] pages are now served immediately when your app is scaled to zero. And finally, metrics now include CPU metrics. We’ll review all of these items one by one, so let’s start with the new documentation site. Some of you may have already seen this. We introduced the new docs website fairly early this quarter; it landed in August. The main difference from what we had before is that it includes a lot of reference material. 
So, whereas before [00:06:00] we mostly had tutorials and a Q&A or FAQ format, this new website has really comprehensive material. If you want to dig deep into a particular topic, for example, if you’re wondering how health checks work on [inaudible 00:06:14], or what the various options for deployment are, the new docs website tries to be very comprehensive and explains how these things work. Separate from [inaudible 00:06:23] use them, we’ve of course ported over all the material we had before, so all of the tutorials are still there. In fact, we’ve added new tutorials. [00:06:30] We even have a sample application that you can deploy on Enclave right now. We’ve also expanded our troubleshooting instructions: our troubleshooting documentation is more complete than it has ever been. This includes documentation for Enclave, but also for Gridiron. If you’d like to check it out, you can go to aptible.com/docs and browse around or search for any topic that interests you. And, of course, we’re very interested in your feedback. If you have any questions or feedback about the docs, feel [00:07:00] free to reach out to us as usual. The next [inaudible 00:07:05] feature we landed this quarter, actually just toward the end of the quarter, a few days ago, is self-service environment creation. The modal that you’re seeing in the screenshot is accessible from the Dashboard sidebar: just click on Create Environment and you get this. There are two major changes here that will affect you positively. The first one concerns shared-tenancy environments. Before the change, [00:07:30] by default, you’d be deployed to US East 1, which is in Virginia. With this change, you can now choose the region where you would like to be deployed. 
As of this change, when creating a new shared-tenancy environment, you can deploy to any region where we have shared-tenancy stacks. That includes two regions in the U.S., U.S. East and U.S. West, as well as a region in Europe and two regions in Southeast Asia. These are the blue ones that you can see on the map. What that means for you is that if you were already deployed in one of these other [00:08:00] regions before, you can now pick the region you want when creating any new environment, without having to go through support and ask for your development environment to be located somewhere else. If you’ve never done that and were always in the default, you now have the option of choosing another region. I imagine some of you may not even have realized these other regions were available before, so we’re really happy for you to now [inaudible 00:08:22] create an environment. That’s for shared tenancy. For dedicated tenancy, the big difference is that it’s fully self-service [00:08:30] now. Most of you probably have dedicated-tenancy environments on Enclave. These are also called PHI-ready, which is the wording we used before we started referring to tenancy. Before this change, when you created them, you had to wait for them to be activated. Typically, that could mean a few hours of waiting, which could be a little frustrating and could get in the way of your flow: you were launching your environment, trying to organize everything, and then you had [00:09:00] to wait for us to activate it. The good news is that this no longer happens. New environments now auto-activate: if you already have a dedicated-tenancy environment, any new environment you create will activate automatically, so you don’t have to wait on us or anyone else to create new environments. 
This is going to give you a lot more flexibility when you want to create new layers of logical isolation. Speaking of this, I think it’s a good idea to recap the layers of isolation we have on Enclave. There are really [00:09:30] two of them: stacks and environments. Let’s start with stacks. Stacks are isolated at the network level. Each individual stack on Enclave is a virtual network consisting of an AWS VPC and a set of Docker hosts, which are AWS EC2 instances. There are also a number of endpoints, which are [inaudible 00:09:52], and volumes as well. So that’s stacks. Then you have environments, which are what you would normally create. As a user, you can’t create [00:10:00] new stacks; that’s something we do. But as a user, you can create new environments. Environments provide logical isolation. Each environment is mapped to a given stack, and the stack your environment is tied to determines where your app containers and database containers will be deployed: they’ll be deployed on the Docker hosts for that stack. What’s important to realize is that environments provide logical isolation (permissions and so on can be controlled on a per-environment basis), [00:10:30] but actual network isolation depends on the stack. If you have two environments on the same stack, their apps and databases can communicate with each other because they’re on the same network. In some cases, that’s a feature. In other cases, of course, it’s not desired, which is why we have two types of stacks: single-tenant and multi-tenant. Single-tenant, or dedicated-tenancy, stacks are what you’re supposed to use for sensitive and regulated data. That’s where you have your PHI-ready environments and so on. Multi-tenant [00:11:00] stacks are for [inaudible 00:11:02] and development, where you’re not handling any sensitive data. 
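The distinction between stacks and environments can be captured in a small model. This is a hypothetical sketch for illustration only, not Aptible code: the class names, fields, and handles are assumptions. It encodes the two rules from the talk: network reachability follows the stack, and sensitive data belongs on dedicated stacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    """Network-level isolation: one AWS VPC plus its Docker hosts. Created by Aptible, not users."""
    handle: str
    dedicated: bool  # True for single-tenant (dedicated-tenancy) stacks


@dataclass(frozen=True)
class Environment:
    """Logical isolation (permissions and so on). Users can create these freely."""
    handle: str
    stack: Stack


def can_reach(a: Environment, b: Environment) -> bool:
    # Network reachability follows the stack, not the environment:
    # apps and databases in two environments on the same stack share a network.
    return a.stack == b.stack


def fit_for_sensitive_data(env: Environment) -> bool:
    # Regulated data belongs on single-tenant (dedicated) stacks.
    return env.stack.dedicated


# Hypothetical handles, for illustration only.
dedicated = Stack("acme-dedicated", dedicated=True)
shared = Stack("shared-us-west-1", dedicated=False)
production = Environment("acme-production", dedicated)
staging = Environment("acme-staging", dedicated)
sandbox = Environment("acme-sandbox", shared)
```

In this model, production and staging can reach each other because they share a stack, even though they are separate environments. Keeping development traffic off the production network would require a second dedicated stack, not just a second environment.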
You have these two options when creating a new environment, so you can pick whichever one is more appropriate. Moving on, the last new feature I want to introduce in terms of ease of use is endpoint management. Prior to this change, endpoint management was available in the Dashboard only, so you had to go through the Dashboard to create new endpoints. It’s now available in the CLI. There’s a set of new commands, which you can see [00:11:30] listed here, that let you create new app endpoints as well as database endpoints. You can manage all of these via the CLI. There are a number of use cases for this, but it was most heavily requested by users who wanted to deploy new apps and manage their endpoints, maybe to set up a QA app for [inaudible 00:11:54], without having to interact with the Dashboard at all. That’s why we built this feature, and it means you can now manage everything via the [00:12:00] CLI, whereas endpoints were, until recently, something you had to go through the Dashboard to manage. Of course, you might need to update your CLI to get access to this, so make sure you get the latest version. You can navigate to the docs via the link on the slides: go to aptible.com/endpoints if you want to learn more about the options you have and see some usage examples for the CLI as well. That about wraps it up for the new features, [00:12:30] and now I’ll jump into feature improvements. These are more minor changes, but still useful. The first one is restoring backups across environments. This was, again, fairly heavily requested. It means that whenever you use the backup restore command in the CLI, you can now use the --environment flag to choose which environment to restore to. This has a number of benefits, which I’ll get into in just a minute. One thing I want to mention first is that this protects you against making mistakes. 
If you have a backup coming from [00:13:00] a dedicated-tenancy environment, we’re not going to let you copy it into a shared-tenancy environment. Your backups can’t leave the stack they were created in, which makes sure you don’t accidentally copy data somewhere you didn’t mean to. There are mainly two use cases for this. The first one is an analytics workflow: maybe you’re restoring production data into an analytics reporting database on a periodic basis. This lets you use different [00:13:30] environments for that. You can have your production environment for the actual live data, and then a reporting environment, using logical isolation. Different users can have access to the reporting environment, while you really lock down access to the production environment. This lets you take a backup and move it to a less [inaudible 00:13:46] environment. Likewise, if you have a developer workflow where you use production data for development after sanitizing it, this change lets you do the same thing: you can take [00:14:00] your data and port it over to a lesser-privileged environment. Again, all of this is documented, so if you go to aptible.com/restore-backup, you’ll see a number of usage examples. The next improvement we’ve shipped is for maintenance pages. First off, let’s take a step back and talk a little about what they are. Maintenance pages are what we serve on Enclave when your app is failing to respond to requests. There are, of course, a number [00:14:30] of reasons why your app may fail to respond. One is that it timed out. Another is that it blew up in flight for some reason: there was a bug in your app, and it crashed. In those cases, we serve a maintenance page to the customer, so they see something. Another example, of course, is when your app is not actually there. 
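The stack-boundary rule for cross-environment restores (a restore may change environments but never stacks) can be sketched as a simple check. This is a hypothetical illustration, not the Aptible CLI or API: `restore_target`, `RestoreRejected`, and the environment and stack handles are all invented for the example. On the real platform, the equivalent choice is made with the backup restore command’s --environment flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    handle: str
    stack: str  # handle of the stack this environment is tied to


@dataclass(frozen=True)
class Backup:
    backup_id: int
    source: Environment  # environment whose database the backup was taken from


class RestoreRejected(Exception):
    pass


def restore_target(backup: Backup, environment: Optional[Environment] = None) -> Environment:
    """Return the environment a restore would land in, enforcing the stack boundary."""
    # With no target given, restore into the environment the backup came from.
    target = environment if environment is not None else backup.source
    # Crossing environments is allowed; crossing stacks is not.
    if target.stack != backup.source.stack:
        raise RestoreRejected(
            f"backup {backup.backup_id} was taken on stack {backup.source.stack!r} "
            f"and cannot be restored into {target.handle!r} on stack {target.stack!r}"
        )
    return target


# Hypothetical environments: production and reporting share a dedicated stack.
production = Environment("acme-production", stack="acme-dedicated")
reporting = Environment("acme-reporting", stack="acme-dedicated")
sandbox = Environment("acme-sandbox", stack="shared-us-east-1")
nightly = Backup(42, production)
```

In this model, restoring the nightly backup into the reporting environment succeeds, which matches the analytics workflow described in the talk. Restoring it into the shared sandbox raises an error.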
For example, if you’re in maintenance and you’ve scaled your app down to zero. Prior to this change, what would happen when you scaled to zero is that there would be a DNS-level failover. We’d notice at the DNS level, where we have a health check, [00:15:00] that the app was down, and route the traffic somewhere else, to the [inaudible 00:15:03] page. The problem with this is that it might take a couple of minutes after you scale to zero for the maintenance page to show. With this change, we now make the DNS change ahead of time. That’s what most users were expecting, which is why we made the change: so that it behaves just the way you’d expect. Now, when you scale to zero, it will be a little slower because we have to do the DNS change [00:15:30] first, but there will be no phase in which your customers see anything but a maintenance page. Your app will go down quietly, and the maintenance page will be served immediately. Of course, if you use this, you’ll probably want to customize the maintenance page as well. You can configure your app to tell us what page to serve for maintenance. The main use case for this, of course, is when you want to place your app into maintenance mode, maybe because you have database migrations that you’d like to run. [00:16:00] You can just scale down to zero, and everything will happen smoothly, as you’d expect. If you want to learn more about this, again, you can use this link: go to aptible.com/maintenance-page. Once again, this will take you to our new docs, where you can review all of the various settings that are available and see in detail how this whole process works. Finally, to wrap up the improvements, the last one is metrics. We’ve added CPU to the metrics that we show. 
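Why the ordering matters can be shown by replaying the two scale-down sequences and recording what a visitor would see after each step. This is a toy model, not Enclave’s implementation: the step names are assumptions, and the old health-check delay is collapsed into a single step.

```python
from typing import List


def what_visitors_see(steps: List[str]) -> List[str]:
    """Replay scale-down steps and record what an HTTP visitor would get after each one."""
    dns_on_maintenance = False
    app_running = True
    seen = []
    for step in steps:
        if step == "point_dns_at_maintenance_page":
            dns_on_maintenance = True
        elif step == "stop_containers":
            app_running = False
        else:
            raise ValueError(f"unknown step: {step}")
        if dns_on_maintenance:
            seen.append("maintenance page")
        elif app_running:
            seen.append("app")
        else:
            seen.append("error")
    return seen


# Before: containers stop first; DNS only fails over once the health check notices.
OLD_ORDER = ["stop_containers", "point_dns_at_maintenance_page"]
# After: DNS is switched first, so the scale-down is slower but there is no error window.
NEW_ORDER = ["point_dns_at_maintenance_page", "stop_containers"]
```

Replaying the old order produces an error window before the failover. The new order serves the maintenance page at every step, at the cost of the slightly slower scale-down described above.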
Before [00:16:30] this change, we had memory and load average, and for databases, disk usage as well. With this change, we now have CPU usage available for both. As you can see in this graph, we also show your usage compared to your CPU limit. You can use this to troubleshoot sluggish apps and get a better idea of how much CPU you’re using, and it’s available, of course, for both apps and databases. The one thing you probably want to know is what kind of CPU usage you should expect. What’s reasonable, and what’s not? [00:17:00] At the platform level, what you should know is that we allocate 25% of a CPU thread per GB of RAM for your container. That means that if you have a 1 GB container, you get 25% of a CPU thread. If you have a 2 GB container, you get 50%, and if you have a 7 GB container, you get 1.75 CPU threads. The one thing you need to know, however, is that we don’t necessarily enforce this everywhere. That depends on the tenancy of your stack. [00:17:30] Going back to what I mentioned before, we have two types of stacks: shared and dedicated tenancy. The difference is that on a shared stack, you’re not the only customer, so if you use more than your CPU allocation, you’re actually taking CPU away from another customer. On shared stacks, we do enforce CPU limits. Your containers will be allowed to burst above the limit for a short period of time, but it will be very short. Overall, on a shared stack, you will not be allowed to [00:18:00] exceed your CPU limits. However, on a dedicated stack, we don’t enforce these limits, because in most cases that’s what the customer wants, and in practice you’re not taking CPU away from anyone else: you’re the only customer on that stack in the first place. On dedicated stacks, CPU limits are fully opt-in. It’s entirely up to you to decide whether you want them enabled or not. 
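The allocation rule is simple arithmetic: threads = 0.25 × memory in GB. A minimal sketch follows, with hypothetical helper names. It also encodes the enforcement policy described in the talk (shared stacks always enforce limits; dedicated stacks only enforce them if you opt in) and ignores short bursts.

```python
CPU_THREADS_PER_GB = 0.25  # 25% of a CPU thread per GB of RAM, as described in the talk


def cpu_allocation(memory_gb: float) -> float:
    """CPU threads allocated to a container of the given memory size."""
    return memory_gb * CPU_THREADS_PER_GB


def limits_enforced(dedicated_stack: bool, opted_in: bool = False) -> bool:
    # Shared stacks always enforce limits (short bursts aside);
    # on dedicated stacks, limits apply only if the customer opts in.
    return (not dedicated_stack) or opted_in


def share_of_allocation(used_threads: float, memory_gb: float) -> float:
    """Usage as a fraction of the allocation; sustained values above 1.0 need limits off."""
    return used_threads / cpu_allocation(memory_gb)
```

For example, a 1 GB container using half a thread runs at 2.0 times its allocation. That is possible on a dedicated stack without limits, but it would be throttled on a shared stack. This is why enabling limits may call for larger containers.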
Personally, I think it’s better to enable them, because performance is more predictable. [00:18:30] On the other hand, keep in mind that if you enable CPU limits, you’re likely to need bigger containers to get the CPU performance you currently receive without the limits in place. Again, the decision is entirely up to you, and the link that you have here, aptible.com/cpu-limits, goes into this in more detail if you’re contemplating that decision. Finally, the last thing I wanted to mention really quickly: we’ve also introduced a couple of new databases this quarter. We’ve introduced [00:19:00] PostgreSQL 10, just a few days [inaudible 00:19:03] released. This release was widely anticipated; we had a number of requests before it even went live, so it’s now available. We also have support for Redis 4.0. That about wraps it up for the ease-of-use changes. If we have a few questions, we can take them now before moving on to the auditing changes we’ve made.
Frank Macreery: We do. The first question was about shared versus [00:19:30] dedicated environments. Mark mentioned that his staging site is on a shared environment, and he has noticed that it runs slower than his dedicated environment. This touches on what you were just describing. Question one is why that’s the case. Question two is, “Is there an additional charge to move to a dedicated environment?”
Thomas Orozco: That’s a good question. First off, indeed, you [00:20:00] should expect different performance in a shared environment, because CPU limits are active there but not on your dedicated environment. There’s no charge for additional environments, really. What we charge for is the dedicated stack: if you have a dedicated stack, there is a flat fee for it. If you want to, you can actually have multiple dedicated stacks as well. However, environments are free. That is, if you go into the Dashboard right now and click on Create Environment, you can create a new environment on your dedicated-tenancy stack right now. [00:20:30] You’ll be able to deploy your development applications there if you’d like. The one caveat to keep in mind is that this means your development and production apps will be on the same stack, which means they’ll have network access to each other. That might not be desirable. In that case, you’d have to provision a separate dedicated stack on which to run your development applications. Having a separate stack is something you’d pay extra for. I think, Frank, the pricing is $4.99 a month for stacks. [00:21:00] Is that correct?
Frank Macreery: Yeah, it’s $4.99 per month for each additional stack. If you’re on a production plan on Enclave, you already have one dedicated stack.
Mark: Thanks. All right.
Thomas Orozco: Any other questions?
Frank Macreery: Yeah, so another question: is it possible to deploy the app while it’s scaled to zero, so while it’s in maintenance mode?
Thomas Orozco: Yes. You can just go ahead and deploy the app. It will do exactly what you’re expecting, [00:21:30] which is deploy the app, but it will not change the scale of the app. [inaudible 00:21:33] happen very fast. When you deploy the app and it’s scaled to zero, the only thing that will happen is that we’ll cache your Docker image. Essentially, we host the Docker image in our own infrastructure. The reason we do this is to make sure that instead of on your registry that you’re using [inaudible 00:21:51] with us. When you deploy and you’re scaled to zero, we’ll download the image and store it in our own registry. When [00:22:00] you scale back up, we’ll use that image. So you can deploy as many times as you want while you’re scaled to zero. Just keep in mind that you’ll still be scaled to zero after you deploy, and you’ll have to scale back to one, two, or more containers afterwards.
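The behavior Thomas describes (deploy caches the image but leaves the scale alone, and scaling up later reuses the cached copy) can be modeled in a few lines. This is a toy model under stated assumptions, not Enclave’s deploy pipeline: the `App` fields, the `cached/` prefix, and the image names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class App:
    handle: str
    container_count: int = 0
    image: Optional[str] = None
    running: List[str] = field(default_factory=list)


def deploy(app: App, image: str, platform_registry: Dict[str, str]) -> None:
    # Copy the image from the customer's registry into the platform-side cache.
    platform_registry[image] = f"cached/{image}"
    app.image = image
    # Restart whatever is running on the new image; the container count itself is untouched.
    app.running = [platform_registry[image]] * app.container_count


def scale(app: App, count: int, platform_registry: Dict[str, str]) -> None:
    if count > 0 and app.image is None:
        raise ValueError("deploy before scaling up")
    app.container_count = count
    # Scaling up reuses the cached image rather than pulling from the customer's registry again.
    app.running = [platform_registry[app.image]] * count if count else []


registry: Dict[str, str] = {}
app = App("my-app")  # scaled to zero, e.g. during maintenance
deploy(app, "example/my-app:v2", registry)
deploy(app, "example/my-app:v3", registry)
```

After two deploys, the app is still at zero containers. A later `scale(app, 2, registry)` starts two containers from the cached v3 image.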
Frank Macreery: Cool. Thanks, Thomas. And then the last question for right now … This one’s open-ended but probably worth responding to in the webinar. We got a question: what kind of infrastructure are we using [00:22:30] under the hood? Are we running on top of Kubernetes? What’s Enclave built on?
Thomas Orozco: That’s a good question as well. We use our own scheduler, for a number of reasons. We’ve actually investigated moving over to Kubernetes at some point. One of the main reasons not to do so right now is that Enclave is designed to operate a large number of small stacks. Kubernetes is really ideal for situations where you have a very large pool of servers and you can throw your workloads at [00:23:00] any server without caring too much which one. In our case, we actually have a large number of smaller pools. Each of our customers has a dedicated stack that’s much smaller, so it’s very important for us to be very precise in the way we schedule containers, to avoid issues for you like memory contention and CPU contention. So the short answer is that we use our own scheduler and infrastructure. Kubernetes is something we’re keeping an eye on, but for the reasons I just mentioned, [00:23:30] it wouldn’t be entirely appropriate for us to use it right now.
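To illustrate why precise placement matters on small pools, here is a generic best-fit placement sketch that reserves memory and refuses to overcommit. This is not Aptible’s scheduler, whose internals are not described here. Best-fit is a common bin-packing heuristic chosen for illustration, and the host names and sizes are hypothetical.

```python
from typing import Dict, Optional


def place(container_mb: int, free_mb: Dict[str, int]) -> Optional[str]:
    """Best-fit placement: choose the host with the least free memory that still fits."""
    candidates = [(free, host) for host, free in free_mb.items() if free >= container_mb]
    if not candidates:
        # Refuse instead of overcommitting, which would cause memory contention.
        return None
    free, host = min(candidates)
    free_mb[host] = free - container_mb  # reserve the memory on the chosen host
    return host


hosts = {"host-a": 4096, "host-b": 2048}  # hypothetical free memory per host, in MB
```

On a pool this small, every placement decision changes what else can fit. Best-fit keeps the larger host free for a bigger container, and an explicit refusal signals that the stack needs more capacity instead of silently overloading a host.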
Frank Macreery: Cool. Thanks, Thomas. That’s it for right now-
Thomas Orozco: One thing I want to mention about this: we spoke in more detail about how we are architected two webinars ago, I think. Not the one we did last time, which was probably in July, but the one before. If you look at our webinar archives, we talked a little bit about how deployment processes work and so on. Of course, if you have any questions, feel free to reach out to support [00:24:00] as well. We’re happy to answer any specific questions you have.
Frank Macreery: That’s a great point, Thomas. I’ll drop a link to that earlier webinar in the Q&A replies.
Thomas Orozco: Thanks, Frank. All right, so-
Frank Macreery: All right.
Thomas Orozco: That was it, right?
Frank Macreery: Yeah.
Thomas Orozco: Great. All right, so, moving on to our next section, which is making Enclave easier to audit. There are three changes we’re sharing here: SSH session logging, activity reports, [00:24:30] and managed HIDS, which we technically did ship, but which is in private beta at this point. Let’s start with the first one, SSH session logging. SSH session logging does pretty much what you’d expect from the name: it lets you capture logs from SSH sessions. In the screenshot you’re seeing, I have Kibana open where I’m viewing my logs, and I’m logging into a session, running a Python shell, like a Django shell, and touching users and everything. I’m running all of this through Aptible [00:25:00] SSH, and all of it is being captured and sent to a log drain. This feature works just like other log drains. Before this feature, you could capture app and database logs; now you can capture SSH logs as well. There are a number of use cases for this. The main one, of course, is ensuring that access to your production data is fully audited, so if someone logs in and makes a number of changes, you know what changed. Sometimes that’s a formal auditing requirement, and sometimes it’s just to [00:25:30] actually know what someone did. Maybe a mistake was made, you’re unsure what was done, and you need to look back and figure out, "Hey, what did we do that day when we SSH’d in and touched a bunch of stuff?" With SSH session logs, you’ll be able to do that: you’ll be able to audit activity in these sessions and see what happened. I should mention that this is typically a requirement if you’re trying to conform with frameworks such as HITRUST: auditing access to production, and in particular access to production through management consoles, is typically required. 
If you’re pursuing any of these [00:26:00] audit frameworks, we of course encourage you to set up SSH session logging, which will make it much easier for you to mark that requirement as done. The next change we’re introducing, auditing-wise, is called activity reports. Activity reports are CSV downloads that list all the activity in your environments, whether it’s your own activity or automation [00:26:30] from Enclave; for example, the backups that we perform on a scheduled basis would be present in there. In this particular screenshot, what I’m pointing at is [inaudible 00:26:38] where you can see that on this day, Thomas deployed this app using a particular Git ref. These reports are generated on a weekly basis and available in your Dashboard, under the Activity Reports tab. Again, they’re posted weekly, sometime in the morning on [00:27:00] Monday, and you can just go ahead and download them. We keep them for you, so you don’t have to download them every Monday; you can get them in bulk later. This is going to be useful for a variety of reasons. Again, it’s mostly useful in an auditing context, but it lets you review your team’s activity: if you want to know how many times you deployed, where deployments came from, or who’s deploying the most, all of this will be available in activity reports. Maybe you’d like [00:27:30] to look for suspicious activity, like someone SSHing into production off hours when they’re not on call. You can do that with activity reports much more easily than you could before. Finally, it’s something you can share with your auditors as well: if you want to make sure you have an audit trail of all the changes you’re making to your infrastructure, activity reports will help you do that. 
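The off-hours SSH review mentioned above can be automated with a short script over a downloaded report. This is a sketch under a loudly stated assumption: the column names (`timestamp`, `user`, `operation`, `resource`), the `ssh` operation value, the ISO 8601 timestamps, and the sample rows are all hypothetical. Check the header row of a real activity report and adjust before relying on it.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

# Hypothetical report layout; check the header row of a real activity report.
SAMPLE_REPORT = """timestamp,user,operation,resource
2017-10-23T14:05:00,thomas,deploy,my-app
2017-10-23T02:30:00,frank,ssh,my-app
2017-10-24T10:00:00,skylar,ssh,my-app
2017-10-28T12:00:00,skylar,ssh,my-app
"""


def off_hours_ssh(report: str, start_hour: int = 8, end_hour: int = 19) -> List[Dict[str, str]]:
    """Return SSH rows that fall on a weekend or outside [start_hour, end_hour)."""
    flagged = []
    for row in csv.DictReader(io.StringIO(report)):
        if row["operation"] != "ssh":
            continue
        when = datetime.fromisoformat(row["timestamp"])
        weekend = when.weekday() >= 5  # Saturday = 5, Sunday = 6
        if weekend or not (start_hour <= when.hour < end_hour):
            flagged.append(row)
    return flagged
```

On the sample data, this flags the 2:30 a.m. Monday session and the Saturday session. For a real report, read the file with `open(path, newline="")` and pass its contents in instead of the sample string.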
Once again, in our experience, this is something that comes up fairly often [inaudible 00:27:54] protocols, particularly in ISO 27001, which we went through ourselves: [00:28:00] making sure we had an audit trail of the deployments we performed was indeed a requirement. The last change I want to talk about, audit-wise, is managed HIDS. This one’s a much bigger feature, and we’ve worked a lot on it this quarter. First off, what does it mean? HIDS stands for Host-based Intrusion Detection System: essentially making sure that at the host level, [00:28:30] you’re aware if someone’s trying to break in, or even succeeding, and making sure we detect that. The goal of this feature is to audit the Docker hosts your containers are running on and to generate weekly reports that prove the integrity of those hosts and demonstrate the work being done to ensure it, for you and your auditors. You can share these reports with your auditors, or with customers, for example if you have a customer that’s [00:29:00] doing a vendor assessment and asking you to prove that you have IDS running on your infrastructure. Or, again, if you’re going through HITRUST, this is a common requirement there as well. If you’re being asked to have IDS on your infrastructure, this feature can help you satisfy that requirement. On a more practical basis, you can also use it to audit what we’re doing: to see what’s going on and what’s being done to [00:29:30] secure your infrastructure. With that, let me talk a little bit about how this feature actually works. On each of the Docker hosts, we have, of course, a number of containers, but we also have the OSSEC agent. 
OSSEC is an open-source host intrusion detection system, probably one of the more common [inaudible 00:29:50] ones. We have the agent running on these hosts and reporting to an OSSEC server. That reporting is really about sending over a bunch [00:30:00] of logs, periodically sending checksums of files as they change, and so on. The role of the agent is to collect information and send it. The agent doesn’t make any decisions as to, "Is this a problem or not?" That’s a good thing: we want the agent to do a minimal amount of work, so that it’s more difficult to compromise. The OSSEC server, in turn, looks at these events and decides whether they’re worth alerting on. For example, if we’re getting logs from syslog, and [00:30:30] some are purely informational, that’s okay. But if someone’s logging in or running sudo, that’s an event that’s going to turn into an alert. All the alerts we get from OSSEC are aggregated and go through our review process, which has three stages. The first one is automated review. It looks at the alert and, on the basis of that alert alone, decides, "Is this okay, or is this a potential security incident?" A particular case [00:31:00] where it’s okay would be a scheduled task running as expected. For example, we perform rootkit checks to scan for malware on a daily basis. That rootkit check running isn’t a problem; it’s just good to know that it happened. In the background, we double-check that these checks are happening on schedule as expected, but in practice, the check running doesn’t warrant further review, so that alert gets cleared automatically. However, there are events that we can’t review automatically. For example, if someone’s logging into an instance over SSH, [00:31:30] it’s impossible to say whether that’s legitimate or not based on that piece of information alone. 
In this case, we go to our next layer of review, which is [inaudible 00:31:38] review. In [inaudible 00:31:41] review, we aggregate events from the rest of our infrastructure to get a better picture of whether that event is legitimate or not. Going back to my example of an SSH login: whenever someone logs into an instance on our infrastructure, if they’re an Aptible operator, we actually ping them in Slack [00:32:00] and ask them to confirm, “Was this you?” If they confirm, that creates an audit record that we can then correlate with the events coming in from OSSEC. In this particular example, we have Frank logged into this instance, and Frank also confirmed that it was him who logged in, so we can clear that event as-is. However, maybe Frank didn’t confirm it was him, perhaps because it wasn’t him at all. In [00:32:30] that case, we go to our next layer of review, which is manual review. This alerts our security team that there is an event worth investigating, and a human looks at it; that person is the ultimate authority in deciding whether the event is security-relevant or not. If it is a potential intrusion, we’ll activate our incident response process. Again, as we mentioned before, this one was designed around the [00:33:00] ISO 27001 framework as well. It has three stages where we assess the problem, contain it, and then eradicate it, plus customer notifications where applicable: if any customers were affected, we’ll reach out to notify them.
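To make the three review layers concrete, here is a minimal Python sketch of how an alert might be routed through them. It is an illustration, not Aptible’s implementation: the `Alert` type, the alert kinds `"rootcheck"` and `"ssh_login"`, and the `confirmations` set (standing in for the Slack “Was this you?” audit records) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    AUTO_CLEARED = "cleared by automated review"
    CORRELATED = "cleared by correlation with audit records"
    ESCALATED = "escalated to manual review"


@dataclass(frozen=True)
class Alert:
    host: str
    kind: str  # hypothetical labels, e.g. "rootcheck" or "ssh_login"
    user: Optional[str] = None


# Hypothetical alert kinds that correspond to expected, scheduled activity.
SCHEDULED_KINDS = {"rootcheck"}


def review(alert: Alert, confirmations: set[tuple[str, str]]) -> Verdict:
    """Route an alert through the three review layers.

    `confirmations` holds (user, host) pairs that operators confirmed
    ("Was this you?" -> yes); it stands in for the audit records that
    get correlated with incoming OSSEC alerts.
    """
    # Layer 1: automated review, based on the alert alone.
    if alert.kind in SCHEDULED_KINDS:
        return Verdict.AUTO_CLEARED
    # Layer 2: correlate with audit records from elsewhere in the infrastructure.
    if alert.kind == "ssh_login" and (alert.user, alert.host) in confirmations:
        return Verdict.CORRELATED
    # Layer 3: everything else goes to a human on the security team.
    return Verdict.ESCALATED


confirmed = {("frank", "host-1")}
print(review(Alert("host-1", "ssh_login", user="frank"), confirmed))  # Verdict.CORRELATED
print(review(Alert("host-2", "ssh_login", user="frank"), confirmed))  # Verdict.ESCALATED
```

All of the classification happens on the review side; the agent only ships raw events, which matches the point above that the agent should do as little work as possible.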
In either case, regardless of whether there was an incident to assess and evaluate, or there wasn’t and it was just an informational message that our team discarded, all of this gets aggregated into a report. [00:33:30] It’s a weekly report that we provide, which serves as a piece of audit evidence for everything we’re doing. The report is a PDF download that explains the whole process, so you can share it with your auditors. It includes a sheet that tells you, for an instance, “Here’s what happened on this instance during this reporting period.” If this feature sounds interesting to [00:34:00] you, particularly if you’re currently using another IDS system deployed on Enclave, such as Threat Stack or Alert Logic, or if you have new IDS needs (say a customer’s vendor assessment asks you about IDS, or an auditor does), feel free to reach out to us for a demo, or just to talk about it, via any channel. You can email email@example.com, or just submit a ticket in Zendesk. If you’re a premium support customer, you can also reach out to us in Slack [00:34:30] as well. One thing I want to mention is that we really tried to make this fully managed. You don’t have to do anything: you get the reports, and we do everything else for you. Compared to other IDS systems, this should require much less effort from you, and we’re pricing it in a way that we think will most likely be less expensive as well. Speaking of pricing, first off, OSSEC is deployed right now on all of our instances, regardless of whether anyone’s paying for it. [00:35:00] We’re not charging extra for host intrusion detection. Ensuring the security of our instances is our responsibility.
We’re running OSSEC for everyone anyway, free of charge, including on existing plans. However, access to the evidence (all the audit reports, the PDF I showed you before, there’s an OSSEC version, all of this) is a paid add-on. Right now, we’re planning for this to be $0.02 per container gigabyte per hour, only for dedicated-tenancy containers. [00:35:30] That will come out to about 25% extra on top of your existing container pricing. If that sounds interesting to you, as I mentioned before, feel free to reach out to us via any of our existing support channels, and we’re happy to show you a demo and talk a little bit about this feature with you. With that, that about wraps it up for what’s new in terms of compliance this quarter. If you have any questions, we can take them now.
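The quoted pricing can be checked with a little arithmetic. This sketch assumes “container gigabyte” means gigabytes of container memory, billed per hour like base container pricing, and the 8 GB / 730-hour month is a made-up workload. Working backward from the two quoted figures, a $0.02 add-on that adds about 25% implies a base rate of roughly $0.08 per GB-hour; that is an inference from the numbers in the talk, not a published price.

```python
# Quoted planned rate for the managed HIDS audit add-on (dedicated tenancy only).
HIDS_RATE_USD_PER_GB_HOUR = 0.02


def hids_addon_cost(container_gb: float, hours: float) -> float:
    """Add-on cost in USD for `container_gb` of containers running for `hours`."""
    return HIDS_RATE_USD_PER_GB_HOUR * container_gb * hours


def implied_base_rate(addon_rate: float, overhead_fraction: float) -> float:
    """Base per-GB-hour rate implied if the add-on costs `overhead_fraction` of it."""
    return addon_rate / overhead_fraction


if __name__ == "__main__":
    # Hypothetical workload: 8 GB of containers for a 730-hour month.
    print(round(hids_addon_cost(8, 730), 2))  # 116.8
    print(round(implied_base_rate(HIDS_RATE_USD_PER_GB_HOUR, 0.25), 2))  # 0.08
```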
Frank Macreery: Yeah, we got a couple, Thomas. One is on [00:36:00] the most recent topic of managed HIDS. Dan asks if the OSSEC server we run is multi-tenant, or if it’s hosted within a customer-specific stack.
Thomas Orozco: It’s a good question. The OSSEC server that we’re operating is multi-tenant. It’s really just receiving events. OSSEC has a number of features, including sending things from the server back to the agents, and we’ve disabled all of them. Essentially, it’s a write-only destination, [00:36:30] like a few others in our infrastructure, for example, our log aggregation.
Frank Macreery: Cool. The other question goes back to SSH session logging. We had a question about a log drain currently capturing only app and database logs. How would they go about adding SSH session logging?
Thomas Orozco: The easiest way to do this is just to go into your Dashboard, create a new log drain, and have it drain only SSH sessions. [00:37:00] You can reuse the same destination, of course, or use a different one. However, if you’d like, we can also enable this for you. If you have a particular log drain and don’t want to add a second one, maybe you hate [inaudible 00:37:11] and don’t want two log drains each carrying partial logs, you can just reach out to support and we’ll enable this for you on the existing log drain without you needing to [inaudible 00:37:21] an additional one.
Frank Macreery: Great. Thank you so much, Thomas. To the audience, if anyone has additional questions [00:37:30] that they think of as the webinar continues, you can still ask them in the Q&A and we’ll get answers back to you one way or another. Without further ado, we’ll hand it over to Skylar, who’s going to tell us a bit about Gridiron and the recent improvements there.
Skylar Anderson: Thanks, Frank and Thomas. Again, my name’s Skylar Anderson. I’m the lead front-end engineer here at Aptible, working primarily on Gridiron. I’m going to review the past quarter and talk about some [00:38:00] improvements and new features. As a quick overview, we like to think of Gridiron as the fastest and easiest way to manage your security management system. What QuickBooks is to accounting, we are to information security management. New this quarter, we focused primarily on four key areas. The first is customer and vendor management: you can now add and remove customers and vendors and the agreements associated with them. We’ve [00:38:30] improved your ability to prepare for an audit with some new Gridiron reports I’m excited to share. We’ve improved your ability to manage assets and the components within your information security management system. Finally, we’ve made significant enhancements to the Gridiron Risk Model. Let’s get started with customer and vendor management. It’s a single place where you can manage and track both the upstream and downstream contracts you have with your customers and vendors, [00:39:00] in a single Gridiron Dashboard. You can track all of your customers and all of your vendors, and in addition, the agreements you have with each of them and the contingencies between those agreements. You can also upload and download the actual agreement documents, so if you have a service agreement, a statement of work, or a BAA with your vendors and customers, you can upload it to this Dashboard, and everyone on your team knows where to go to find those documents. [00:39:30] Finally, we’ve integrated the vendor management portion with the existing asset management functionality in Gridiron.
As you’re adding and removing systems and components in your existing security management system, we’re also pre-populating the vendors in your vendor management Dashboard, so you’re not left on your own to think of every vendor you’re using. In fact, any existing Gridiron customer today will see a list of existing vendors that we’ve collected for them. I’m going to quickly show a demo of these Dashboards. Going [00:40:00] to the Gridiron Dashboard, under the tools section, we see two new tools: customer management and vendor management. Customer management starts with an index of all the customers I’ve added. I can quickly click into any one of these and see, for example, that I have a BAA, a statement of work, and a service agreement. I can see that the statement of work is contingent on the service agreement, and I can download any of these documents quickly. Vendor management works very similarly. There’s an index of all the vendors that I have a relationship with. [00:40:30] These were pre-populated based on the systems and components I’ve already said I’m using, so if I click into Aptible, for example, since I have some apps deployed on Enclave, I can see not just the agreements that I have with Aptible, but also the systems I actually use and what defines my relationship with them: what responsibility they have over these systems. I know I’m logging to Elasticsearch on Enclave, and I have a database and apps hosted on Enclave. [00:41:00] Moving on to audit preparedness, we are launching four new reports in Gridiron. These reports will go a long way toward, number one, improving your own internal auditing capability. For example, if you’re doing tabletop exercises for business continuity training, we have a report for that. If you just need to review your asset inventory and see where sensitive data is deployed, we have a report for that. These reports also work [00:41:30] great for passing customer audits.
We have an export feature for all of them: you can generate a CSV that you can email directly to a customer. All these reports were also inspired by our own ISO 27001 certification. These were reports that we had to generate as part of our certification package, so we figured we’d build that tooling directly into Gridiron, use the tool ourselves to pass our own audit, and now pass that feature on to our customers. These are the four reports we’re going to show. [00:42:00] The first is a training history report, which is an export of your workforce, all the training modules they’ve completed, and the dates they completed them. Second is an asset inventory report, which is a comprehensive listing of all the components making up your security management system. These are things like the apps you have deployed, what networks they’re deployed on, what hosting platforms host those apps, what data stores they depend on, and what sort of devices you’re using. That’s all included in a single report that you can export [00:42:30] as a CSV. Third is a business continuity report. This allows you to execute your business continuity plan more quickly by showing you prioritized data for all of your security management components, specifically for recovering more quickly from an incident: you can see your apps and databases and determine the priority of recovery based on, for example, the maximum tolerable downtime or the business criticality of each component. Finally, there’s the audit log report. [00:43:00] This is a view into all of the components of your security management system, showing you quickly where they’re logging to. You never have to guess or dig through some of the more nested screens in Gridiron; you can go to a single place to determine which components are logging where. I’ll quickly demo each of these reports and walk you through them.
The first is the asset inventory report. Again, this is a high-level view of all the components making up your security management system. I can see things like [00:43:30] apps, databases, and networks. I can see the hosts that I’m deployed to, the data stores, and the logging destinations. I can see the third-party services that I’m depending on and the level of data sensitivity for all of those services. We collect many more attributes about all of these, so if you do the CSV export and open it up, you’ll see a number of additional attributes. The second report is the business continuity report, and as I mentioned, it’s useful for determining the priority of responding to incidents, [00:44:00] so we sort these by priority: first business criticality, then maximum tolerable downtime. For databases and data stores, we also have recovery time objectives, recovery point objectives, whether backups are performed, whether they’re automatic or manual, where they’re backed up to, and whether data is rotated in and out or persisted indefinitely. The third report is the audit logging report. This is a simple breakdown: I have a number of components [00:44:30] and I need to know where they’re logging to. I have a couple of apps logging to an ELK Stack on Enclave, whereas this image processor app is logging to Papertrail. Finally, the training records report is an index of training completion for all the users in our organization, along with which course and when it was completed, and again, I can download this and quickly view it as a CSV if I need to. [00:45:00] The really important thing about these reports is that you will most definitely have to prepare them as part of your certification package if you’re going through a SOC 2 or ISO 27001 audit.
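The business continuity report’s ordering (business criticality first, then maximum tolerable downtime) amounts to a two-key sort. A minimal sketch, assuming a three-level criticality scale and downtime expressed in hours; Gridiron’s actual scale may differ, and the component names other than Patient Portal and the image processor are hypothetical.

```python
from dataclasses import dataclass

# Assumed criticality scale: a lower rank means "recover first".
CRITICALITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Component:
    name: str
    criticality: str  # one of CRITICALITY_RANK's keys
    max_tolerable_downtime_hours: float


def recovery_order(components: list[Component]) -> list[str]:
    """Order components by criticality, then by shortest maximum tolerable downtime."""
    ranked = sorted(
        components,
        key=lambda c: (CRITICALITY_RANK[c.criticality], c.max_tolerable_downtime_hours),
    )
    return [c.name for c in ranked]


inventory = [
    Component("image processor", "medium", 24),
    Component("Patient Portal", "high", 4),
    Component("patient database", "high", 1),
    Component("marketing site", "low", 72),
]
print(recovery_order(inventory))
# ['patient database', 'Patient Portal', 'image processor', 'marketing site']
```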
We basically built these features to serve our own needs, and we’re very excited to share them with our customers. Third, we made significant improvements to asset management in Gridiron. Asset management allows you to quickly and easily track [00:45:30] all of the assets, or components, making up your security management system. Again, assets or components are things like apps or databases, but also services that those apps and databases depend on. Enclave is a service; different services within AWS, like S3 or EC2, are also services. We make it easy for you to track what sort of data is deployed on all those different types of systems, [00:46:00] and how they depend on each other. We’ve expanded your ability to track components by adding networks, devices, and third-party services. We’ve also added many new pre-seeded third-party systems and backends for you to choose from. Existing Gridiron customers might have noticed that there’s no easy way to just add your own system; instead, you’re always picking from a list. That’s a library of systems that we maintain, [00:46:30] because it allows us to build our own risk profiles and default data based on our experience and expertise, and to manage that information for you. That ends up creating a very easy-to-use experience, because we provide much of that default data about different systems for you. I can show you what that looks like. Finally, as I mentioned earlier, we’ve integrated our asset management capabilities with vendor management. As you add services or configure your apps to use [00:47:00] services, we determine the vendor and create it for you in your vendor management Dashboard. I’ll quickly demo what some of these new features look like.
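Pre-populating vendors from the asset inventory can be pictured as a lookup from each pre-seeded catalog system to the vendor behind it, which is one reason a curated system library helps: a free-text system name would have no reliable vendor to map to. This is a hypothetical sketch; the catalog entries, component names, and `derive_vendors` function are illustrative, not Gridiron’s data model.

```python
from collections import defaultdict

# Hypothetical slice of a pre-seeded system catalog: system -> vendor behind it.
SYSTEM_VENDORS = {
    "Enclave": "Aptible",
    "Amazon S3": "Amazon Web Services",
    "Amazon EC2": "Amazon Web Services",
}


def derive_vendors(components: dict[str, str]) -> dict[str, list[str]]:
    """Map each vendor to the components that run on one of its systems.

    `components` maps a component name to the catalog system it uses.
    Systems missing from the catalog are skipped rather than guessed at.
    """
    vendors: dict[str, list[str]] = defaultdict(list)
    for component, system in components.items():
        vendor = SYSTEM_VENDORS.get(system)
        if vendor is not None:
            vendors[vendor].append(component)
    return {vendor: sorted(names) for vendor, names in vendors.items()}


print(derive_vendors({
    "Patient Portal": "Enclave",
    "upload bucket": "Amazon S3",
    "build server": "Amazon EC2",
    "office printer": "Unlisted",
}))
# {'Aptible': ['Patient Portal'], 'Amazon Web Services': ['build server', 'upload bucket']}
```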
Going back to the Gridiron Dashboard, if I scroll down under our security program, I can see that I can configure multiple things like apps, databases, networks, backends, third-party web services, and devices. Devices is a new feature where you can tell us about the data on, and how you’re hardening, all the devices [00:47:30] that your workforce depends on: for example, whether your workforce uses laptops or removable media, whether you’re storing encrypted PHI on removable media, or whether you perform routine security reviews, like auditing laptops using MDM. You can configure all of that here. Similarly, with third-party web services, this shows our index of available systems that you can configure in Gridiron. We’ve now added the ability for each [00:48:00] of these systems to have a relationship with existing components, such as existing apps. If I switch over to the app screen, for example, and make a change to our Patient Portal app, I can now indicate the third-party services that this app depends on. This helps with business continuity and incident response planning, because you can actually see the dependencies between all of your components and know, for example, which apps are affected if Amazon Simple Web Service is down. [00:48:30] That concludes the additions to the asset management features of the Dashboard. Next, I’m going to talk about enhancements we’ve made to the Gridiron Risk Model. The Gridiron Risk Model is a concept we covered in the January 2017 webinar, and if you haven’t seen that webinar and you’re interested in Gridiron, I would encourage you to look at it. We covered the [00:49:00] abstract principles behind how Gridiron works and how we’re able to analyze and quantify risk across an organization. That allows us to perform a deep risk analysis on your organization.
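Answering “which apps are affected if a given service is down?” is a reverse-dependency walk over the component graph. A minimal sketch, assuming dependencies are recorded as a mapping from each component to the components or services it depends on; all names except Patient Portal are hypothetical. The walk is transitive, so an app that depends on a database hosted on a failed service is also reported.

```python
from collections import defaultdict, deque


def affected_by_outage(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges: dependency -> components that rely on it.
    dependents: dict[str, set[str]] = defaultdict(set)
    for component, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(component)

    # Breadth-first walk outward from the failed service.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for component in dependents.get(current, set()):
            if component != failed and component not in affected:
                affected.add(component)
                queue.append(component)
    return affected


graph = {
    "Patient Portal": {"patient database", "Amazon S3"},
    "patient database": {"Enclave"},
    "image processor": {"Amazon S3"},
    "marketing site": set(),
}
print(sorted(affected_by_outage(graph, "Enclave")))  # ['Patient Portal', 'patient database']
```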
What’s new this quarter is that we’ve taken the Gridiron Risk Model, pulled it apart, and enabled it to be applied to any piece of your information security management system. Previously, [00:49:30] the scope of the Risk Model was generic and applied only to your organization as a whole. We’ve now updated it to apply to all parts of your security management system, starting with apps and the various app types. For example, a web app and a single-page application have two very distinct risk profiles. We’ve built those default risk profiles and are attaching them to all the assets in your asset [00:50:00] inventory, and exposing some UI that lets you configure the risk profile on top, so you can ultimately quantify risk specific to each component in your asset inventory. Again, to review, we’re pulling the Gridiron Risk Model out of just the organization and making it applicable to everything: apps, databases, devices, and networks. We’re expanding it even further to physical locations, vendors and hosting providers, and third-party web services. [00:50:30] Some of this is in progress, and some of it is complete. I’m going to show you the portions that are complete with a quick demo. Going back into Gridiron, if I click on the security controls tab, there’s something new for any existing Gridiron user. Previously, we only covered these organizational control categories. What’s new here is a set of control categories specific to the app instances I’ve created. Each of these [00:51:00] profiles is unique to the app type. A single-page application has a much smaller risk surface area than a web app. The web app has the largest profile, because it has both a front end that delivers HTML and a dynamic back end that processes user input or uploads, or provides authentication. This Patient Portal app is a web app, and I can click and view its assessment here.
The parts of the Risk Model that we expose are just the predisposing conditions and the security controls. [00:51:30] The predisposing conditions are facts about you that expose you to risk. The security controls are the things you’re doing to mitigate that risk, and we now expose some of the details of our model in this UI. I can see, for example, that because this app connects to a database, I’m exposed to a number of threats relevant to just this app. Clicking through to security controls, I can see the opposite: the things I can do to harden this app, and which threats each of these controls could potentially mitigate. [00:52:00] We make it easy to sort these by priority, so if you were implementing a security plan of action, you could look at all of your apps, understand which controls are missing, and see which might be a higher priority because they mitigate the most risk. That concludes my demo of new features for Gridiron. I’m open to any questions or comments.
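Sorting controls by how much risk they mitigate can be approximated by counting how many of a component’s exposed threats each missing control addresses. This is a deliberately simplified sketch: Gridiron’s Risk Model quantifies risk in its own way (covered in the January 2017 webinar), and the threat and control names below are hypothetical examples for a web app.

```python
def prioritize_controls(
    exposed_threats: set[str],
    control_mitigations: dict[str, set[str]],
    implemented: set[str],
) -> list[tuple[str, int]]:
    """Rank not-yet-implemented controls by how many exposed threats they mitigate."""
    scores = [
        (control, len(threats & exposed_threats))
        for control, threats in control_mitigations.items()
        if control not in implemented
    ]
    # Most threats covered first; ties broken alphabetically for a stable report.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical threats a web app is exposed to by its predisposing conditions
# (it takes user input, accepts uploads, and connects to a database).
exposed = {"SQL injection", "cross-site scripting", "credential stuffing"}
controls = {
    "web application firewall": {"SQL injection", "cross-site scripting"},
    "parameterized queries": {"SQL injection"},
    "content security policy": {"cross-site scripting"},
    "multi-factor authentication": {"credential stuffing"},
    "full-disk encryption": {"stolen laptop"},
}
for control, covered in prioritize_controls(exposed, controls, {"multi-factor authentication"}):
    print(f"{covered}  {control}")
```

A control that mitigates nothing this component is exposed to (full-disk encryption here) still appears, but at the bottom, which keeps the plan of action complete without cluttering the top of the list.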
Frank Macreery: Thanks, Skylar. We [00:52:30] did have one question about contract management. The question is: what are some examples of agreements that should be tracked in Gridiron versus agreements that wouldn’t be, and are there limits to what one should be using contract and customer management for?
Skylar Anderson: That’s a great question. I don’t think there are limits, but there are some things you definitely want to keep in there. This feature was also [00:53:00] born out of our ISO 27001 audit process, where one of the documents you need to include in your certification package is an index of all of your legal and regulatory requirements: basically all the agreements or contracts that you’re obligated to adhere to. You’ll need to provide documentation and an audit trail of those agreements. We’re building the tooling to generate that report for you; that report is a coming feature. Any contract you have with a vendor or a customer that you’re obliged to follow [00:53:30] should be uploaded. The most common will probably be the BAA, particularly with vendors, if you’re using AWS or Aptible. We do plan to expand this feature to automatically upload some of that documentation, as part of automating the vendor management process based on the assets we already know about. There are no limits: if you have an agreement that you’d like to track, you’re welcome to track it. It will be included in the legal agreements report I mentioned [00:54:00] once we launch that feature.
Frank Macreery: Cool, thanks Skylar. Dan asked another question. I can grab this one. Dan asked, “I understand that Gridiron aided Aptible to achieve ISO 27001, but does the product itself meet some ISMS standard, and are there such standards?” Yes, and this is actually … If you look at the ISO 27001 certification … If you just go to aptible.com/resources, we have the certificate up there. Part [00:54:30] of ISO 27001 is defining the scope of the audit, and in this particular case, the scope is the information systems and networks, personnel, policies, and operational procedures used in the direct provision of Aptible, Enclave, and Gridiron. So it covers both products. In particular, because ISO 27001 is more broadly focused on organizational processes [00:55:00] than some other security frameworks, this covers everything: not only the technical safeguards that are implemented, but all of our organizational procedures for securing both Enclave and Gridiron. This is a great question, and the answer is, “Yes. Gridiron is within the scope of our ISO 27001 certification.” There’s one more question that Damon just asked. [00:55:30] Damon asks, “In Gridiron and Enclave, is it all customer input to populate, or are there areas where a customer can track action items, such as attacks?” I think for the latter question, Skylar, maybe you can talk about the incident response tool in Gridiron?
Skylar Anderson: Yeah. I think he’s getting at two [00:56:00] things, and I’ll touch on both. Generally speaking, as part of the Gridiron Risk Model, we do track attacks; that’s how we’re able to quantify risk and prioritize the controls that you should implement. Those are controlled by Aptible; we don’t allow you to add specific threats. However, if you experience an attack and need to respond to it, then, as Frank said, we have the incident response tool, which lets you manage the workflow once you’ve detected [00:56:30] that something happened and need to organize your team around how you’re going to respond to that particular attack. I’m not sure which one Damon is referring to, but those are two different capabilities within Aptible. In general, the Risk Model is configured by Aptible, with some features of the model exposed to the user. We’re experimenting with adding more customization, but for general things like attacks and vulnerabilities, we maintain the mapping of how those relate to actual components in your [00:57:00] security management system.
Frank Macreery: Awesome. Thanks very much, Skylar. Damon, if you have any further questions about that, feel free to follow up, and we can answer directly if we didn’t quite get at the question you were trying to ask. That’s it for this October 2017 installment of the Aptible update webinar series. [00:57:30] Our next webinar will be January 25, at 11am Pacific, 2pm Eastern, if you want to mark your calendars. We’re going to post links to the video and slides on our blog and on aptible.com/resources, and we’ll also send a follow-up email to all attendees with those links and a link to register for the next webinar. Until then, thanks so much for joining us. Hope this was useful, and [00:58:00] see you next time. Thanks.