Last week we hosted a webinar covering everything a SaaS company needs to know about complying with GDPR. To be a bit more precise, we covered everything we could during a webinar running a little more than an hour. The GDPR is a complex and vast set of regulations that impacts much of how modern SaaS companies operate. Covering it all in a short webinar was tough.
That said, we’ve distilled the most salient notes in the blog post below. Don’t hesitate to skip ahead and watch the actual recording or grab the transcript and slides, all of which you’ll find in our resources section.
And if you still have questions, please join us in our GDPR Slack community, where 300 of us are discussing how to comply with GDPR.
But first: Why should you listen to us?
Aptible has had the opportunity to help hundreds of SaaS companies build robust data protection programs that pass compliance audits, and we’ve worked with many of them to ensure they comply with GDPR.
Our team is made up of software developers, attorneys, and privacy experts who’ve achieved CIPP/E certification (that’s Certified Information Privacy Professional/Europe). In many ways, there’s a lot that’s “TBD” with GDPR, but we’ve done everything we can to ensure we’re ready to help SaaS companies comply.
What is the GDPR?
GDPR stands for the General Data Protection Regulation. It is an EU regulation that governs how the personal data of individuals in the European Union (EU) is processed and protects their privacy.
GDPR consists of 99 Articles and 173 Recitals. The Articles are more prescriptive and the Recitals tend to provide more context to an issue and include examples.
The overarching goals of the GDPR are to:
Give control back to EU citizens and residents over their personal data, promote human rights such as the right to privacy
Simplify and harmonize the regulatory environment for international business by unifying regulation within the EU
Address the export of personal data outside the EU
GDPR applies to all member states in the European Union. Member states (or countries) cannot have laws that are less protective than GDPR. They can implement laws that are more protective. GDPR sets the baseline for privacy and data protection.
Is GDPR new?
The foundation of viewing data protection as a component of human rights goes back at least 70 years, to just after WWII. GDPR is very similar to the EU’s 1995 Data Protection Directive, so there is a recent history of guidance to rely on that helps inform our understanding of GDPR. The regulatory environment will continue to evolve as the EU replaces the ePrivacy Directive with an EU Regulation, binding in all member states.
Which companies are “in scope” for GDPR?
Under Articles 2 and 3, GDPR separates scope along two dimensions:
Material Scope - Article 2 states that GDPR applies to the processing of personal data, or individually identifiable data, broadly construed. Not just product or customer data, but also your HR processes, marketing/sales, etc.
Territorial Scope - Article 3 states that GDPR applies to any organization established in the EU, targeting the EU markets, or profiling/monitoring EU data subjects.
But in the broadest terms, if the “go to market” for your business has anything to do with processing personal data for natural people located in Europe, you should be thinking about GDPR. Of course, there are exceptions and nuances, so it’s always best to consult with your own lawyer to determine applicability.
Practical Takeaways for SaaS Companies
During the webinar, we focused on four key areas of a SaaS business that GDPR impacts. We flagged key issues that SaaS companies will need to address in order to ensure compliance with GDPR. The key takeaways are summarized below.
Marketing and Sales
Most US-based growth teams have never had a specific regulatory security/privacy regime to deal with, so GDPR is a game-changer. If you know you have EU customers (B2B or B2C) in your pipeline, GDPR and the ePrivacy Directive make direct marketing harder. Some of the fundamental tactics of modern growth marketing, like email capture, email marketing, analytics, and paid advertising, are complicated by the need to obtain consent before you use an EU data subject’s email or track them.
What this means is that you need to maintain a source of truth containing information about what you can and can’t do with your users’ data. For each user, have you obtained consent to send them marketing emails? How about consent to track them? When did you obtain this consent?
For most companies, this will likely take the form of a master suppression list, at least to start. But down the road, we suspect that tracking this information will be better accomplished with a fully customizable CRM-like software, or perhaps an integration with an existing CRM like Salesforce.
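As a rough illustration of what such a source of truth could look like, here is a minimal in-memory sketch. The `ConsentRecord` schema, the purpose names (`marketing_email`, `tracking`), and the `ConsentLedger` class are all hypothetical names chosen for this example; a real system would persist these records and would likely live in your CRM.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ConsentRecord:
    """One consent decision for one user and one purpose (hypothetical schema)."""
    user_email: str
    purpose: str          # e.g. "marketing_email" or "tracking" (assumed names)
    granted: bool
    recorded_at: datetime
    source: str           # where the decision was captured, e.g. "signup_form"


class ConsentLedger:
    """Append-only ledger; the most recently appended record per (user, purpose) wins."""

    def __init__(self) -> None:
        self._records: List[ConsentRecord] = []

    def record(self, user_email: str, purpose: str, granted: bool, source: str) -> ConsentRecord:
        rec = ConsentRecord(
            user_email=user_email.lower(),
            purpose=purpose,
            granted=granted,
            recorded_at=datetime.now(timezone.utc),
            source=source,
        )
        self._records.append(rec)
        return rec

    def latest(self, user_email: str, purpose: str) -> Optional[ConsentRecord]:
        # Walk backwards so the newest matching decision is found first.
        for rec in reversed(self._records):
            if rec.user_email == user_email.lower() and rec.purpose == purpose:
                return rec
        return None

    def is_allowed(self, user_email: str, purpose: str) -> bool:
        # No record at all means no consent: default to "not allowed".
        rec = self.latest(user_email, purpose)
        return rec is not None and rec.granted
```

Keeping every decision, rather than overwriting a single flag, is what lets you answer "when did you obtain this consent?" later.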
Marketing and sales teams are notorious for using scores of tools, enabling data flows between systems and vendors without so much as a second thought. But this mode of operation needs to stop, or at least slow down just a bit. Complying with GDPR means that every vendor you work with, from chat apps such as Intercom and Drift to analytics tools such as Mixpanel, needs to be vetted for GDPR compliance. In many cases, this will mean signing some sort of data protection agreement with the vendor. Without such an agreement, sending data across to a vendor may constitute a reportable breach!
(Side note: a formal vendor management process would help here, which is something that our product Gridiron can help you to implement. Get in touch if you’d like to know more.)
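Even before a formal process exists, a simple inventory can flag the risky data flows described above. The sketch below is only an illustration: the `Vendor` fields and the vendor names are made up for the example and say nothing about any real vendor's status.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vendor:
    """One entry in a hypothetical vendor inventory."""
    name: str
    receives_personal_data: bool
    dpa_signed: bool  # data protection agreement in place?


def vendors_missing_dpa(vendors: List[Vendor]) -> List[str]:
    """Names of vendors that receive personal data without a signed agreement."""
    return [v.name for v in vendors if v.receives_personal_data and not v.dpa_signed]


# Placeholder inventory; the names are illustrative only.
inventory = [
    Vendor("chat-tool", receives_personal_data=True, dpa_signed=True),
    Vendor("analytics-tool", receives_personal_data=True, dpa_signed=False),
    Vendor("status-page", receives_personal_data=False, dpa_signed=False),
]
```

Calling `vendors_missing_dpa(inventory)` returns the vendors whose data flows should be paused until an agreement is signed.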
Product Design and Engineering
Engineering teams are generally used to thinking about customer data as potentially sensitive, which is a good start. However, under GDPR it’s up to engineering to ensure that you either provide the capability to respond to data subject requests (in other words, requests like “remove my data from your systems”) or the tools to allow your customers to handle these requests.
Whether you need to handle the requests yourself or provide the tools to your customers breaks down along the lines of data controller vs. data processor, an important distinction in GDPR. The distinction is decided based on how the company uses data:
Data controllers decide on what data should be collected and what to do with it
Data processors just provide tools to help data controllers make use of the data they collect
Many SaaS companies like Slack and GitHub are arguing they are purely processors. This makes sense, given that the requirements placed on processors are substantially less onerous than those placed on controllers.
Realistically, however, most SaaS companies are going to be both a processor and a controller, especially if they are B2B. B2B SaaS companies give their customers tools to process data (thus they are processors); they also collect and use data about their customers (thus they are also controllers). Accordingly, we expect to see pushback from the EU on large SaaS companies who argue that they are only data processors.
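Whichever role you play, a request like "remove my data from your systems" has to reach every place that data lives. One way to sketch that, assuming each data store can expose a delete-by-user function, is a small registry that fans the request out and returns a per-store receipt. `ErasureRegistry` and `dict_store_handler` are hypothetical names, and the in-memory dictionaries stand in for real databases.

```python
from typing import Callable, Dict

# A handler deletes one user's records from one store and returns how many it removed.
ErasureHandler = Callable[[str], int]


class ErasureRegistry:
    """Fan a single erasure request out to every registered data store."""

    def __init__(self) -> None:
        self._handlers: Dict[str, ErasureHandler] = {}

    def register(self, store_name: str, handler: ErasureHandler) -> None:
        self._handlers[store_name] = handler

    def erase(self, user_id: str) -> Dict[str, int]:
        # The per-store counts act as a receipt that the request was handled.
        return {name: handler(user_id) for name, handler in self._handlers.items()}


def dict_store_handler(store: Dict[str, dict]) -> ErasureHandler:
    """Build a handler for a toy in-memory store keyed by user ID."""
    def handler(user_id: str) -> int:
        return 1 if store.pop(user_id, None) is not None else 0
    return handler
```

If you are acting as a processor, the same registry could sit behind an API that your customers call to service their own users' requests.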
Data Protection by Design and Default
Regardless of whether you are a data controller or data processor, engineering teams need to think broadly about how they will protect their customers’ data, in order to maintain GDPR compliance. There’s significant surface area to protect, and the best engineering teams will implement processes to allow them to consistently enhance security.
The tl;dr on data protection is to take control of the data your organization collects and uses, wherever it goes. Personal data can leak in unexpected ways, such as into logs (which can be sent to logging providers like Papertrail), events (which can be sent to monitoring tools such as New Relic), or errors (which can be sent to error tracking tools such as Sentry).
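To make the logging case concrete, here is a minimal sketch that scrubs email addresses from log lines before they leave your process, using Python's standard `logging.Filter`. The regular expression is a deliberately simple assumption and will not catch every form of personal data; the `RedactEmails` and `build_logger` names are ours.

```python
import logging
import re

# Simple pattern for things that look like email addresses (an assumption, not exhaustive).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmails(logging.Filter):
    """Replace email-like strings in the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges %-style args, so redaction covers interpolated values too.
        record.msg = EMAIL_RE.sub("[redacted-email]", record.getMessage())
        record.args = ()
        return True  # keep the record, just with the scrubbed message


def build_logger(stream) -> logging.Logger:
    """Return a logger whose only handler writes redacted lines to `stream`."""
    logger = logging.getLogger("app.redacted")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactEmails())
    logger.handlers = [handler]
    return logger
```

Attaching the filter to the handler, rather than to individual loggers, means every record that handler ships to a third-party provider passes through the redaction step.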
Some of the keys to compliance here are:
Being able to provide design specs and other requirements documentation that show that security and data protection were taken into account
Ensuring that all vendors are vetted for data protection capabilities and, in nearly all cases, data protection agreements are signed
(Here again, our product Gridiron can help, with both ensuring you are implementing appropriate security controls and, of course, maintaining a vendor management program. Get in touch to learn more.)
Support and Customer Success
GDPR and the ePrivacy Directive restrict how support and success (upsell, expansion, retention, etc.) teams interact with customers. You can’t just send a user an NPS survey, for example. You still have to collect their consent–something that you may want to ask for at some point during onboarding or perhaps after they enter their credit card number.
You can of course email customers for anything that is critical to the functioning of your software (in other words, for the fulfilment of your contract with that customer). Billing emails, password resets, support requests and the like are all probably strictly necessary for you to fulfill your contract with them.
But as soon as emails trend towards marketing, such as a re-engagement, a cross-sell, or upsell, these will be treated as marketing and will be subjected to the same standards of consent. This probably goes for onboarding emails too, though the line is fuzzy. In any case, it’s important to let your customer know what to expect ahead of time and provide opt-out opportunities, both up front and within every email.
This also applies to product research emails, such as surveys or NPS scores. Sending these emails benefits you more than the customer, and so it is essentially treated as marketing and subject to the same constraints.
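The rule of thumb above can be sketched as a small gate in your email-sending code. This is an illustration, not legal advice: the `EmailKind` categories are hypothetical, and treating onboarding emails as marketing is a conservative assumption given that the line is fuzzy.

```python
from enum import Enum


class EmailKind(Enum):
    BILLING = "billing"
    PASSWORD_RESET = "password_reset"
    SUPPORT_REPLY = "support_reply"
    ONBOARDING = "onboarding"
    UPSELL = "upsell"
    REENGAGEMENT = "reengagement"
    NPS_SURVEY = "nps_survey"


# Emails that are strictly necessary to fulfill the contract with the customer.
CONTRACT_NECESSARY = {
    EmailKind.BILLING,
    EmailKind.PASSWORD_RESET,
    EmailKind.SUPPORT_REPLY,
}


def may_send(kind: EmailKind, has_marketing_consent: bool) -> bool:
    """Contract-necessary emails always go out; everything else needs consent."""
    if kind in CONTRACT_NECESSARY:
        return True
    # Onboarding, upsell, re-engagement, and surveys are treated as marketing here.
    return has_marketing_consent
```

Listing the contract-necessary kinds explicitly means any new email type defaults to requiring consent until someone decides otherwise.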
Hiring and Human Resources
Certainly HR teams already worry about keeping certain employee data private, such as salary or performance reviews. Under GDPR, it’s important to protect all personal data about EU employees, prospects, and recruits. The entire vendor stack, from your applicant tracking system to your productivity tools, must be vetted for GDPR compliance, which means putting a data protection agreement in place. (Here again, a real vendor management process would help.)
Pre-hire, some EU jurisdictions make it illegal to even request a background check. Make sure you’re clear on what you can and cannot ask from applicants before asking.
Post-hire, it’s critical to have an employment (or consulting) agreement in place, because that agreement allows you to process the employee’s personal data.
Any company that is subject to GDPR should be aware of the potential employee embarrassment that could arise from BYOD policies. If there’s a breach of any kind, there are serious potential consequences for failing to respond appropriately. Responding appropriately means that all employee devices that have contained work-related data will be confiscated and reviewed, potentially disclosing embarrassing personal information.
Caution: Workforce Monitoring
Any workforce monitoring that happens can trigger serious consequences under GDPR. If you have super admin privileges, reading someone’s email “because you can” or flipping on web monitoring (intentionally or not) is looked upon particularly unfavorably.
Get the Webinar
There’s much more to learn by watching the webinar, so grab it in our resources section.
If you still have questions, please join us in our open GDPR Slack community.
As we mentioned just a few days ago, data security is increasingly top of mind for every business, but especially B2B SaaS companies. Data breaches are becoming more common and present an existential threat: losing control over sensitive personal information can result in lost reputation, churn, class action lawsuits, fines from regulators, or literally closing up shop.
I’m very happy to announce that Aptible has achieved HITRUST CSF Certification for Enclave and Gridiron. This post shares a bit more about what this means and how you can think about your own path to certification.
What is the HITRUST CSF?
In healthcare, HIPAA is the dominant regulatory framework, but it has two basic shortcomings:
Lack of standardization: Because HIPAA regulates a large and extremely diverse set of businesses (perhaps close to 1M covered entities and business associates nationwide), it necessarily leaves a lot of room for interpretation. As just one example of many, HIPAA doesn’t address multi-factor authentication at all, except through general requirements to appropriately safeguard data. SaaS companies are routinely asked for independent verification that they exceed HIPAA requirements and implement best practices and industry standards for security and privacy management.
Lack of independent assurance: SaaS companies are often asked for independent validation that they meet HIPAA requirements, but there is no official HIPAA certification or validation. HHS has an auditing program, but you cannot self-select into it.
Enter the HITRUST Alliance and the HITRUST Services Corp. HITRUST Alliance is a non-profit that develops a Common Security Framework (CSF) based on ISO/IEC 27001 that integrates HIPAA, HITECH, and a variety of other state, local, and industry frameworks and best practices. HITRUST Services Corporation is a for-profit that works with independent CSF assessors to validate implementation and efficacy of the CSF.
HITRUST has three levels of assurance for the CSF, each of which corresponds to a report:
Self-Assessment is what it sounds like. HITRUST will QA the report, but all of the attestations are supplied by you.
Validated Assessment is where you work with an authorized CSF Assessor following the CSF Assessment Methodology.
Validated Assessment with Certification is available when you demonstrate certain levels of maturity for each in-scope control.
What does HITRUST CSF Certification mean?
To earn certification, you have to demonstrate to your assessor’s satisfaction that all of your required controls have met certain maturity levels. HITRUST implements this through a scoring system. The full HITRUST CSF Control Maturity Model is described starting on page 9 of HITRUST’s “Risk Analysis Guide for HITRUST Organizations”. In particular, see the scoring example starting on page 16.
I’ll summarize and provide another example here.
First, you work with your assessor to determine which controls are in scope, based on certain risk factors that HITRUST deems relevant, such as how much HIPAA PHI you process, your organization size, etc. (see “Assessment Scoping” in CSF Assessment Methodology). Controls are organized across domains such as access control, asset management, and risk management.
Each in-scope control has five maturity levels, organized progressively. You receive a score (25% intervals between 0-100%, non-compliant to fully compliant) for each maturity level based on whether you have complied with the control’s requirements for that maturity level. Your overall maturity score is the sum of the weighted scores.
I’ll use the same category of example as the HITRUST Risk Analysis Guide, device encryption, adapted slightly for startups.
|Maturity Level||Summary||Weight||Scoring Example|
|Policy||Policies for the control are in place, managed, and communicated to those affected or responsible.||25%||You have an official policy that says you will encrypt all laptop and workstation filesystems with strong crypto, and enforce it with mobile device management. The policy is signed by management and has been distributed to the specific individuals you have assigned responsibility for managing security, and laptops/workstations in particular. Score: Fully Compliant, 100|
|Procedures||Procedures for the control are in place, current, communicated to those affected or responsible, etc.||25%||You have internal procedures documenting how to configure JAMF Now, your MDM provider. JAMF only works with Mac devices though, and you have one old Windows server running some ancient on-prem software you really need for processing PHI. It can’t be enrolled in JAMF. Score: Mostly Compliant, 75|
|Measured||Control tests, self-assessments and audits are performed; metrics are collected, etc.||15%||You check for MDM enrollment status sometimes, but not on a regular basis. Score: Somewhat Compliant, 25|
|Managed||Controls are adjusted and matured over time.||10%||Windows MDM is just something you haven’t gotten around to yet. You don’t really use any metrics from JAMF to make decisions or improve security. Score: Not Compliant, 0|
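In this example, with the Implemented level (weighted 25%) scored Mostly Compliant, 75, you’d score:
(100 for policy)(.25)
+ (75 for procedures)(.25)
+ (75 for implementation)(.25)
+ (25 for measurement)(.15)
+ (0 for management)(.10)
= 25 + 18.75 + 18.75 + 3.75 + 0 = 66.25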
You’d then convert that maturity score to a rating scale for CSF certification (see p. 21 of the HITRUST Risk Analysis Guide). In this example, a maturity score of 66.25 would convert to a Level 3+ maturity rating.
To obtain certification, you must attain a Level 3+ or 3 with a corrective action plan for each required CSF control.
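The weighted sum above is easy to script if you want to track scores across many controls. The sketch below uses the weights from the example (25/25/25/15/10) and the 25-point score intervals; the function and key names are our own, and the conversion from a maturity score to a rating level is deliberately left out because it depends on the scale in the HITRUST Risk Analysis Guide.

```python
from typing import Dict

# Weights per maturity level, in percent, as used in the example above.
WEIGHTS = {
    "policy": 25,
    "procedures": 25,
    "implemented": 25,
    "measured": 15,
    "managed": 10,
}

# Scores come in 25-point intervals, from non-compliant (0) to fully compliant (100).
VALID_SCORES = {0, 25, 50, 75, 100}


def maturity_score(level_scores: Dict[str, int]) -> float:
    """Return the weighted maturity score (0-100) for a single control."""
    if set(level_scores) != set(WEIGHTS):
        raise ValueError("expected exactly one score for each of: " + ", ".join(WEIGHTS))
    for level, score in level_scores.items():
        if score not in VALID_SCORES:
            raise ValueError(f"{level}: score must be one of {sorted(VALID_SCORES)}")
    # Sum in integer percent-points first, then scale back to a 0-100 score.
    return sum(WEIGHTS[level] * level_scores[level] for level in WEIGHTS) / 100


device_encryption = {
    "policy": 100,
    "procedures": 75,
    "implemented": 75,
    "measured": 25,
    "managed": 0,
}
```

`maturity_score(device_encryption)` reproduces the 66.25 from the worked example, and scoring 100 on the first three levels with 0 on the last two gives 75.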
How can you get HITRUST CSF Certified?
First things first, you need to have some kind of formal security management function in place, or a plan for developing one. You are going to need to designate security responsibilities, establish formal policies, perform formal risk analysis, train your workforce on the security and compliance issues relevant to their roles, and conduct regular security management tasks and checks, such as scanning and pen testing.
Note that the rating scale is weighted so that if you hit 100%, fully compliant for policies, procedures, and implementation on a control, you will score 3+. In other words, if you nail policies, procedures, and implementation across the board, you have a path to certification and give yourself a strong foundation for improving your security posture over time by layering in more monitoring and management.
A standard certification strategy usually starts with a CSF Self-Assessment. Given that we dogfood our own Gridiron platform for HIPAA, GDPR, ISO 27001, and SOC 2 compliance, we had a major head start on the HITRUST CSF requirements.
Things to be aware of as you budget and assess whether certification is feasible:
Each HITRUST report has a fee ($3750-7500 each depending on your organization)
You must purchase a HITRUST MyCSF subscription (starts at $10k/year)
Your assessor will have their own fee schedule (likely the bulk of your cash costs - shop around as prices vary widely)
A standard assessment strategy would be to start with a facilitated self-assessment, to use as a gap analysis. You’d hire your assessor or another facilitator, purchase a subscription to MyCSF, and purchase a self-assessment report. When you are ready to proceed to a validated assessment with certification, you’d purchase another validated assessment report and work with your assessor as described in the section above.
How does the Enclave + Gridiron HITRUST CSF Certification help you?
This whole Internet/cloud/software-eating-the-world thing doesn’t work without trust. Our mission at Aptible is to help developer teams protect sensitive data. Adding HITRUST CSF Certification to our assurance programs for ISO 27001 certification, SOC 2 Type 2 auditing, HIPAA and GDPR/Privacy Shield makes Aptible Enclave one of the most heavily audited container platforms anywhere. Compliance is not security, and confusing the two is dangerous, but independent verification of security management in the form of certifications and audits does help build trust, and is increasingly a critical requirement for B2B buyers.
If you are a B2B SaaS company, using Enclave is the fastest way to fly through vendor security assessments, risk questionnaires, and other steps in the B2B sales process. Your customers will accept our certifications as evidence that your Enclave architecture is managed according to the most stringent security best practices.
If you are interested in HITRUST Inheritance for Enclave, please let us know.
Gridiron is a SaaS platform for security management. Customers use it to build and manage security programs that meet and exceed protocols like HIPAA, GDPR, SOC 2, and ISO 27001. The HITRUST CSF is separately licensed by HITRUST and is not available in Gridiron by default. Please contact us if you would like to use the HITRUST CSF in Gridiron.
How can I get a copy of the Enclave + Gridiron CSF Validated Assessment Report?
You can view Aptible’s standalone HITRUST CSF certification letter for Enclave and Gridiron here.
Because the full Validated Assessment Report contains sensitive information, we cannot share it publicly. We are however excited to share it with customers and partners. If you’d like to get a copy of our report, or if you’d like to learn more about the HITRUST CSF, please let us know.
Data security is increasingly top of mind for every business, but especially B2B SaaS companies. Data breaches are becoming more common and present an existential threat: losing control over sensitive personal information can result in lost reputation, churn, class action lawsuits, fines from regulators, or literally closing up shop.
Your customers and partners demand assurances that the data you process on their behalf is protected. This is why standards like the American Institute of Certified Public Accountants (AICPA) System and Organization Controls (“SOC” for short) for Service Organization reports have become popular in the last few years. A SOC report is completed by an independent third-party CPA auditor and provides insight into how a service organization (such as a cloud vendor) achieves key security and compliance objectives.
Aptible has achieved SOC 2 Type 2 compliance for the security and availability Trust Service Principles. This post shares a bit more about what this means and why this type of compliance is so valuable to B2B SaaS companies in particular. We’ll also share how you can start building a security program that meets SOC 2 requirements and is audit-ready.
(If you’re a customer or partner, and you want to get a copy of Aptible’s SOC 2 Type 2 report, skip ahead.)
What is SOC 2?
SOC 2 is a widely-used framework for building trust between vendors (called “service organizations”) and customers (called “user entities”).
CPAs have been auditing controls at service organizations relevant to user entities’ internal control over financial reporting for decades, all the way back to a standard called SAP No. 29 in the 1950s. In 1992, a standard called SAS 70 introduced the concept of service organizations; it was used for years and gained importance post-Enron and post-Sarbanes-Oxley. These standards still focused on internal control over financial reporting, however, not security. With the rise of cloud computing, the AICPA saw the need for a security-specific framework, and in 2010 introduced their new Statement on Standards for Attestation Engagements No. 16 (SSAE 16). SSAE 16 introduced SOC 1, SOC 2, and SOC 3, with SOC 1 replacing SAS 70.[1]
Today, SOC 2 Type 2 reports are one of the most requested forms of assurance from large B2B customers. Why is that?
SOC 2 defines five Trust Service Principles (security, availability, confidentiality, processing integrity, and privacy) and criteria (called the Trust Services Criteria) for meeting them. As an organization, you select controls to ensure you meet the criteria.
Do you remember the Choose Your Own Adventure Series?
SOC 2 is Choose Your Own Trust Service Principles[2] and controls. You pick which TSPs you want to be audited on, and which controls you select.
Why are those the most popular? Why not the privacy TSP?
In a nutshell, the security TSP is the big lift. The criteria for the Trust Services Principles are broken into two categories: a set of criteria common to all five of the trust service categories (called the “common criteria”); and additional criteria specific to the availability, processing integrity, confidentiality, and privacy TSPs.
The common criteria cover key concepts that affect all of the TSPs and criteria, like establishing a control environment, communication, risk assessment, and monitoring. So, if you only do the security TSP, you do the common criteria and are done. If you only do availability, you have to do all of the common criteria, plus three additional criteria. If you do security and availability, you have to do the same work: all of the common criteria, plus the three additional availability criteria. With the new 2017 Trust Services Criteria, confidentiality (which used to be 8 additional criteria in the 2016 version) is wrapped into the common criteria and slimmed down to two additional criteria.
So security is the most popular TSP because everyone has to do it and it gets at the heart of your security management program. Availability and confidentiality are extra work, but not that much more.[3] Processing integrity is six additional criteria (five in the new 2017 TSCs) and may become more popular in the future, although at Aptible we don’t see much demand for it yet. Privacy is 20 extra criteria (18 in the new 2017 TSCs), and often entities have HIPAA or GDPR efforts that are redundant, so customers rarely demand it.
We highly recommend buying the Trust Services Criteria[3] and SOC 2 Guide. Note the SOC 2 guide is the new, shiny 2018 edition (and works with the upcoming 2017 TSCs), but the TSCs are the 2016 version, which expires at the end of March. You can download a mapping of the extant (still in effect) 2016 TSCs to the new 2017 ones from the AICPA.
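To see how the extra work adds up for a given selection, the counts quoted above can be put into a small lookup. This is only a sketch of the arithmetic in this post; the dictionary keys and function name are our own, and the common criteria themselves are not counted because every selection includes them.

```python
from typing import Iterable

# Additional criteria per Trust Service Principle, beyond the common criteria,
# using the counts quoted in this post for the 2016 and 2017 versions.
ADDITIONAL_CRITERIA = {
    "2016": {
        "security": 0,
        "availability": 3,
        "confidentiality": 8,
        "processing_integrity": 6,
        "privacy": 20,
    },
    "2017": {
        "security": 0,
        "availability": 3,
        "confidentiality": 2,
        "processing_integrity": 5,
        "privacy": 18,
    },
}


def additional_criteria(tsps: Iterable[str], version: str = "2017") -> int:
    """Count criteria beyond the common criteria for the chosen TSPs."""
    table = ADDITIONAL_CRITERIA[version]
    # A set removes accidental duplicates in the selection.
    return sum(table[tsp] for tsp in set(tsps))
```

For example, `additional_criteria(["security", "availability"])` and `additional_criteria(["availability"])` both come to 3, which is the "same work" point made above.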
That explains SOC 2, but what is a Type 2 report and why is it so popular?
SOC 2 (and SOC 1) reports come in two flavors, Type 1 and Type 2. (These are also sometimes called Type I and Type II, but the AICPA SOC 2 Guide uses Arabic numerals, so I will here. I don’t think it matters.)
A Type 1 report is a point-in-time snapshot where a CPA looks at management’s description of the service organization’s system (e.g. your security management program) and renders an opinion on 1) whether that description is fairly presented, and 2) whether the controls you have in place are suitably designed to meet your control objectives. Type 1 reports are useful if you want to get your auditor familiar with your chosen controls, or if your system or control scheme has changed significantly.
A Type 2 is the good stuff your customers want: it includes the Type 1 subject matter plus an opinion on the operating effectiveness of the controls in place over a specific period (called a “review period”, usually 6 or 12 months). The Type 2 report also contains details about how the auditor examined each control and what they tested. This level of granularity, along with SOC 2’s usability for any vertical, is why the framework is so popular.
By way of contrast, an ISO 27001 certification (Aptible’s is here) represents strong adherence to a specific set of controls, but doesn’t have any granularity as to how specific control objectives are achieved, or whether those controls are operating effectively. Many B2B buyers will accept either in lieu of a security assessment questionnaire, but some prefer SOC 2.
The AICPA’s new SOC for Cybersecurity framework will have both a static set of controls (like ISO 27001) and the SOC 2 auditing methodology, and will probably be popular as well when the SOC for Cybersecurity Guide is released later this year.
How can you complete your own SOC 2 report?
If you run architecture on Enclave, our AWS-based Docker container orchestration platform, you can inherit our SOC 2 report through what the AICPA calls the carve-out method.
We also offer Gridiron, a security management SaaS platform for helping you stand up and run a security management program that meets stringent criteria. You can use Gridiron with several protocols, including HIPAA, ISO 27001, SOC 2, and GDPR.
For SOC 2 specifically, Gridiron onboarding replaces much of the gap and readiness work that you would do with consultants, in spreadsheets and word processing documents, and leaves you with a source of truth for security management data that makes auditing easy.
How can I get a copy of Aptible’s SOC 2 Type 2 report?
Under AICPA rules, SOC 2 reports are only for management, customers, and other key stakeholders. As a result we cannot post our SOC 2 Type 2 report publicly. We are however excited to share it with customers and partners.
If you’d like to get a copy of our report, or if you’d like to learn more about SOC 2 and how you might begin preparing to create a security management program that will help you complete your own report, get in touch now.
[1] These changes continue: SSAE 16 has been replaced with SSAE 18 and a new “SOC for Cybersecurity” framework is coming this year.
[2] The AICPA is renaming the Trust Service Principles to Trust Service Categories, but is still using the acronym “TSP.”
[3] The AICPA is updating the Trust Services Criteria to a 2017 version (effective in December 2018), but there are still only 3 additional availability criteria.
Every quarter, we host a webinar to share everything that’s new with Enclave and Gridiron.
In case you missed it, you can watch a recording of our January webinar below. You can also grab the transcript and the slide deck in our resources section. We provide a full recap of the event in this post.
January 2018 Quarterly Product Update Webinar
Meltdown and Spectre
We kicked off the webinar with a panel on Meltdown and Spectre, to discuss how we protect our customers in the event of disclosed vulnerabilities. We also did a deep dive into the ways that Enclave is architected for security more generally, with an emphasis on isolation based on trust levels. CTO Frank Macreery and Lead Enclave Developer Thomas Orozco discussed these topics and more in a conversation moderated by Web Security Advocate Elissa Shevinsky.
Meltdown and Spectre aren’t the worst vulnerabilities, but they are unique because they have a wide impact. Specifically, Meltdown and Spectre are design flaws in CPUs that affect modern Intel processors as well as other processors that are used in everything from laptops to mobile devices. Meltdown uses speculative execution to leak kernel data. Spectre can also be used to leak information from the kernel. It’s trickier to exploit but also more difficult to mitigate.
Meltdown can be easy to exploit and those exploits are difficult to detect. Meltdown exploits privilege escalation paths, which makes it particularly relevant for cloud computing infrastructure like Aptible.
There are two general pathways to exploit Meltdown. The first is that an attacker can gain access to data from workloads run by your peers (on the same instance as you). Aptible protects against this on Enclave by requiring that all sensitive workloads are isolated on dedicated-tenancy stacks (which are not shared with other customers).
The other way is to attack Enclave itself. If you look at Enclave, anyone on the internet can potentially open up an account on a shared environment. We mitigate that risk by separating our riskiest systems from the most sensitive environment variables that are required to administer Aptible.
We’ve put together a diagram and slide deck showing how we think about trust level, isolation, and threats. We isolate the riskiest components from the most sensitive and privileged data. Our CTO Frank Macreery wrote a blog post on how the Aptible team responded to Meltdown and Spectre. For an even deeper dive into these details, we recommend watching the webinar or reading the webinar transcript.
The steps for the Meltdown mitigation were fairly extensive. We began our patching process with the highest-risk instances, which are shared-tenancy instances. (We love our customers but treat all applications as potentially untrusted by nature.) Then we moved on to dedicated-tenancy stacks. As a final step, we scheduled maintenance windows with customers to restart databases and make sure all instances were patched against Meltdown. We completed the work on Jan 9th, five days after we began.
There’s a lot to be learned from these vulnerabilities, and there are important lessons that our customers can and should apply to any potential security risk.
You always want to start your security process by identifying abstract threats, i.e., where you are vulnerable and how you can architect your systems to protect against those vulnerabilities.
Threats are individual. The threats to your company may be different than the threats that we face at Aptible.
You want to figure out which of the services you depend on pose the biggest threats, and how to architect to isolate and protect against those threats.
Threats aren’t always abstract. Whenever a new major vulnerability emerges (Meltdown, for example) you want to assess how that fits into your security model and how you need to respond to it. And this is a process that should be ongoing over time.
The Enclave infrastructure wasn’t built overnight–it was an iterative process. The diagram that we’ve shown in the slide deck is the result of an accumulation of updates and improvements since 2013.
Enclave Metric Drains
We released a new feature: Metric Drains. Metric Drains help you monitor the performance of your containers. Container Metrics are captured every 30 seconds. They are routed every 15 seconds to the destination of your choice. Currently, that includes InfluxDB (either hosted on Enclave or hosted by InfluxData) and Datadog. More third-party providers to come in the future–feedback welcome.
We’ve had a lot of requests around metrics retention. Using Metric Drains, you can retain the metrics for as long as you want, and you gain ownership of your metrics. You can incorporate them into dashboards, for example, to better understand your applications.
Metric Drains are functionally similar to Log Drains, but for metrics instead of logs. What’s captured?
Memory Usage & Limit
Disk Usage & Limit (DB only)
If you want to start capturing performance metrics, check out our documentation on Metric Drains.
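Once usage and limit metrics are flowing to a destination you control, one common use is flagging containers that are close to their limits. The following is a minimal Python sketch under stated assumptions: the `MetricSample` type, its field names, and the 90% threshold are hypothetical and do not reflect the actual schema Metric Drains emit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricSample:
    container: str
    memory_usage_mb: float
    memory_limit_mb: float
    disk_usage_gb: Optional[float] = None  # databases only
    disk_limit_gb: Optional[float] = None  # databases only


def utilization(usage, limit):
    """Return usage as a fraction of limit, or None if it can't be computed."""
    if usage is None or limit is None or limit <= 0:
        return None
    return usage / limit


def over_threshold(samples, threshold=0.9):
    """Names of containers whose memory or disk use is at or above threshold."""
    flagged = []
    for s in samples:
        ratios = [
            utilization(s.memory_usage_mb, s.memory_limit_mb),
            utilization(s.disk_usage_gb, s.disk_limit_gb),
        ]
        if any(r is not None and r >= threshold for r in ratios):
            flagged.append(s.container)
    return flagged


samples = [
    MetricSample("web-1", memory_usage_mb=480, memory_limit_mb=512),
    MetricSample("db-1", 256, 1024, disk_usage_gb=9.5, disk_limit_gb=10),
    MetricSample("worker-1", 100, 512),
]
print(over_threshold(samples))  # web-1 (memory) and db-1 (disk) are flagged
```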
Other Enclave Updates
Managed HIDS is now generally available. On a weekly basis, you get audit-ready PDF + CSV reports. This satisfies compliance requirements for intrusion detection. Free for shared-tenancy stacks and $0.02 / GB / hour for dedicated-tenancy stacks.
VPC Peers and VPN Tunnels
Connect to any AWS VPC via a peering connection. No maintenance and 100% free. If you have your own VPC, this is convenient. The downside is this only works for AWS, which is where VPN tunnels come in as another option. VPN tunnels connect to any VPN network. Requires a VPN gateway. $99 / VPN connection.
For setup, just contact Aptible support at contact.aptible.com. Once set up, connection details are visible in the dashboard.
Additional Enclave Features
The Enclave CLI now supports JSON output. JSON output provides enhanced scriptability.
The db:create command now supports picking a version.
Restored instances of a MongoDB replica set will no longer attempt to join the existing replica set.
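To illustrate what JSON output makes possible for scripting, here is a small Python sketch that filters parsed records. How JSON output is enabled and what fields it contains are not covered in this recap, so the sample payload and its `handle` and `status` fields are hypothetical stand-ins; check the CLI documentation for the real shape.

```python
import json

# Hypothetical stand-in for CLI output; the real field names may differ.
SAMPLE_OUTPUT = """
[
  {"handle": "api", "status": "provisioned"},
  {"handle": "worker", "status": "deprovisioned"}
]
"""


def handles_with_status(raw_json, status):
    """Parse a JSON array of objects and return the handles matching status."""
    records = json.loads(raw_json)
    return [r["handle"] for r in records if r.get("status") == status]


print(handles_with_status(SAMPLE_OUTPUT, "provisioned"))
```

In a real script, `raw_json` would come from the CLI's standard output (for example, captured with `subprocess.run`) rather than from a hard-coded string.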
As an Aptible customer, here’s what you need to know:
Enclave is architected to mitigate vulnerabilities like Meltdown and Spectre.
The Aptible Security Team immediately responded to the disclosure to further remediate the issue.
We provided a real-time account of our response efforts on our status page. This blog post will provide additional context on our response. We’ll also share some of the ways our architecture is designed to protect against these sorts of vulnerabilities.
How these vulnerabilities impact cloud infrastructure
Meltdown in particular (more on Spectre later in this post) allows processes to read memory they should normally not have access to. By extension, in a PaaS environment running untrusted customer code, it allows customers to read memory they shouldn’t normally be allowed to read.
The vulnerability isn’t trivial to exploit at scale, but in theory, it allows for:
Escalation: one customer reads data (e.g. credentials) belonging to the PaaS provider, and uses that to compromise the PaaS provider itself, and by extension other customers.
Lateral compromise: one customer reads data belonging to another customer whose apps are deployed on the same underlying instance.
In other words, Meltdown is a critically important vulnerability for any PaaS provider. However, as an Aptible Enclave customer, you’re protected by the intrinsic architecture of Enclave, as well as an active Security Team. Here’s how.
Aptible Enclave is architected to protect against attacks like Meltdown and Spectre
In fact, this exact type of vulnerability, where a customer gets access to memory they shouldn’t normally be able to read, is part of our threat model, and Enclave is architected to protect against it.
Here’s how this plays out in terms of the escalation and lateral compromise attacks explained earlier:
Escalation: instances running customer containers on Enclave are unprivileged by design. All privileged access to e.g. AWS or Aptible APIs is orchestrated through isolated “coordinator” instances, which do not host customer containers.
Lateral compromise: for sensitive data, Enclave requires that customers deploy on dedicated-tenancy stacks, which host a single customer’s containers.
In other words: the container boundary is our first line of defense, but it’s not the only one.
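The lateral-compromise rule above (sensitive data only on dedicated-tenancy stacks, each hosting a single customer) can be expressed as a placement check. This is a hypothetical Python sketch of that policy, not Enclave's scheduler; the `Stack` and `Workload` types and the tenancy labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    name: str
    tenancy: str  # "shared" or "dedicated" (hypothetical labels)


@dataclass(frozen=True)
class Workload:
    customer: str
    handles_sensitive_data: bool


def placement_allowed(workload, stack, current_tenants):
    """Return True if the workload may run on the stack.

    current_tenants is the set of customers already deployed on the stack.
    """
    if stack.tenancy == "dedicated":
        # A dedicated stack hosts a single customer's containers.
        return current_tenants <= {workload.customer}
    # Shared stacks accept any customer, but never sensitive workloads.
    return not workload.handles_sensitive_data


shared = Stack("shared-1", "shared")
dedicated = Stack("acme-prod", "dedicated")
print(placement_allowed(Workload("acme", True), shared, {"other"}))
print(placement_allowed(Workload("acme", True), dedicated, set()))
```

With this rule in place, a memory-disclosure bug on a shared instance cannot expose sensitive data, because that data is never placed there to begin with.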
Aptible’s Meltdown remediation efforts
As soon as the Meltdown vulnerability was publicized, we acted immediately to deploy patches across our infrastructure to restore the integrity of the container boundary before public exploits were available. These patches needed to be applied to the Linux Kernel, and are known as the “Kernel Page-Table-Isolation” patch set (or “KPTI”).
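For readers who want to check a Linux host of their own, kernels that report CPU vulnerability status through sysfs expose a Meltdown entry. The sketch below reads it in Python. It assumes the host's kernel provides this sysfs file (not every kernel does), and the verdict labels it returns are our own, not kernel terminology.

```python
from pathlib import Path

# Present only on kernels that report CPU vulnerability status via sysfs.
MELTDOWN_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def classify(status):
    """Map the kernel's status string to a coarse verdict."""
    text = status.strip().lower()
    if text.startswith("not affected"):
        return "not_affected"
    if text.startswith("mitigation"):
        return "mitigated"  # e.g. "Mitigation: PTI" when KPTI is active
    if text.startswith("vulnerable"):
        return "vulnerable"
    return "unknown"


def meltdown_verdict(path=MELTDOWN_STATUS):
    """Read the status file if it exists; otherwise report 'unknown'."""
    try:
        return classify(path.read_text())
    except OSError:
        return "unknown"


print(meltdown_verdict())
```

An "unknown" result only means the kernel did not report a status; it says nothing about whether the host is patched.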
Here, our remediation was made more difficult by the fact that the Ubuntu Linux distribution, which we rely on for Enclave, was taken by surprise by the unanticipated early release of the vulnerability on January 3rd, and did not have patched Kernels available yet.
As a result, hours after the vulnerability was announced, we started working on a contingency plan, which consisted of building our own patched Kernels targeting Linux 4.14.11. On January 4th, we understood that Ubuntu was unlikely to be able to provide patched Kernels before January 9th (which turned out to be correct), and made the decision to roll out our own Kernels instead. Other providers have since announced that they followed a similar approach.
Once we validated our newly-minted Kernel through Enclave’s suite of integration tests, we published our plans on our status page and contacted customers with scheduled maintenance windows. Over the course of a few days, we replaced thousands of instances with minimal disruption. Ultimately, our patching of Meltdown completed early in the morning of January 9th, before public Meltdown PoCs were available and before Ubuntu had released patched Kernels.
|January 3, 2018||We posted to our status page indicating that the Security Team was monitoring the expected release of information about an upcoming vulnerability.|
|January 4, 2018||Once the details of the vulnerability were released, we published our response plan to our status page, and prioritized response around patching Shared Stacks (which are inherently vulnerable to Meltdown) and otherwise vulnerable Dedicated Stacks.|
|||We completed kernel patching for Shared and Dedicated Stacks. We used a bespoke kernel because an official kernel patch was not yet released.|
|||We began to contact each customer to coordinate a scheduled maintenance window during which we could restart databases, as needed.|
|January 9, 2018||We completed all patching and database restarts needed for all remediation efforts related to Meltdown.|
Looking ahead and Spectre remediation
As of now, Aptible has fully remediated Meltdown for Enclave Stacks.
Going forward, we are continuing to assess the impact of the Spectre vulnerabilities and the development of mitigations in the Linux Kernel to protect against them. Once these mitigations evolve, we’ll likely follow a similar approach (albeit with less urgency) to deploy mitigations for Spectre.
The stakes keep getting higher: the threat environment continues to escalate just as the consequences of a data breach grow. The Aptible Security Team will continue to be aggressive about protecting our customers’ environments from these and all critical vulnerabilities.
- The meltdownattack.com site provides useful information, recommendations and links to security advisories that describe Meltdown for a context broader than this blog post. You may find useful information there related to how to appropriately respond to Meltdown in your own cloud or personal data environments.
- Some additional fixes to the KPTI patch series were included in the subsequent Linux 4.14.12 release. 4.14.12 hadn’t been released yet when we started rolling out our Linux 4.14.11-based Kernel, but we did backport the relevant patches onto our 4.14.11 tree ahead of time.
- It’s worth noting that we were able to move faster than Ubuntu because we had fewer constraints. Ubuntu guarantees a stable Kernel version for a given Ubuntu release, which means they had to backport the KPTI patches onto older Kernels. That’s a lot of work, which they had to complete on short notice. In comparison, we had the flexibility to upgrade to a newer Kernel instead, which we did.