Adam Surak, Algolia

Adam Surak is Head of Infrastructure and Security at Algolia, an AI-powered search and discovery platform. He is a vocal advocate for compliance automation, which he has championed at Algolia to reduce the burden on teams across the company. We spoke to Adam about automation’s impact, systemic changes in organizational sustainability, and the future he sees ahead in GRC.

What does GRC look like at Algolia?

At Algolia, we actually don't have a formal GRC team. On the legal team, which reports to the general counsel, the compliance officer serves as an independent contributor in terms of compliance and risk management. We have it set up this way so that our compliance officer can prepare for internal auditing; to do this, he needs to be isolated from the actual implementation. The security team actually creates and implements policies, while the compliance officer makes sure those implementations are aligned with frameworks like ISO 27001 and C5.

We currently have five people on the security team, with plans to bring one or two more on board this year. The manager, who reports to me, is a former CISO. We also have two security engineers who both did pen testing for years and grew frustrated with making recommendations that customers never fixed, so they decided to come to the other side and actually implement changes themselves. We have a dedicated software engineer who was previously doing work on small applications but got fascinated by the challenges of security compliance. And we just hired a security analyst for processes. We’re still building out a more formal GRC function, but the security team is taking a lot of ownership on compliance. 

How has compliance evolved at Algolia? 

Our compliance automation journey started with the processes that bother us the most—the ones that are painful and have a lot of entropy. We literally write out a step-by-step guideline on paper that anybody can follow, then we write a simple program to do exactly the same thing. 

We’re a growing company. When I joined Algolia, we had eight employees. We did SOC 2 when we were at about 50. Today, we are 319 employees and plan to be around 580 by the end of the year. Our entropy comes from people much more than systems. We don't have a huge sprawl of systems, but there are certain challenges when it comes to people that we are not ready for today. Take contractors, for example. There could be a contractor who needs access to critical systems, but another contractor who is just a consultant with no access. One contractor looks like an employee, and another comes to us through a PEO so they’re kind of in this hybrid state. There are so many nuances for people and what they should have access to. It creates a lot of entropy in the environment because it’s not just looking at an account, it’s looking at the story behind it. 

With this amount of growth, we have many employees coming in and out of the environment. We’ve focused a lot on onboarding, bringing people into the system and getting them up to speed very quickly, but in a consistent way that doesn’t create too much of a mess in our environment. And on the other side, when people leave the company, we can offboard them effectively. We transitioned from a step-by-step checklist that an IT person needed to follow manually to executing a single command and having 99% of it done automatically. There might be one or two manual actions left depending on specific requirements, but on the whole we don’t have to spend hours reviewing employees and their user access.

You’ve obtained and maintained SOC 2 Type 2, SOC 3, ISO 27001, ISO 27017, and C5 (Germany) in the past couple of years. What are you focusing on in 2021?

The focus for us in 2021 is efficiency. We want to create overlaps with the work of the controls so we can spend as little time on frameworks as possible. If we can align controls by using common denominators and finding the highest standard that we have to comply with across all the frameworks, then we can reuse everything efficiently. 
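
To make the "highest standard across all the frameworks" idea concrete, here is a minimal sketch. It is not Algolia's tooling, and the control names and per-framework values are invented for illustration; they are not taken from the actual standards. For each control, the sketch picks the strictest requirement so that a single implementation can satisfy every framework.

```python
# Hypothetical per-framework requirements. The numbers are made up for this
# illustration and do not come from SOC 2, ISO 27001, or C5.
REQUIREMENTS = {
    "access_review_interval_days": {"SOC 2": 90, "ISO 27001": 180, "C5": 365},
    "password_min_length": {"SOC 2": 8, "ISO 27001": 12, "C5": 10},
}

# For each control, which direction is stricter: a shorter interval (min)
# or a longer minimum length (max).
STRICTER = {
    "access_review_interval_days": min,
    "password_min_length": max,
}


def strictest_baseline(requirements, stricter):
    """Return one value per control that satisfies every framework."""
    return {
        control: stricter[control](per_framework.values())
        for control, per_framework in requirements.items()
    }


baseline = strictest_baseline(REQUIREMENTS, STRICTER)
print(baseline)
```

Implementing each control once at the baseline value means the same evidence can be reused for every framework that shares the control.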

Our long-term goal is the ability to go through an audit for all of these frameworks every day. No one should break a sweat.

So our focus in the meantime is to find the loose ends in our policies across the company. Sometimes colleagues complain that compliance is a difficult process for their teams. We say fine; the security team will help. We’ll understand what’s bothering them and then make it faster and more efficient. We automate low-hanging fruit, which gives time back to the team so they can focus on high-value tasks instead of collecting and verifying evidence manually across five or six different systems. 

With the size of your team, what were you able to accomplish after implementing automations that you weren't able to do before? Would you have been able to obtain all the certificates without automating?

Every company has a different threshold of pain that’s probably dependent on their stage as a startup. Early startup founders have a very high threshold of pain. They don’t change things until something really hurts. But when the company grows to a few hundred people, they don’t have to suffer. They figure out a better way. 

Here’s the way I look at it: are we able to do all of it without the automation? Yes, absolutely. Are people going to want to work in the team after they do it manually once or twice? Probably not. The consequences of doing it all without automation are pretty severe. It means more people, so the audit becomes more expensive. It means a higher risk of anomalies and then having exceptions in a SOC 2 report or not being able to defend an ISO certificate. It would even prevent us from onboarding new frameworks because our team would be resistant to that much work. We would be able to do it, but it would hurt.

There are people out there who love to do compliance, but that’s not my team. They don’t love collecting evidence. They don’t love exporting policies into PDF to hand them over to auditors. What they want to do is solve security problems; they want to continuously improve the system and reduce tasks to take up the smallest amount of time possible. It's exactly the same thing as with customer questionnaires in vendor management. We all understand that we all have to do it, but no one wants to. Can we do it? Yes. Would we be happier if we didn't have to do it? Also yes. 

Compliance work is very repetitive. It’s literally go and follow the checklist. It's not very smart, elaborate, or creative. But we want creative people on our security team, people who think outside the box and who can analyze issues and communicate about security. It's the security version of the operations toil. Automation allows us to keep compliance under control while continuing to grow. With more people, more compliance frameworks, and more systems, the amount of time that we spend on compliance shouldn't increase. It should stay fairly constant or even decrease. And only automation can deliver that.

How are you able to measure the impact of your automation efforts?

We measure success by the number of errors our auditors detect. Audits are a very interesting game of probability, because auditors ask for a sample. But in order for us to guarantee there’s nothing bad in their sample, we have to do everything. 

Historically, we’ve had something pop up in the sample, and they were exceptions we hadn’t known about. We’re now at the point where we have close to a 100% success rate of knowing there is an exception ahead of time. We’ve built a process to inform us of anomalies and bubble them up, which means we can resolve them and not have to rely on auditors coming in and telling us there’s a problem. 

Another way we measure success is by “taking the temperature of the team”. It's been difficult to put a number on it, but we gauge team morale from our retrospectives. Does the team “overheat” before the audit? Are they struggling to get through evidence? Are they taking screenshots that auditors are requesting from GitHub to show pull requests? We started to look at those types of activities and we’re finding that with every single new automation that we introduce, we are reducing the pressure of manual work on the team. So we calibrate often. If the team is “overheating”, there's probably one more thing that we need to automate. 

Sometimes, it’s not even a problem that requires automation at the stage where the team is feeling pressure. It can be a solution that’s pushed further to the beginning of the problem. Change management is a good example. We could tell the engineering team that when they do pull requests, the request needs to be reviewed. Then we could build automation on top of that to notify them if they don't do it. Or we could enable protected branches in GitHub and enforce it there so that no change can go through unless it is approved. We remove the pressure from the team and we push it towards the source of the problem at the same time.

A lot of our work is going to teams and figuring out how to help them. Often their first reaction is, “No, we can’t automate.” They think everything is a manual process and there’s no way to automate it. So we’ll ask them to sit down with our engineer for 30 minutes, and if after 30 minutes they still have the perception that nothing can be automated, I’ll owe them an apology. But half an hour later, I’ll get a Slack message saying, “I know how to automate 90% of my job!”

You’ve said before that you like moving from building resilient systems to resilient teams. Does automation play a role in that work?

This is really a reflection of my journey at the company. I came on as a DevOps engineer and then I took over security, and now today I run the infrastructure team. My focus now is the team because I work with teams and teams work on things. And then inside the team is a manager who works at the level of individual people, and people work on things. Ultimately, I need the teams to be resilient.

At the beginning of the company, we were able to build resilient systems and grind things out, and it was doable. The first few years we were working up to 120 hours a week. If I was doing laundry, I was on chat at the same time talking to my colleagues. When I woke up in the middle of the night and opened the chat, there was someone to talk to because someone was always working. 

But eventually, we had to sleep—and after a few years of going at this pace, we wanted to sleep. And so that introduced a change in my posture for what we required from people and how we saw sustainability. I proved it could work by taking a different approach. People in the industry are always talking about chaos engineering, introducing chaos into your system to gain visibility into weakness. I introduced vacation into my system. I told people to go on vacation so that I could discover how teams break and what is missing when they’re gone. I don't need a chaos monkey to disturb stuff. The chaos that is created by taking one super important person and sending them on a vacation for two weeks is incredible! My focus became figuring out how we can do all of the things that we continue doing without relying on the human grind, and automation is one of the ways. Someone can go on vacation and the automation should continue working, so the person doesn’t have to do it. We don't have to end up in a situation where we need our entire team to be at work weeks prior to the audit because they have to collect the evidence. We can schedule the audits regardless. The team itself needs to continue functioning. 

Organizationally, if the company keeps growing, we’ll have more people, more systems, more compliance frameworks, more customers, and more demands. The team cannot scale exponentially with our number of customers. It doesn’t make sense for that growth to be linear, either, so we need to be smart and we need to be efficient so that people don’t burn out.

Auditors often have pretty specific processes. How has their response been to your automation efforts?

I definitely see a very positive response coming from auditors. I had an opportunity to dive deep into the audit process because I was leading our audits for multiple years when there was no one else on the team. I literally spent a month and a half before every audit collecting evidence on my own, talking to the auditors for a week, and then reporting back to our executives. But it showed me how auditors do some of their reviews. For example, they ask for screenshots to log them into an Excel spreadsheet so that they have a list, and then they have a macro in Excel which generates the population that they want as a sample. It’s an entire process for them, just to be able to choose 50 items in a sample. So instead I give them evidence in an Excel spreadsheet so they just have to run the macro and they’re done in 10 seconds. 

I’ve worked with them a lot on reducing some of the round trips. Another example is HR-related controls. They ask for new employees, current employees and offboarded employees, because there are specific controls that relate to people coming and leaving the company. That's the population, and then they make a sample and they ask for details about the sample they’ve selected. So we give all of it to them on the first go. It’s actually less work for us to submit all the information than it is to process and generate information just for the sample, and then they don’t have to ask us for the sample. SOC 2 has an entire methodology for how to choose a sample out of the population. Why? Because they don't have the human capacity to verify everything. But suddenly keeping it in a format which is reasonable and can generate answers programmatically could actually improve compliance by being able to go for 100 percent. While the sample they choose must have statistical significance, we can do better than statistical significance when we’re able to analyze the entire population. 
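
As an illustration of checking the entire population rather than a sample, here is a minimal sketch. The records, field names, and the one-day threshold are hypothetical and do not describe Algolia's actual control. It flags every offboarded employee whose access was revoked late or not at all.

```python
from datetime import date

# Hypothetical control: access must be revoked within this many days of departure.
MAX_DAYS_TO_REVOKE = 1

# Invented example records; a real list would come from the HRIS and identity systems.
offboarded = [
    {"employee": "e-001", "left_on": date(2021, 1, 15), "access_revoked_on": date(2021, 1, 15)},
    {"employee": "e-002", "left_on": date(2021, 2, 1), "access_revoked_on": date(2021, 2, 5)},
    {"employee": "e-003", "left_on": date(2021, 3, 10), "access_revoked_on": None},
]


def find_exceptions(records, max_days):
    """Return the IDs of every record that violates the control, not just a sample."""
    exceptions = []
    for record in records:
        revoked = record["access_revoked_on"]
        # Access never revoked, or revoked later than the allowed window.
        if revoked is None or (revoked - record["left_on"]).days > max_days:
            exceptions.append(record["employee"])
    return exceptions


exceptions = find_exceptions(offboarded, MAX_DAYS_TO_REVOKE)
print(exceptions)
```

Because the check runs over every record, an exception is known before the auditor draws a sample, whichever items the sample happens to include.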

The next step we are discussing with our auditors is pushing our data into their system automatically or enabling them to pull it from our GRC. That could change everything. Suddenly, we might not do audits on a yearly basis. Maybe it’s a weekly process. I'm pretty sure every auditor would love us to pay for a weekly audit, but they don’t have a way to humanly process that. It has to be automated. We provide the data faster, they can verify it more easily, and they can have less manual work on their side, which they would love too. It’s a move in the direction of continuous compliance.

Do you think there are some compliance tasks that can’t be automated and will forever be managed by humans? 

I think we can automate all of it eventually. Currently, a lot of organizations’ policies are written by humans for humans. If they try to automate them, it's not going to work because there might be parts which require human intellect and some insights that can’t be codified. But maybe we have to change. In one part it’s reducing vague terminology, evolving from saying, “This has a severe impact on the organization” to “This is going to impact more than 5% of the annual revenue of the organization,” for example. The problems are very apparent in risk management today. We ask about the likelihood of some risk, and there are these criteria like very low, low, medium, high, very high. But it’s been proven that people who are not trained on this evaluation scale are all over the chart on what a potential risk actually is. More often than not, it’s very high if it’s the area that you are responsible for, and low for everyone else. We need to change this into something quantifiable and get rid of all the vagueness that’s present in a lot of these processes. 
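
The move from "severe impact" to a measurable threshold can be sketched in a few lines. This is only an illustration: the 5% figure comes from the example above, while the function name and the sample amounts are assumptions.

```python
# Threshold from the example above: more than 5% of annual revenue counts as severe.
SEVERE_REVENUE_SHARE = 0.05


def is_severe(estimated_loss, annual_revenue, threshold=SEVERE_REVENUE_SHARE):
    """Return True when the estimated loss exceeds the given share of annual revenue."""
    if annual_revenue <= 0:
        raise ValueError("annual_revenue must be positive")
    return estimated_loss / annual_revenue > threshold


# Invented amounts for illustration only.
print(is_severe(6_000_000, 100_000_000))  # loss is 6% of revenue
print(is_severe(4_000_000, 100_000_000))  # loss is 4% of revenue
```

Two people given the same loss estimate now reach the same classification, which is what a label like "high" or "very high" cannot guarantee.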

Then I believe that we can automate it all because it becomes very similar to accounting. Our security frameworks are already audited by accountants, and they’re following laws and regulations that dictate exactly how compliance is assessed and reported. Eventually I see humans only reviewing the outliers where the system is not completely sure. We see this already happening in the accounting world. SAP does it. If a company survives SAP implementation, it can survive anything. SAP codifies a lot of processes, like how data flows around the company and through what consistency model. In terms of compliance, we have a similar opportunity because at the end of the day, it's not like our email accounts are spread across some random distributed system; we have our accounts in G Suite. If we want a list of employees, we go to the HRIS and we find it. We don't have to ask HR to bring 50 papers and read the list of people because they have it in a book. It’s all electronic; we just have to make it readable.

Do you struggle with engineers and other team members complaining about compliance requirements?

Compliance and security are very interesting engineering topics with lots of unsolved problems across different areas. As an industry, we keep talking about data science, data mining and big data as hot topics, but if you’re looking for a data set you don't have to scrape the web. Talk to your security team! They have a huge pile of data that needs some insights so that they can process it. We need human intellect to develop solutions to problems and actually all do better. 

We all want to have more data because it means bringing more value to customers. But outside of the legal team and the security team, who is honestly thinking about the liability that the company takes on with every single new gigabyte of data ingested into the system? Two new petabytes of customer data is two petabytes more risk. Who actually read GDPR? I did. Multiple times. For security professionals, compliance professionals, engineers, it's a law that we have to follow. Every legitimate company has data or provides a service that makes them legally obligated to respect privacy. Can we work towards responsible engineering where we can expend a little bit more effort to make the systems safer? Can we do it by relying more on automation instead of people following a compliance checklist, and would that help us see it more as a benefit to the efficiency of the organization instead of something slowing us down? 

I agree that during normal operations, when nothing is happening and everything is okay, compliance slows everybody down, putting up roadblocks and making simple things complicated. How much? It’s debatable. But when an incident happens, the compliance frameworks requiring audit logs are asking that for a reason—because it will help identify what happened so that you can go back and fix it. We can do better. 

By creating better security and better compliance and not just focusing on the certification?

Yes. SOC 2 is great. Go and do it, but it’s not a one-off thing; it's a commitment for the future. A lot of the companies today that do SOC 2, they have something like 10 to 30 percent growth. It’s fine for the first two years, but then it starts to ramp up and they suddenly have a higher error rate in their processes.

The hunt for the certification is real, but audits are the commitment to keep it for the future. Imagine closing a customer on the SOC 2 audit, and then you come back a year later and say, “We didn't get the audit because we had serious violations in our security.” The customer is going to churn immediately, and if something happens, you're liable even more. So it's a hunt for something which actually has way bigger consequences than it might seem at the very beginning.

What else do you see in the future for GRC as an industry?

Interconnection is going to be a very interesting element, one that I hope could lead to data standardization. Let’s say I'm reviewing a vendor. I can get their policies, their SOC 2 report, their CAIQ questionnaire. (These questionnaires like SIG are really awful, from my point of view, because the big companies use the full 1,700 questions only because they paid for the license for it, so why not use it?) But once I get what I’m asking for, I have two options. One is to spend days reading through it, digesting it and figuring out what I'm looking for. The other is to take the CAIQ and ingest it into my system in a programmatic way. It's going to take a part of the CAIQ and map the controls into my controls so I can instantly see the risk of the vendor that I'm evaluating. There is a human element of understanding what the vendor does and how they do it, but if we could standardize this part of communication, it would meet the needs we’re looking for.

Or what if there was a PDF from the auditor but also a data format like a CSV that maps to standard controls and outcomes? Then we could ingest it inside our GRC and immediately see whether there are exceptions. Do I have to process the whole PDF? We can do better. I'm hopeful that we are going to evolve as an industry in the exchange of information and the interconnection of our GRC systems. Ideally, when reviewing a vendor, we would have the unbiased picture. Right now, vendors are giving me a security marketing picture, but I'm taking a security legal liability by working with them. Standardizing the data could help us all be more open about it.
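
To show what ingesting such a file might look like, here is a minimal sketch. No standard format of this kind is described in the interview, so the column names (`control_id`, `outcome`, `note`), the control IDs, and the sample rows are all hypothetical.

```python
import csv
import io

# Invented sample of a machine-readable audit result; not a real report format.
REPORT_CSV = """control_id,outcome,note
CTRL-01,pass,
CTRL-02,exception,Access removed late for one user
CTRL-03,pass,
"""


def controls_with_exceptions(csv_text):
    """Return the IDs of controls whose outcome is marked as an exception."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["control_id"]
        for row in reader
        if row["outcome"].strip().lower() == "exception"
    ]


flagged = controls_with_exceptions(REPORT_CSV)
print(flagged)
```

With a shared format like this, a vendor review could start from the list of flagged controls instead of from reading the whole PDF.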