Mat Steinlin is Head of Information Security at Aptible. He has led security assessments for digital health companies through HIPAA, HITRUST, SOC 2, and PCI compliance for over a decade.
Something has changed in the last eighteen months that most security leaders are still processing. An annual penetration test used to require two weeks and cost over $20,000. Last week our head of engineering spent two hours and $20 using an AI model to scan our codebase, and it found several meaningful vulnerabilities that our most recent pentest had missed. It wasn't long ago that AI-powered vulnerability testing felt like a futuristic vision; now it's trivial.
HIPAA, SOC 2, HITRUST: the audit frameworks governing regulated industries were built around a model where sophisticated attacks required sophisticated attackers. And the annual pentest was priced accordingly. You hired an expensive, specialized firm because the alternatives weren't accessible. That scarcity created a natural cadence: assess once a year, remediate the findings, repeat.
The scarcity is gone.
XBOW published benchmark results this week showing GPT-5.5 achieves only a 10% miss rate on a suite of real vulnerabilities in open-source applications, a performance strong enough that XBOW retired its internal benchmark entirely. When we ran a scan with Opus against our own codebase, it surfaced more meaningful findings than the $27.5K professional engagement completed six weeks prior.
The structural effects of the collapse in cost and expertise required for offensive security are visible across the industry. HackerOne paused new vulnerability submissions to its Internet Bug Bounty program in March 2026; the stated reason was that the gap between AI-assisted discovery and open-source maintainers' ability to ship remediations had become impossible to bridge. When the economics of bug bounties break, the message is that finding vulnerabilities has become cheap and fast enough to outrun the patch cycle industry-wide.
This doesn't mean human judgment is obsolete; it means the annual pentest model is. Those are two very different arguments.
The problem was always the gap between your last pentest and today
The annual pentest was never designed to give you continuous security. It was designed to give you a point-in-time snapshot of your attack surface that satisfied an auditor and gave your team a remediation queue. And that's exactly what it does. What it can't tell you is anything about what changed after the pentest team left.
Consider what happens in a typical environment between annual tests:
New services are deployed
Cloud configurations drift
A dependency is added with a transitive vulnerability no one reviewed
An access policy is broadened for a project and never tightened back
An engineer leaves and their tokens aren't fully rotated
A third-party integration quietly expands its permissions scope
Every one of these is an attack surface change, and none of them appear in last year's report. The gap between your last assessment and today is an attack surface problem, and it grows every day.
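The drift described above is checkable by machine. Below is a minimal sketch of diffing a current attack-surface snapshot against a baseline recorded at the last assessment. The snapshot shape (`services`, `scopes`) is a hypothetical illustration; in practice the data would come from your cloud provider's APIs or IaC state.

```python
# Hedged sketch: surface drift since the last assessment by diffing snapshots.
# The snapshot structure here is illustrative, not any real tool's format.

def diff_attack_surface(baseline: dict, current: dict) -> list[str]:
    """Return human-readable drift findings between two surface snapshots."""
    findings = []
    # New services deployed since the last assessment
    for svc in sorted(set(current["services"]) - set(baseline["services"])):
        findings.append(f"new service: {svc}")
    # Access policies broadened for a project and never tightened back
    for principal, scopes in current["scopes"].items():
        widened = set(scopes) - set(baseline["scopes"].get(principal, []))
        if widened:
            findings.append(f"{principal}: scope widened by {sorted(widened)}")
    return findings

baseline = {"services": ["api"], "scopes": {"ci-bot": ["read"]}}
current = {"services": ["api", "billing"], "scopes": {"ci-bot": ["read", "admin"]}}
print(diff_attack_surface(baseline, current))
```

The point of the sketch is the cadence, not the code: a diff like this can run on every deploy, while last year's report is frozen at its publication date.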
The exploit timeline makes this worse. Based on Cloud Security Alliance research about AI-speed vulnerability weaponization, a representative disclosure now plays out like this:
T+0:00 | CVE published. Your vulnerability management tool picks it up in its next scheduled scan. Your team is notified, maybe in Slack, maybe in a weekly digest.
T+2:00 | Proof-of-concept lands on GitHub. Researchers and threat actors are both analyzing the advisory. The race to weaponize begins.
T+8:00 | Working exploit in the wild. Automated tooling is scanning the internet for vulnerable instances.
T+72:00 | Mass exploitation underway. If your patch hasn't landed, you're statistically likely to have been targeted. Your change management process, approval chains, and deployment pipeline are all working against you.
T+2 weeks | Your patch is approved. After ticket routing, testing, change board approval, and a maintenance window, the fix is deployed. At that point, the only question is whether the targeting was successful.
There's a complication the table doesn't show: at T+0, your scanner may not know the CVE exists. VulnCheck found that more than half of actively exploited vulnerabilities lacked NVD enrichment during 2024. The timer started before you got the notification.
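The arithmetic of that timeline is worth making explicit. The sketch below compares the attacker milestones from the table against a patch pipeline's cumulative delays; the milestone hours mirror the timeline above, while the pipeline stage durations are illustrative assumptions, not measurements.

```python
# Hedged sketch: how many hours each attacker milestone beats your patch by.
# Milestone hours come from the timeline above; pipeline durations are
# illustrative assumptions about a typical two-week change process.

EXPLOIT_MILESTONES = {          # hours after CVE publication
    "poc_on_github": 2,
    "working_exploit": 8,
    "mass_exploitation": 72,
}

def exposure_window(pipeline_stages: dict[str, float]) -> dict[str, float]:
    """Hours each milestone precedes your deployed patch (negative = you won)."""
    patch_landed = sum(pipeline_stages.values())
    return {m: patch_landed - t for m, t in EXPLOIT_MILESTONES.items()}

# Illustrative two-week pipeline: triage, testing, change board, deploy window
pipeline = {"triage": 24, "testing": 96, "change_board": 120, "deploy_window": 96}
print(exposure_window(pipeline))
# Every value is positive: mass exploitation beats the patch by over ten days.
```

Run against a two-week pipeline, every milestone wins by days, which is the whole argument for shrinking the pipeline rather than adding more scans in front of it.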
Continuous penetration testing is necessary, but it still isn’t enough
The security industry's answer to this problem so far has been: run more scans. Automated scanners, weekly vulnerability reports, DAST tools in CI pipelines. More signal, more dashboards, more “secure”.
That's continuous detection. It's better than point-in-time assessment, but it still rests on the shaky premise that the goal is to find and fix vulnerabilities before attackers find them.
When exploit timelines are measured in hours and offensive security is widely available, that's a bet you can't reliably win. You will miss something. A zero-day will land on a system you thought was low-risk. A credential will be compromised through a channel your scanner doesn't see. A breach is bound to happen. The goal isn't to outrun the rain; it's to build a roof so that when it pours, your business stays dry.
What I’m trying to say is: organizations ahead of this aren't just moving to continuous testing. They're building on the assumption that they’ll eventually be breached. Rather than asking themselves, “how do I avoid a breach,” they’re asking, “what’s the blast radius when that breach happens?”
In practice, that means:
Network segmentation designed to contain lateral movement from a compromised system
Identity segmentation so a single compromised credential doesn't reach everything
Audit logging that enables forensic reconstruction (not logs archived for annual review, but logs that can answer "what did the attacker access, and when?")
Kill switches and revocation capability that work at machine speed, not ticket speed
Breach response plans that are rehearsed before the incident, not drafted during one.
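"Machine speed, not ticket speed" is the crux of the list above. A minimal sketch of what that means for revocation: an in-memory credential registry where revoking a principal invalidates every token derived from it in one call. The names (`TokenRegistry`, `revoke_principal`) are illustrative, not a real API; a production system would back this with your identity provider.

```python
# Hedged sketch of a machine-speed kill switch. All names are illustrative;
# the point is that revocation is a single call, not a ticket queue.
import secrets


class TokenRegistry:
    def __init__(self):
        self._tokens = {}      # token -> principal that owns it
        self._revoked = set()  # principals revoked wholesale

    def issue(self, principal: str) -> str:
        token = secrets.token_hex(16)
        self._tokens[token] = principal
        return token

    def revoke_principal(self, principal: str) -> int:
        """Kill switch: invalidate every token for a principal at once."""
        self._revoked.add(principal)
        return sum(1 for p in self._tokens.values() if p == principal)

    def is_valid(self, token: str) -> bool:
        principal = self._tokens.get(token)
        return principal is not None and principal not in self._revoked


registry = TokenRegistry()
t1 = registry.issue("departed-engineer")
t2 = registry.issue("departed-engineer")
registry.revoke_principal("departed-engineer")  # seconds, not a change window
print(registry.is_valid(t1), registry.is_valid(t2))  # False False
```

Compare this to the L15 scenario above: an engineer leaves and their tokens aren't fully rotated. With per-token cleanup that failure mode is easy; with principal-level revocation it is structurally impossible.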
The annual pentest tells you whether your controls exist. Continuous testing tells you whether new vulnerabilities are appearing. Neither one tells you whether you're prepared for the breach you're bound to have; that preparation is a different project, and most compliance frameworks don't require it. To be clear: the annual pentest isn't going away. Vendor risk questionnaires require it, customer contracts mandate it, and cyber insurance underwriters ask for it. But those aren't security arguments, they're business requirements. They describe a compliance floor, not a security posture, and the floor is not where breaches get stopped.
What this means for your security program
The compliance floor (annual pentest, periodic audits, documented controls) is still the floor. Regulators haven't moved. But auditors have more latitude than most programs use. HIPAA, SOC 2, and PCI require an outcome: evidence that your attack surface is being tested against real techniques. They don't mandate a specific tool. If you're running continuous automated red teaming and can show findings, coverage, and remediation in a format the auditor recognizes, most auditors will meet you halfway.
"Compliance requires the annual pentest" is rarely true when you read the text; it requires an outcome, and you now have more ways to produce that outcome than you did five years ago.
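Producing that outcome in a format an auditor recognizes is mostly a normalization problem. Here is a minimal sketch of collapsing continuous-testing findings into the three things auditors ask about: findings, coverage, and remediation. The field names are illustrative assumptions, not any framework's required schema.

```python
# Hedged sketch: normalize CART findings into auditor-shaped evidence.
# Field names are illustrative, not a HIPAA/SOC 2/PCI-mandated schema.
from dataclasses import dataclass


@dataclass
class Finding:
    asset: str
    technique: str    # e.g. an ATT&CK-style technique label
    severity: str
    remediated: bool


def audit_evidence(findings: list[Finding], assets_in_scope: int) -> dict:
    """Summarize findings, coverage, and remediation status for an auditor."""
    tested_assets = {f.asset for f in findings}
    return {
        "findings": len(findings),
        "coverage_pct": round(100 * len(tested_assets) / assets_in_scope, 1),
        "remediated": sum(f.remediated for f in findings),
        "open": sum(not f.remediated for f in findings),
    }


findings = [
    Finding("api", "SQL injection", "high", True),
    Finding("admin-panel", "auth bypass", "critical", False),
]
print(audit_evidence(findings, assets_in_scope=4))
```

The format matters less than the continuity: a summary like this regenerated weekly is stronger evidence than a PDF regenerated annually.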
But the floor was never the target. The organizations treating it as one are writing incident post-mortems that start with "despite passing our last audit…"
The annual pentest doesn't disappear in this model. When you have continuous visibility into your attack surface, a periodic pentest matures into something more useful: a way to validate your controls and test your detection capability rather than discover what's broken.
A modern security program runs three things in parallel:
Continuous penetration testing, specifically continuous automated red teaming (CART), to replace point-in-time assessment with ongoing visibility;
Active use of AI-assisted code review to find what humans miss at scale in the development pipeline;
An architecture that assumes breach and asks "what's the blast radius?" instead of "can we prevent every attack?"
The last piece is the one most compliance programs don't have. It's also the hardest to retrofit after an incident.
Aptible’s perspective
As a security-focused PaaS provider for regulated industries, we sit between infrastructure and compliance for a large base of engineering and security teams, which means we see vulnerability patterns and posture trends that no single-organization program can replicate. We see what actually breaks in practice.
Since the release of Mythos, we've been stress-testing where AI-assisted vulnerability analysis holds up and where it doesn't. Our read: the annual pentest was long a solid practice for detecting and mitigating vulnerabilities. But because it no longer detects vulnerabilities at the rate LLMs do, it no longer detects them at the rate attackers do. Adapting your security strategy to this shift is increasingly critical.
We're working through what this transition looks like for regulated-industry environments specifically. If you’re a CTO, CISO, or security architect, how are you thinking about this new threat landscape? What’s making you nervous, and how are you weighing continuous testing against the annual pentest model?
