
Shadow AI in healthcare: the risk your compliance policy isn't covering

By Mat Steinlin, Head of Information Security

Last updated: April 2026

What shadow AI looks like in practice

These aren't hypotheticals.

A developer is debugging a parser that extracts clinical notes from an EHR integration. Something is breaking on a particular note format. They paste the note into Claude.ai to ask why the regex isn't matching. The note contains a patient name, date of birth, and a diagnosis. They get the answer, fix the bug, and never think about it again. No BAA. No logging. No record that it happened, until a security review six months later surfaces the browser history.

A care coordinator uses ChatGPT Plus to draft a follow-up message for a patient because the approved CRM is slow and the template system is frustrating. They know PHI isn't supposed to leave the system, but they're not thinking of this as a data transfer. They're thinking of it as drafting help. The message contains the patient's name and their appointment type. ChatGPT Plus doesn't have a BAA with the organization.

A developer installs Cursor on their work laptop because their colleague mentioned it makes code reviews faster. They start working in a feature branch that touches the PHI processing pipeline. Cursor's AI assistant has context from their open files. They haven't read the data handling policy. No one told them that IDE extensions can send code context to third-party servers.

In each case, the person wasn't reckless. They weren't trying to circumvent compliance. They were trying to get work done.

Shadow AI incidents in healthcare companies almost always share this pattern: a productivity-motivated shortcut through an unsanctioned tool that happens to touch PHI. The compliance implications are discovered after the fact, often by accident.

Why it happens

The generic security industry answer to shadow AI is policy, monitoring, and employee training. That answer misses the actual cause.

Shadow AI happens because the compliant path has friction. If using an unapproved tool takes 20 seconds and getting access to an approved AI tool takes two weeks, developers will use the unapproved tool. This is predictable behavior, not a failure of discipline.

Three specific friction sources drive most shadow AI in developer workflows:

Approval lag. Adding a new AI tool to the approved list typically requires a security review, a vendor assessment, and sign-off from at least one stakeholder who isn't always responsive. For a team moving fast, waiting three weeks to use a tool their colleague recommends is prohibitive. They use it and wait for policy to catch up.

Ambiguity about what requires a BAA. Most developers in healthcare companies know their production database requires care. Many don't know that using a consumer AI product (even for a "quick question") can create PHI exposure if the prompt contains patient data. The distinction between an API with a BAA and a consumer product without one isn't obvious unless someone has explained it explicitly. Often no one has.

Real productivity gaps. The approved tools are sometimes meaningfully worse than the unapproved ones. A company that approved GPT-3.5 for internal use a year ago and hasn't updated its list is a company where developers are using GPT-4o through their personal accounts. The productivity differential is visible and daily.

Policy that doesn't address friction doesn't solve shadow AI. It just makes incidents quieter.

The healthcare-specific consequences

Shadow AI happens in every industry. What makes it a different problem in healthcare is what's at stake when it does.

BAA gaps

Under HIPAA, when PHI is shared with a vendor, that vendor must sign a Business Associate Agreement committing to specific data handling requirements. Consumer AI products (Claude.ai, ChatGPT Plus, Gemini) do not have BAA programs for individual accounts. Some providers offer enterprise tiers that include BAAs; the consumer products do not.

If a developer pastes PHI into a consumer product, there's no legal framework governing how that company handles the data. It may be used for training. It may be retained indefinitely. You have no audit rights, no contractual obligation to notify you of a breach, and no recourse if the data is mishandled. See the BAA chapter for what a proper BAA actually requires.

Breach notification exposure

A shadow AI incident that results in PHI disclosure to an uncovered entity may constitute a HIPAA breach. If it does, you have 60 days from discovery to notify affected individuals and HHS. "We discovered it 18 months later during a security review" is a more serious problem than "we discovered it last week," but both trigger the same clock.

The question of whether an incident constitutes a reportable breach depends on factors including the nature of the PHI, the recipient, and whether the data was actually reviewed or retained. A compliance attorney makes that call, not your engineering team. The point is that shadow AI incidents generate breach notification exposure, not just internal policy violations.

Audit findings

Healthcare security audits (SOC 2, HITRUST, or an HHS investigation) ask about the inventory of systems and tools that access or process PHI. "We maintain a list of approved tools and have controls to prevent unapproved tools from being used" is the expected answer. "We're not sure what tools developers were using" is a material finding.

Most organizations that have not actively addressed shadow AI would answer the second way if they were honest. The difference between a finding and a near-miss is whether you discover this yourself and can document remediation, or whether an auditor discovers it and mandates remediation on their timeline.

Data retention and provider policies

Consumer AI products have their own data retention policies. Those policies change. OpenAI's default retention policy for ChatGPT users has changed multiple times. Anthropic's Claude.ai data handling has its own terms. When PHI enters a consumer product, its retention fate is governed by that product's terms of service, which you did not negotiate, cannot audit, and cannot enforce.

For a covered entity with HIPAA obligations, losing control over PHI retention is not a minor inconvenience. It's a liability that compounds over time.

A taxonomy of shadow AI risk in developer workflows

Different tools carry different risk profiles. Understanding the breakdown helps prioritize where to focus governance effort.

Consumer chat products

Examples: Claude.ai, ChatGPT Plus, Gemini, Perplexity

Risk level: Highest

What makes them risky: No BAA at the free or standard tier. Data handling governed by consumer terms of service. No organizational logging or visibility. Developers use them naturally for debugging, drafting, and analysis without thinking about the compliance posture.

Guidance: These tools should appear in your tool list explicitly marked "not approved for PHI." Not "discouraged," and not simply omitted from the list: explicitly not approved. The API versions of the same models (Anthropic API, OpenAI API) can be approved if used through infrastructure that has a BAA. The consumer products cannot. This distinction needs to be stated explicitly because it's counterintuitive.

IDE AI assistants

Examples: GitHub Copilot, Cursor, Codeium, Continue.dev

Risk level: Medium to high, depending on configuration and context

What makes them risky: IDE extensions often have access to open files, recent edits, and project context. When a developer working in a codebase that handles PHI opens a file containing patient data, an IDE assistant may send that context to a third party. Data handling policies vary significantly by vendor and tier.

GitHub Copilot offers a Business/Enterprise tier with a BAA and organizational data handling commitments. Individual accounts use consumer terms. Check the GitHub Copilot data privacy documentation and verify your organization is on an enterprise plan with appropriate data handling commitments before approving Copilot for use in PHI-adjacent codebases.

Cursor doesn't offer a BAA in its standard pricing tiers as of this writing. Organizations requiring HIPAA coverage should evaluate the Business tier terms directly and confirm BAA availability before approving Cursor for work in PHI-handling codebases. Cursor's privacy documentation should be reviewed against your organization's requirements.

Continue.dev (open source) can be configured to route to a self-hosted or organization-controlled backend, which makes it potentially approvable if you control the model endpoint. This is more operational overhead but removes the third-party data handling concern.

The general guidance: audit which IDE extensions your developers are using and check each one's data handling policy and BAA availability before approving it for PHI-adjacent work. "We haven't evaluated it" is not an approved state.
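
An extension audit can be partially automated on managed developer machines. The sketch below uses VS Code's real `code --list-extensions` CLI and compares the output against an approved set; the extension IDs in the approved set are illustrative, not a vendor recommendation.

```python
import subprocess

# Illustrative approved-extension IDs; your organization's list will differ.
APPROVED_EXTENSIONS = {
    "github.copilot",     # example: approved on an enterprise plan with a BAA
    "continue.continue",  # example: approved when routed to a self-hosted backend
}

def unapproved(installed: list[str], approved: set[str]) -> list[str]:
    """Return installed extension IDs that are not on the approved list."""
    return sorted(e for e in installed if e.lower() not in approved)

def audit_vscode() -> list[str]:
    """List installed VS Code extensions via `code --list-extensions`."""
    out = subprocess.run(
        ["code", "--list-extensions"], capture_output=True, text=True, check=True
    )
    return unapproved(out.stdout.split(), APPROVED_EXTENSIONS)
```

Running the audit periodically (or at device check-in) turns "we haven't evaluated it" into a concrete list of extensions awaiting review.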

Direct API access with personal keys

Risk level: Medium

What makes it risky: A developer using their personal Anthropic or OpenAI account to make API calls has not entered into a BAA with that provider on behalf of your organization. Even if the organization has a BAA with the provider, the developer's personal account is not covered by that BAA.

This pattern shows up when approved API access has friction: waiting for a key from IT, not knowing which account to use, wanting to experiment outside of work infrastructure. The fix is making it easier to use an approved key than to use a personal one.

Unapproved internal tooling

Risk level: Medium, with high variance

What makes it risky: A developer builds an internal tool that integrates an LLM to make their own workflow faster. The tool is not reviewed by security. It may have no authentication. It may log to wherever the developer happened to write logs. If the tool accesses PHI in the course of its function, it may not meet any of the requirements of the other chapters in this guide.

Internal tooling built outside sanctioned infrastructure is shadow AI even if the developer is well-intentioned and technically competent. The risk isn't malice; it's missing the compliance requirements they're not thinking about.

How to fix it: approved tools and a frictionless path

Two things address shadow AI in practice: an approved list that people can find, and a compliant path that's fast enough to use.

Maintaining an approved list

An approved list is not a policy document buried in your internal wiki. It's a short, maintained reference that answers the questions developers actually have:

  • Is [tool] approved for use in work that handles PHI?

  • What tier or configuration is required?

  • Is there a BAA in place?

  • What are the approved use cases?

The list should include consumer products (with explicit "not approved for PHI" labels), IDE extensions, API-based tools, and any internal tooling that has been reviewed and sanctioned. It should be updated when a new tool is reviewed, not when someone asks and discovers the list is 18 months stale.

For the list to work, it needs to be: findable (linked from your developer onboarding, your internal security docs, and wherever developers go to look things up), short (not a comprehensive policy document; a reference table), and honest (if a tool's BAA status is unclear, say that rather than leaving it off).
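
One way to keep the list honest is to make it machine-readable, so tooling and documentation render from the same source. A minimal sketch, with hypothetical entries and statuses:

```python
# A minimal machine-readable approved list. Tool names and statuses are
# illustrative examples, not a recommendation about any specific vendor tier.
APPROVED_TOOLS = {
    "anthropic-api":  {"phi": True,  "baa": True,  "notes": "org-issued keys only"},
    "claude.ai":      {"phi": False, "baa": False, "notes": "consumer product; not approved for PHI"},
    "github-copilot": {"phi": True,  "baa": True,  "notes": "enterprise plan only"},
}

def phi_approved(tool: str) -> bool:
    """Answer the question developers actually ask: can I use this with PHI?

    Unknown tools default to not approved: absence from the list is a "no,"
    not a gap to be interpreted charitably.
    """
    entry = APPROVED_TOOLS.get(tool.lower())
    return bool(entry and entry["phi"] and entry["baa"])
```

The default-deny behavior for unlisted tools is the important design choice: it makes a stale list fail safe rather than fail silent.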

New tool requests should have a defined review process with a defined turnaround. Three weeks is too long. If your security team can do a lightweight evaluation of a low-risk tool in 48 hours, document that path and make it visible. The goal is to remove the incentive to use unapproved tools by making the approval process fast for tools that are genuinely low-risk.

Making the compliant path fast enough to use

The approved list addresses the knowledge problem. The frictionless path addresses the friction problem.

If accessing an approved API key requires opening a ticket, waiting for IT, getting a response three days later, and following a setup process that takes 30 minutes, that's a path developers route around. The same developer who would use the approved path if it took 90 seconds will use their personal account if it takes three days.

The organizational argument for an LLM gateway is partly cost control and partly audit logging, but it's also this: a gateway gives you a single approved path that is as fast as a direct API call. A developer needs API access to Claude? They get a scoped key that goes through the approved infrastructure. No ticket. No waiting. No reason to use their personal account.

Aptible AI Gateway implements this pattern: scoped keys with BAA coverage, model access controls, and logging through a single managed path. Other solutions exist: a self-hosted LiteLLM proxy with organization-issued keys, or a well-documented key distribution process if your team is small enough. The specific solution matters less than the principle: the compliant path should require less effort than the noncompliant one.
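
The "easy path" principle can be enforced in code. The sketch below resolves gateway configuration from the environment and refuses to run with anything that isn't an organization-issued scoped key; the environment variable names and key prefix are assumptions, not any gateway's actual conventions.

```python
# Hypothetical env var names and key prefix; adjust to your gateway's conventions.
GATEWAY_URL_VAR = "LLM_GATEWAY_URL"
GATEWAY_KEY_VAR = "LLM_GATEWAY_KEY"
ORG_KEY_PREFIX = "orgkey-"  # scoped keys issued by the gateway, not raw provider keys

def resolve_llm_config(env: dict[str, str]) -> tuple[str, str]:
    """Return (base_url, api_key) for the approved path, refusing personal keys.

    The point is that this is the *easy* path: the gateway URL and a scoped
    key are already in the environment, so the compliant call costs nothing.
    """
    url = env.get(GATEWAY_URL_VAR)
    key = env.get(GATEWAY_KEY_VAR, "")
    if not url:
        raise RuntimeError(f"{GATEWAY_URL_VAR} not set; request a scoped key via the approved path")
    if not key.startswith(ORG_KEY_PREFIX):
        raise RuntimeError("refusing to run with a non-organization key")
    return url, key
```

Most provider SDKs accept a base URL override, so pointing the official client at the resolved gateway URL is typically a one-line change in application code.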

Monitoring and detection

Monitoring for shadow AI is a secondary defense. It helps you discover incidents and understand the scope of the problem. It does not prevent incidents. And if your monitoring creates significant developer friction, it will likely do more harm than good.

Egress monitoring for known LLM API endpoints is feasible if you have network-level visibility. Anthropic, OpenAI, Google, Mistral, and other providers all have stable endpoint domains. Requests to api.openai.com, api.anthropic.com, and similar from corporate infrastructure that aren't routing through your approved gateway indicate direct API access outside sanctioned infrastructure. This is more useful in environments with corporate device management than in fully remote teams where developers work on personal networks.
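	
A minimal version of this check can run over proxy or DNS logs. The sketch below assumes a simplified whitespace-separated log format with one destination host per line, and a hypothetical internal gateway hostname; adapt the parsing and host lists to your environment.

```python
# Known LLM API endpoint hosts; extend as providers appear in your environment.
LLM_API_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
    "api.mistral.ai",
}
APPROVED_GATEWAY_HOST = "llm-gateway.example.internal"  # hypothetical approved egress point

def flag_direct_llm_egress(log_lines: list[str]) -> list[str]:
    """Flag log lines where an LLM API host is reached without the gateway.

    Assumes one whitespace-separated destination host per line, as in a
    simplified DNS or forward-proxy log; real log formats need real parsing.
    """
    flagged = []
    for line in log_lines:
        fields = line.split()
        if any(f in LLM_API_HOSTS for f in fields) and APPROVED_GATEWAY_HOST not in fields:
            flagged.append(line)
    return flagged
```

Flagged lines are a conversation starter, not proof of a violation: the follow-up is finding out why the approved path wasn't used.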

SaaS visibility tools (Zscaler, Netskope, and similar) can identify when consumer AI products appear in browser traffic on managed devices. For organizations with MDM infrastructure, this gives a second signal for consumer product usage. For smaller teams without MDM, it's not practical.

Developer onboarding and periodic attestation are lower-tech but often more useful: clear expectations communicated at onboarding, with periodic reminders when new tools become relevant (e.g., when a new consumer AI product launches that your team is likely to try). The goal isn't surveillance; it's making sure developers know what the rules are and why they exist.

The honest caveat: you cannot catch everything, and treating shadow AI primarily as a detection problem misses the point. Developers who understand the risks and have a fast, approved path are safer than developers who are monitored. Build the frictionless path first.

FAQs

Can developers use Claude Code with PHI?

Claude Code is a CLI tool for using Claude as a coding assistant. The data handling depends on how it's configured.

Claude Code makes API calls to the Anthropic API. Anthropic's BAA program covers API usage, which includes Claude Code requests. If your organization has a BAA with Anthropic and Claude Code is configured to use an API key governed by that BAA, the usage is covered. The same scoped key architecture from the key management chapter applies: if a developer uses a key issued under your organization's BAA, the requests are covered; if they use a personal key or no key, they're not.

The practical guidance: document this explicitly in your approved list. "Claude Code is approved when configured with an organization-issued API key under our Anthropic BAA" is a clear, enforceable statement. Don't leave it ambiguous; developers will make their own inference and it may not be the one you want.
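
That statement can also be enforced with a preflight check in a wrapper script. `ANTHROPIC_API_KEY` is the real environment variable Claude Code reads; the `org-baa-` key naming convention below is an assumption you would replace with however your organization marks keys issued under its BAA.

```python
# Hypothetical naming convention for keys issued under the org's Anthropic BAA.
# ANTHROPIC_API_KEY is the real variable Claude Code reads; the prefix is ours.
ORG_KEY_MARKER = "org-baa-"

def claude_code_preflight(env: dict[str, str]) -> None:
    """Fail fast if the environment isn't configured with an org-issued key."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key:
        raise RuntimeError("no ANTHROPIC_API_KEY set; request one via the approved path")
    if not key.startswith(ORG_KEY_MARKER):
        raise RuntimeError("personal key detected; Claude Code is approved only with an org-issued key")
```

A check like this turns the approved-list statement into something a developer hits before the first request leaves their machine, rather than a policy they're expected to remember.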

What about Cursor?

As of this writing, Cursor doesn't offer a BAA in its standard tiers. Their Business tier terms should be reviewed directly before approving Cursor for use in PHI-adjacent codebases.

The evaluation question for any IDE assistant is: does the data handling policy allow us to sign a BAA, and does the BAA cover the data the tool actually sends to its servers (including editor context and code)? If you can't get a clear answer to that question, the tool isn't approved until you can.

Some teams address this by using Cursor in a "no PHI in editor context" configuration, which requires discipline and verification and generally isn't a reliable control. The more durable solution is to use an IDE assistant where the data handling is clear and the BAA is available.

How do I evaluate a new AI tool without a 6-week security review?

A tiered evaluation process avoids the bottleneck without skipping important checks.

Fast track (48–72 hours): For tools that do not process PHI and are used only for developer productivity without PHI-adjacent code context. The review questions: Does the tool send data to a third party? What is the vendor's data retention policy? Is there an enterprise tier with organizational controls? Most developer productivity tools that stay clearly outside PHI scope can be evaluated quickly on these criteria.

Standard track (1–2 weeks): For tools that may have access to code, text, or workflows that could include PHI. This requires reviewing the data handling documentation in detail, confirming BAA availability and scope, and checking whether the tool's data transmission behavior is what the documentation claims.

Full security review (longer): For tools that directly handle PHI, integrate with production systems, or have unclear data handling practices. This is the right process for an AI feature being built into your product; it's not the right process for evaluating whether a developer can use a coding assistant.

Document the paths and make them visible. If developers know there's a 48-hour fast track for low-risk tools, they're less likely to skip the process entirely. If they think every tool request takes six weeks, they'll route around it.
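
The triage logic above is simple enough to encode, which keeps the turnaround commitments honest. A sketch, assuming the three questions reduce to booleans (real triage will have more nuance):

```python
def review_track(processes_phi: bool, phi_adjacent_context: bool, sends_to_third_party: bool) -> str:
    """Map the tiered-review questions from the text to a review track.

    A sketch of the decision order: direct PHI handling always gets the full
    review; third-party tools that could see PHI via code or text context get
    the standard track; everything else qualifies for the fast track.
    """
    if processes_phi:
        return "full"      # directly handles PHI or integrates with production
    if phi_adjacent_context and sends_to_third_party:
        return "standard"  # could see PHI via code, text, or workflow context
    return "fast"          # productivity tool clearly outside PHI scope
```

Publishing the decision rule alongside the request form tells developers up front which track their tool will land in, which is most of what removes the incentive to skip the process.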

Next steps

Shadow AI is an organizational control problem. Prompt injection is a technical one — and it's most relevant for the patient-facing and document-processing features that your sanctioned infrastructure handles.