HIPAA AI Security
API key management and scope isolation for LLM usage
By Mat Steinlin, Head of Information Security
Last updated: April 2026
Most LLM security guides spend their key management section on secrets management: don't commit keys to GitHub, rotate them regularly, store them in a vault. That advice is correct and not what this chapter is about.
This chapter is about governance: how you organize API keys as a compliance and security control, not just as a secret to protect. A key that is stored securely but shared across every environment, application, and use case in your organization is still a liability. A developer who leaves the company, a runaway process in a staging environment, or a misconfigured integration can each cause a problem whose blast radius is the entire organization.
Healthcare AI adds a specific constraint. Under HIPAA's audit control standard, you must be able to examine activity in systems that handle PHI. "We had one API key and we don't know which application made which requests" is not a defensible audit position. Key architecture is attribution architecture.
The flat key problem
The default LLM integration starts with a single API key per provider. It works immediately, requires no infrastructure, and creates four problems that compound as your team and usage grow.
Attribution failure
When every system uses the same key, provider usage logs show you one stream of requests with no indication of which application, team, environment, or user generated them. During an audit, this means you can't answer basic questions: which systems were accessing patient data on a given date? Did the production summarization feature make these requests, or the development experiment a junior engineer was running last Tuesday?
45 CFR 164.312(b) requires mechanisms to record and examine PHI-related activity. Attribution isn't optional, and a flat key architecture makes it structurally impossible.
Blast radius
A single key for a single provider means any event affecting that key affects everything that uses it. A compromised key exposes every application's prompts and responses. A soft limit hit in a development experiment blocks production requests. A rate limit triggered by a runaway loop in staging takes down the patient-facing feature that happens to share the same key.
Rotation risk
Key rotation is a standard security practice. With a flat key, rotating means updating every system that uses it simultaneously without breaking production — so in practice teams delay rotation until it's forced on them, or accept significant operational risk when they do it.
Governance gaps
Different use cases have different risk profiles. A production feature summarizing patient records should be restricted to tested, approved models. A development sandbox can experiment freely. An internal tooling integration might have different budget constraints than a customer-facing feature. With one key, none of this is enforceable at the infrastructure level; it depends entirely on discipline.
Scoped key architecture
The alternative is organizing keys around scopes: logical units of access that correspond to a meaningful organizational boundary.
A useful scope hierarchy for a healthcare AI stack:
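For example (the names are illustrative; the same ones appear in the model governance table below):

```
org
├── production
│   ├── care-notes-summarization   (patient-facing)
│   └── internal-reporting         (internal tooling)
├── staging
│   └── staging-general
└── development
    └── dev-general
```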
Each leaf is a scope. Each scope gets its own key, its own model allowlist, and its own budget. A request from the production summarization scope cannot exhaust the budget allocated to development experimentation. A compromised internal tooling key does not expose patient-facing production prompts.
The right level of granularity depends on your team size and risk tolerance. For a five-person company shipping one AI feature, prod/staging/dev separation is sufficient. For a larger organization with multiple AI-integrated products, per-application scopes become necessary to maintain attribution and contain blast radius.
What a scope represents
A scope is a statement of intent. It says: "These requests are from this application, in this environment, for this purpose, and should be governed by these rules." The key is the access credential. The scope is the policy.
This distinction matters for audit readiness. When an auditor asks how PHI was handled during a specific period, you need to be able to say "these 4,200 requests came from the production summarization scope, used GPT-4o under our OpenAI BAA, and were logged with encrypted storage." That answer requires scoped architecture.
Implementation: building scoped keys with OpenAI and Anthropic
Neither OpenAI nor Anthropic offers native application-level scoping within a single account. Both allow multiple API keys, which is the building block. The scoping logic lives in your infrastructure.
Loading and validating scoped keys
Store one key per scope as a separate secret. Environment variables named by scope work well for smaller setups; a secrets manager like AWS Secrets Manager or HashiCorp Vault is worth the operational overhead for teams handling significant PHI volume.
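As a sketch of the environment-variable approach, a minimal loader might look like this (the scope names, variable names, and budgets are examples, not a prescribed convention):

```python
import os

# Hypothetical scope registry: each entry maps a scope to its
# environment-variable name, model allowlist, and monthly budget.
SCOPE_CONFIG = {
    "prod-summarization": {
        "env_var": "LLM_KEY_PROD_SUMMARIZATION",
        "allowed_models": ["gpt-4o-2024-11-20"],
        "monthly_budget_usd": 2000,
    },
    "staging-general": {
        "env_var": "LLM_KEY_STAGING_GENERAL",
        "allowed_models": ["gpt-4o-2024-11-20", "gpt-4o-mini-2024-07-18"],
        "monthly_budget_usd": 200,
    },
    "dev-general": {
        "env_var": "LLM_KEY_DEV_GENERAL",
        "allowed_models": None,  # None = unrestricted experimentation
        "monthly_budget_usd": 100,
    },
}


def load_scoped_key(scope: str) -> str:
    """Return the API key for a scope, failing loudly on misconfiguration."""
    if scope not in SCOPE_CONFIG:
        raise KeyError(f"Unknown scope '{scope}': not in SCOPE_CONFIG")
    env_var = SCOPE_CONFIG[scope]["env_var"]
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Scope '{scope}' is configured but {env_var} is unset")
    return key
```

Failing at startup rather than at request time is the point: a missing or stale scope surfaces in deployment, not in a production request path.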
The SCOPE_CONFIG dictionary is the single source of truth for what each scope is allowed to do. Adding a new scope means adding an entry and setting the corresponding environment variable. Removing a scope means removing the entry; any code that still references the removed scope then fails loudly at startup.
Tagging requests with scope metadata
Most LLM providers support limited per-request metadata. Anthropic's API accepts a metadata field with a user_id string. OpenAI accepts a user field for abuse detection. Neither provides full scope attribution; that needs to live in your logging layer.
The pattern: pass scope metadata through a wrapper that attaches it to every log entry:
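One way to sketch that wrapper, assuming the official `openai` Python SDK (`client.chat.completions.create`); the function name and audit-record fields are illustrative:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("llm_audit")


def scoped_chat_completion(client, scope: str, environment: str,
                           application: str, user_id: str,
                           model: str, messages: list):
    """Call the chat completions API and write a scope-attributed audit record.

    `client` is an openai.OpenAI instance (or a compatible stub in tests).
    """
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        # The provider-side `user` field carries the scope identifier for
        # abuse detection; full attribution lives in our own audit log.
        user=f"{scope}:{user_id}",
    )
    audit_record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "scope": scope,
        "environment": environment,
        "application": application,
        "user_id": user_id,
        "model": model,
        "request_id": getattr(response, "id", None),
    }
    logger.info(json.dumps(audit_record))
    return response
```

The same shape works for Anthropic's SDK by swapping the API call and passing the scope identifier through the `metadata.user_id` field instead.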
The user field in the OpenAI API call carries a scope identifier to the provider's systems, useful for provider-side abuse detection but not a substitute for your own logging. The detailed attribution (scope, environment, application, user_id) lives in your audit log, which you control and which must meet your retention and encryption requirements.
Model access controls by scope
Model access controls are a governance decision, not just a cost decision. Restricting production to tested, approved models prevents a misconfigured experiment from running against live patient data. It also limits the blast radius of a supply chain event: if a new model version has unexpected behavior, it only affects the scopes you have explicitly allowed it for.
In the flat key architecture, model restrictions live in application code, where they can be bypassed by any developer with access to the codebase. In a scoped architecture enforced by a proxy layer, restrictions are infrastructure-level and can't be bypassed by application code.
A practical model governance framework:
| Scope type | Example | Model policy |
|---|---|---|
| Production, patient-facing | `care-notes-summarization` | Allowlist only: tested, versioned model IDs; no aliases like `gpt-4o` |
| Production, internal | `internal-reporting` | Allowlist: trusted models; can be slightly broader than patient-facing |
| Staging | `staging-general` | Allowlist: same as production plus evaluation candidates |
| Development | `dev-general` | No restriction: experimentation permitted |
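Enforced at the proxy or wrapper layer, this policy reduces to a small check. A sketch, with hypothetical allowlists mirroring the table:

```python
# Hypothetical per-scope allowlists; None means unrestricted (dev only).
MODEL_ALLOWLISTS = {
    "care-notes-summarization": {"gpt-4o-2024-11-20"},
    "internal-reporting": {"gpt-4o-2024-11-20", "gpt-4o-mini-2024-07-18"},
    "staging-general": {"gpt-4o-2024-11-20", "gpt-4o-mini-2024-07-18"},
    "dev-general": None,
}


def enforce_model_allowlist(scope: str, model: str) -> None:
    """Reject models not on the scope's allowlist; unknown scopes fail closed."""
    if scope not in MODEL_ALLOWLISTS:
        raise PermissionError(f"Unknown scope '{scope}': failing closed")
    allowed = MODEL_ALLOWLISTS[scope]
    if allowed is not None and model not in allowed:
        raise PermissionError(
            f"Model '{model}' is not on the allowlist for scope '{scope}'"
        )
```

Failing closed on unknown scopes matters: a typo in a scope name should block the request, not silently grant unrestricted access.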
Using versioned model IDs in production (e.g., gpt-4o-2024-11-20 rather than gpt-4o) is worth the maintenance overhead. Model aliases silently resolve to new versions when providers update them, and a version bump can change response format, safety filtering behavior, or token usage in ways you didn't test for. Production model changes should be intentional.
The OpenAI model documentation and Anthropic model documentation both list versioned model IDs alongside aliases.
Key rotation without downtime
Key rotation should be a routine operation, not an emergency procedure. The blue/green pattern makes it one:
1. Generate a new key (KEY_B) from the provider dashboard or API.
2. Store KEY_B alongside KEY_A in your secrets manager, under a new version.
3. Deploy an updated configuration that reads KEY_B. At this point both keys are valid; the provider honors both until KEY_A is revoked.
4. Monitor for a burn-in period (typically 15–30 minutes for production traffic) to confirm KEY_B is operational. Check that requests are succeeding and that logs show the expected scope attribution.
5. Revoke KEY_A at the provider.
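Steps 2 and 3 can be sketched against AWS Secrets Manager with boto3; the secret name and helper functions here are illustrative, and the `client` parameter exists so the logic can be exercised without AWS credentials:

```python
def rotate_scope_key(secret_id: str, new_key: str, client=None) -> str:
    """Publish KEY_B as a new secret version.

    Secrets Manager moves the AWSCURRENT staging label to the new version
    and relabels the previous one AWSPREVIOUS, so readers asking for
    AWSCURRENT pick up KEY_B while KEY_A remains valid at the provider.
    """
    if client is None:
        import boto3  # assumed available in the deployment environment
        client = boto3.client("secretsmanager")
    resp = client.put_secret_value(SecretId=secret_id, SecretString=new_key)
    return resp["VersionId"]


def read_scope_key(secret_id: str, client=None) -> str:
    """Fetch whichever version currently carries the AWSCURRENT label."""
    if client is None:
        import boto3
        client = boto3.client("secretsmanager")
    resp = client.get_secret_value(SecretId=secret_id, VersionStage="AWSCURRENT")
    return resp["SecretString"]
```

Revoking KEY_A (step 5) still happens at the provider dashboard or API; Secrets Manager only controls which key your applications read.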
The pattern above uses AWS Secrets Manager's versioning to make the new key available before the old one is revoked. Applications that refresh their configuration from Secrets Manager on startup (or via a config polling loop) pick up the new key without redeployment.
For applications that load keys at startup and cache them in memory, you need either a config refresh mechanism or a rolling deployment to avoid a window where some instances are using the old key and some the new. The right choice depends on your deployment model.
One operational note: AWS Secrets Manager charges per secret and per API call. For teams with many scopes, AWS SSM Parameter Store is a lower-cost alternative for non-rotation use cases, with SecureString parameters providing at-rest encryption.
Attribution and audit readiness
Under HIPAA, you must be able to demonstrate which systems accessed PHI and when. With a scoped key architecture and the logging wrapper above, each request carries:
- Which scope initiated it (`scope`)
- Which environment it ran in (`environment`)
- Which application it came from (`application`)
- Which user or service initiated it (`user_id`)
- Which model was used (`model`)
- Timestamps and request identifiers
This is enough to answer the questions an auditor or incident investigator will actually ask: "Show me all AI activity involving patient records between March 15 and March 22." "Which system was responsible for the spike in requests on Tuesday at 2am?" "Did the development environment ever make requests using a production-level model?"
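If the audit records are written as JSON lines (as in the logging-wrapper pattern earlier), these questions reduce to simple filters. A hypothetical query helper:

```python
import json
from datetime import datetime


def query_audit_log(path, scope=None, start=None, end=None):
    """Scan a JSONL audit log (one JSON record per line) and return the
    records matching a scope and/or a time window."""
    matches = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            ts = datetime.fromisoformat(record["timestamp"])
            if scope is not None and record.get("scope") != scope:
                continue
            if start is not None and ts < start:
                continue
            if end is not None and ts > end:
                continue
            matches.append(record)
    return matches
```

At real volume this becomes a query against your log store rather than a file scan, but the attribution fields are what make either version answerable.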
Without scoped keys and scope-level logging, these questions either can't be answered or require reconstructing an answer from incomplete provider logs, which often don't have the application-level granularity HIPAA investigations require.
Attribution is also the first thing you need when a security incident happens. The question "which patient records were potentially exposed?" requires knowing which requests went to which model under which BAA. A flat key architecture makes that reconstruction slow, manual, and often incomplete.
Build vs. buy: when to use a proxy layer
The scoped key implementation in this chapter is a reasonable starting point for a small team. It has real limitations as your usage grows.
Provider-native key management gives you isolation but not enforcement. Model access controls live in application code and can be bypassed. Budget controls are limited: OpenAI offers project-level budget alerts, but they're soft limits that notify rather than stop requests; Anthropic's spend limits operate at the workspace level, not per key. Log management is your responsibility end to end. And if you are using multiple providers (OpenAI for some models, Anthropic for others, AWS Bedrock for a third), you are managing parallel key hierarchies with no unified governance layer.
A proxy layer solves these problems by sitting between your application code and the provider. Your application sends a request with a scoped key; the proxy enforces the allowlist, applies the budget check, logs the request and response, and forwards to the appropriate provider. The compliance logic is centralized and infrastructure-level, not distributed across application code.
The open-source option is LiteLLM, which provides unified routing, basic key management, and budget controls across providers. It doesn't provide BAA coverage; you're still responsible for the compliance infrastructure around it. Self-hosting LiteLLM correctly for PHI workloads is non-trivial; it handles routing and you handle everything else.
Portkey offers a managed gateway with HIPAA support on enterprise plans, including BAA signing and PII anonymization. The enterprise requirement means a sales process and pricing negotiation.
Aptible AI Gateway is purpose-built for this use case. Scoped keys, model access controls, cost limits, and audit logging are configured at the infrastructure level rather than in application code. BAA coverage for all traffic through the gateway comes standard, with no enterprise negotiation required. For teams using Aptible for their application infrastructure already, the gateway fits within the existing compliance posture.
Which option is right depends on your team's situation:
| Situation | Recommendation |
|---|---|
| Early-stage, one LLM provider, small team | DIY scoped keys as described in this chapter. Add a proxy layer when you add a second provider or when key governance becomes a recurring maintenance burden. |
| Multiple LLM providers, HIPAA workloads, no compliance infrastructure | Managed gateway with BAA coverage (Portkey Enterprise or Aptible AI Gateway). The engineering time to build and maintain equivalent infrastructure exceeds the cost. |
| Existing LiteLLM infrastructure, no PHI in AI calls | Keep LiteLLM. If PHI enters the picture, layer compliant logging and encrypt storage appropriately, or move to a HIPAA-covered gateway. |
| PHI in AI calls, need audit trail, want minimal operational overhead | Managed gateway. The combination of BAA, logging, and key management in one place is what a managed product provides. |
FAQs
How many scopes do I actually need?
Start with three: production, staging, and development. This gives you the most important isolation (production vs. everything else) without the operational overhead of managing dozens of keys. Add per-application scopes when you have multiple distinct AI features with different risk profiles or compliance requirements, or when attribution requirements make the unified production scope too coarse.
Can I use the same key for multiple providers?
No. API keys are provider-specific. But the scope concept is provider-agnostic; a scope named prod-summarization can have one key for OpenAI and one for Anthropic. The application code references the scope; the underlying key is looked up based on which provider is being used for that request.
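A sketch of that lookup, with hypothetical environment-variable names:

```python
import os

# Hypothetical mapping: one scope, one key per provider. The variable
# names are examples, not a convention from any SDK.
SCOPE_KEYS = {
    "prod-summarization": {
        "openai": "LLM_KEY_PROD_SUMM_OPENAI",
        "anthropic": "LLM_KEY_PROD_SUMM_ANTHROPIC",
    },
}


def key_for(scope: str, provider: str) -> str:
    """Resolve the provider-specific key for a scope, failing loudly."""
    try:
        env_var = SCOPE_KEYS[scope][provider]
    except KeyError:
        raise KeyError(f"No key configured for scope '{scope}', provider '{provider}'")
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is unset")
    return key
```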
Does scoped key architecture satisfy HIPAA's access control requirements?
Scope-based key management directly supports 45 CFR 164.312(a)(1), which requires assigning unique identifiers to users and tracking activity. Scoped keys provide system-level attribution; they should be complemented by user-level attribution in your audit logs (the user_id field in the logging patterns above).
Scoped keys alone don't satisfy the full access control requirement: they handle system attribution, not user identity management. Your application's authentication layer handles the latter.
What happens if a key is leaked?
Rotate it using the blue/green pattern above and revoke the old key at the provider. Because the leaked key is scoped, the exposure is limited to that scope's request history, not the organization's entire LLM usage. This is the operational argument for scoped keys: when a key incident happens, the response is contained and well-defined rather than requiring an org-wide key rotation that affects every system simultaneously.
After rotating, audit the scope's request logs for the period the key was potentially exposed. What PHI was in the prompts? Which models were used? Was there any anomalous activity (request volume spikes, unusual models, off-hours requests)? The answer to these questions shapes whether you have a HIPAA reportable incident. If you have good logs, you can answer them.
Should I store keys in environment variables or a secrets manager?
Environment variables work for early-stage teams and are simpler to manage. The limitations: rotation requires a redeployment or restart, secrets are visible in process environments and can leak through diagnostic tools or error messages, and there's no audit trail of who accessed which secret.
A secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager) adds rotation without redeployment, fine-grained access control, and an access audit trail. For healthcare applications handling PHI, the audit trail for secret access is worth having. Move to a secrets manager before your first enterprise customer security review.
Next steps
Scoped keys are the foundation, but they only provide attribution value if the audit logs they generate are stored correctly and monitored. The next chapter covers how to build LLM audit logging that satisfies HIPAA and works as a security tool: encrypted storage, anomaly detection, and how to know when your logging has silently stopped working.
- Audit logging for healthcare AI: the logging layer that makes scope attribution useful
- PHI de-identification as a security control: if your scoped production keys are sending clinical notes, consider whether de-identification makes sense before the request leaves your infrastructure
- Shadow AI in healthcare: scoped keys govern your sanctioned infrastructure; this chapter covers what happens when developers route around it