
API key management and scope isolation for LLM usage

By Mat Steinlin, Head of Information Security

Last updated: April 2026

Most LLM security guides spend their key management section on secrets management: don't commit keys to GitHub, rotate them regularly, store them in a vault. That advice is correct and not what this chapter is about.

This chapter is about governance: how you organize API keys as a compliance and security control, not just as a secret to protect. A key that is stored securely but shared across every environment, application, and use case in your organization is still a liability. A developer who leaves the company, a runaway process in a staging environment, or a misconfigured integration can each cause a problem whose blast radius is the entire organization.

Healthcare AI adds a specific constraint. Under HIPAA's audit control standard, you must be able to examine activity in systems that handle PHI. "We had one API key and we don't know which application made which requests" is not a defensible audit position. Key architecture is attribution architecture.

The flat key problem

The default LLM integration starts with a single API key per provider. It works immediately, requires no infrastructure, and creates four problems that compound as your team and usage grow.

Attribution failure

When every system uses the same key, provider usage logs show you one stream of requests with no indication of which application, team, environment, or user generated them. During an audit, this means you can't answer basic questions: which systems were accessing patient data on a given date? Did the production summarization feature make these requests, or the development experiment a junior engineer was running last Tuesday?

45 CFR 164.312(b) requires mechanisms to record and examine PHI-related activity. Attribution isn't optional, and a flat key architecture makes it structurally impossible.

Blast radius

A single key per provider means any event affecting that key affects everything that uses it. A compromised key exposes every application's prompts and responses. A soft limit hit in a development experiment blocks production requests. A rate limit triggered by a runaway loop in staging takes down the patient-facing feature that happens to share the same key.

Rotation risk

Key rotation is a standard security practice. With a flat key, rotating means updating every system that uses it simultaneously without breaking production — so in practice teams delay rotation until it's forced on them, or accept significant operational risk when they do it.

Governance gaps

Different use cases have different risk profiles. A production feature summarizing patient records should be restricted to tested, approved models. A development sandbox can experiment freely. An internal tooling integration might have different budget constraints than a customer-facing feature. With one key, none of this is enforceable at the infrastructure level; it depends entirely on discipline.

Scoped key architecture

The alternative is organizing keys around scopes: logical units of access that correspond to a meaningful organizational boundary.

A useful scope hierarchy for a healthcare AI stack (leaf names match the scope configuration used throughout this chapter):

  organization
  ├── prod
  │   ├── summarization
  │   └── extraction
  ├── staging
  │   └── general
  ├── dev
  │   └── general
  └── internal-tools
Each leaf is a scope. Each scope gets its own key, its own model allowlist, and its own budget. A request from the production summarization scope cannot exhaust the budget allocated to development experimentation. A compromised internal tooling key does not expose patient-facing production prompts.

The right level of granularity depends on your team size and risk tolerance. For a five-person company shipping one AI feature, prod/staging/dev separation is sufficient. For a larger organization with multiple AI-integrated products, per-application scopes become necessary to maintain attribution and contain blast radius.

What a scope represents

A scope is a statement of intent. It says: "These requests are from this application, in this environment, for this purpose, and should be governed by these rules." The key is the access credential. The scope is the policy.

This distinction matters for audit readiness. When an auditor asks how PHI was handled during a specific period, you need to be able to say "these 4,200 requests came from the production summarization scope, used GPT-4o under our OpenAI BAA, and were logged with encrypted storage." That answer requires scoped architecture.

Implementation: building scoped keys with OpenAI and Anthropic

Neither OpenAI nor Anthropic offers native application-level scoping within a single account. Both allow multiple API keys, which is the building block. The scoping logic lives in your infrastructure.

Loading and validating scoped keys

Store one key per scope as a separate secret. Environment variables named by scope work well for smaller setups; a secrets manager like AWS Secrets Manager or HashiCorp Vault is worth the operational overhead for teams handling significant PHI volume.

import os
from dataclasses import dataclass, field
from typing import Optional
from openai import OpenAI
import anthropic

SCOPE_CONFIG = {
    # scope_name: (provider, env_var, allowed_models, budget_usd_monthly)
    "prod-summarization": (
        "openai",
        "OPENAI_KEY_PROD_SUMMARIZATION",
        {"gpt-4o", "gpt-4o-2024-11-20"},
        500.00,
    ),
    "prod-extraction": (
        "anthropic",
        "ANTHROPIC_KEY_PROD_EXTRACTION",
        {"claude-sonnet-4-6"},
        200.00,
    ),
    "staging-general": (
        "openai",
        "OPENAI_KEY_STAGING_GENERAL",
        {"gpt-4o", "gpt-4o-mini"},
        50.00,
    ),
    "dev-general": (
        "openai",
        "OPENAI_KEY_DEV_GENERAL",
        None,  # No model restrictions in development
        25.00,
    ),
    "internal-tools": (
        "openai",
        "OPENAI_KEY_INTERNAL",
        {"gpt-4o-mini"},
        10.00,
    ),
}

@dataclass
class ScopeConfig:
    scope: str
    provider: str
    allowed_models: Optional[set]
    budget_usd_monthly: float
    _api_key: str = field(init=False, repr=False)

    def __post_init__(self):
        # Only the env var name is needed here; the other fields are
        # populated by load_scope() from the same SCOPE_CONFIG entry.
        _provider, env_var, _models, _budget = SCOPE_CONFIG[self.scope]
        api_key = os.environ.get(env_var)
        if not api_key:
            raise EnvironmentError(
                f"Missing API key for scope '{self.scope}'. "
                f"Expected environment variable: {env_var}"
            )
        self._api_key = api_key

    def get_client(self):
        if self.provider == "openai":
            return OpenAI(api_key=self._api_key)
        elif self.provider == "anthropic":
            return anthropic.Anthropic(api_key=self._api_key)
        raise ValueError(f"Unknown provider: {self.provider}")

def load_scope(scope_name: str) -> ScopeConfig:
    if scope_name not in SCOPE_CONFIG:
        raise ValueError(f"Unknown scope: '{scope_name}'")
    provider, env_var, allowed_models, budget = SCOPE_CONFIG[scope_name]
    return ScopeConfig(
        scope=scope_name,
        provider=provider,
        allowed_models=allowed_models,
        budget_usd_monthly=budget,
    )


The SCOPE_CONFIG dictionary is the single source of truth for what each scope is allowed to do. Adding a new scope means adding an entry here and setting the corresponding environment variable. Removing a scope means removing the entry; any code that still references the removed scope fails loudly at startup.
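Failing loudly is worth making explicit: a startup check over the same dictionary surfaces every missing key before the first request is made, rather than one at a time as scopes are touched. A minimal sketch, with a trimmed-down copy of SCOPE_CONFIG reproduced so the example is self-contained (the helper names are illustrative):

```python
import os

# Trimmed-down copy of SCOPE_CONFIG so this example stands alone.
SCOPE_CONFIG = {
    "prod-summarization": ("openai", "OPENAI_KEY_PROD_SUMMARIZATION", {"gpt-4o"}, 500.00),
    "dev-general": ("openai", "OPENAI_KEY_DEV_GENERAL", None, 25.00),
}

def missing_scope_keys() -> list:
    """Return the scopes whose API key environment variable is not set."""
    return [
        scope
        for scope, (_provider, env_var, _models, _budget) in SCOPE_CONFIG.items()
        if not os.environ.get(env_var)
    ]

def validate_scopes_at_startup() -> None:
    """Raise at process start if any configured scope lacks its key."""
    missing = missing_scope_keys()
    if missing:
        raise EnvironmentError(f"Missing API keys for scopes: {missing}")
```

Run validate_scopes_at_startup() in your service's entry point, before any request handling starts, so a misconfigured deployment fails its health check instead of failing its first PHI request.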

Tagging requests with scope metadata

Most LLM providers support limited per-request metadata. Anthropic's API accepts a metadata field with a user_id string. OpenAI accepts a user field for abuse detection. Neither provides full scope attribution; that needs to live in your logging layer.

The pattern: pass scope metadata through a wrapper that attaches it to every log entry:

import json
import logging
from datetime import datetime, timezone
from typing import Optional
from openai import OpenAI

phi_audit_logger = logging.getLogger("llm.phi_audit")
# This logger MUST write to encrypted, access-controlled storage.

def scoped_completion(
    client: OpenAI,
    scope: str,
    environment: str,
    application: str,
    user_id: str,
    model: str,
    messages: list,
    allowed_models: Optional[set] = None,
    **kwargs,
):
    """
    Execute a chat completion with scope attribution and audit logging.

    The 'user' field in the OpenAI request carries scope identity for
    provider-side abuse detection. Full attribution lives in the audit log,
    not in the provider's records — which is why local logging is required
    even when a BAA is in place.
    """
    if allowed_models is not None and model not in allowed_models:
        raise ValueError(
            f"Model '{model}' is not in the allowlist for scope '{scope}'. "
            f"Allowed: {allowed_models}"
        )

    start_time = datetime.now(timezone.utc)
    response = error_type = None
    status = "success"

    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            user=f"{scope}:{environment}",  # Provider-side scope signal
            **kwargs,
        )
    except Exception as e:
        status = "error"
        error_type = type(e).__name__
        raise
    finally:
        log_entry = {
            "timestamp":     start_time.isoformat(),
            "user_id":       user_id,
            "scope":         scope,
            "environment":   environment,
            "application":   application,
            "model":         model,
            "messages":      messages,  # PHI — encrypted storage required
            "status":        status,
            "error":         error_type,
            "duration_ms":   int((datetime.now(timezone.utc) - start_time).total_seconds() * 1000),
            "input_tokens":  getattr(getattr(response, "usage", None), "prompt_tokens", None),
            "output_tokens": getattr(getattr(response, "usage", None), "completion_tokens", None),
            "request_id":    getattr(response, "id", None),
        }
        phi_audit_logger.info(json.dumps(log_entry))

    return response


The user field in the OpenAI API call carries a scope identifier to the provider's systems, useful for provider-side abuse detection but not a substitute for your own logging. The detailed attribution (scope, environment, application, user_id) lives in your audit log, which you control and which must meet your retention and encryption requirements.

Model access controls by scope

Model access controls are a governance decision, not just a cost decision. Restricting production to tested, approved models prevents a misconfigured experiment from running against live patient data. It also limits the blast radius of a supply chain event: if a new model version has unexpected behavior, it only affects the scopes you have explicitly allowed it for.

In the flat key architecture, model restrictions live in application code, where they can be bypassed by any developer with access to the codebase. In a scoped architecture enforced by a proxy layer, restrictions are infrastructure-level and can't be bypassed by application code.
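At the proxy layer, that enforcement reduces to a single check the application can't skip, run before the request is forwarded. A minimal sketch (the function name and allowlist table are illustrative, not part of any particular proxy):

```python
# Illustrative allowlist table keyed by scope; None means unrestricted.
MODEL_ALLOWLISTS = {
    "prod-summarization": {"gpt-4o-2024-11-20"},
    "dev-general": None,
}

def enforce_model_policy(scope: str, model: str) -> None:
    """Reject the request before forwarding if the model is not allowed.

    Unknown scopes are rejected outright: an unregistered scope should
    fail closed, not default to unrestricted access.
    """
    if scope not in MODEL_ALLOWLISTS:
        raise PermissionError(f"Unknown scope '{scope}'")
    allowed = MODEL_ALLOWLISTS[scope]
    if allowed is not None and model not in allowed:
        raise PermissionError(
            f"Model '{model}' is not allowed for scope '{scope}'"
        )
```

The fail-closed default for unknown scopes is the design choice that matters: a typo in a scope name becomes a loud rejection rather than silent unrestricted access.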

A practical model governance framework:

| Scope type | Example | Model policy |
| --- | --- | --- |
| Production, patient-facing | care-notes-summarization | Allowlist only: tested, versioned model IDs; no aliases like gpt-4o that silently update |
| Production, internal | internal-reporting | Allowlist: trusted models; can be slightly broader than patient-facing |
| Staging | staging-general | Allowlist: same as production plus evaluation candidates |
| Development | dev-general | No restriction: experimentation permitted |

Using versioned model IDs in production (e.g., gpt-4o-2024-11-20 rather than gpt-4o) is worth the maintenance overhead. Model aliases silently resolve to new versions when providers update them, and a version bump can change response format, safety filtering behavior, or token usage in ways you didn't test for. Production model changes should be intentional.
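One way to enforce pinning mechanically is to reject unversioned aliases in production configuration. A hedged sketch, assuming OpenAI's convention of a trailing YYYY-MM-DD date on versioned IDs; Anthropic's versioned IDs use a different suffix format, so a real implementation would need a per-provider pattern:

```python
import re

# Assumption: a "pinned" OpenAI model ID ends in a YYYY-MM-DD date.
_PINNED = re.compile(r"-\d{4}-\d{2}-\d{2}$")

def is_pinned(model_id: str) -> bool:
    """True if the model ID carries an explicit version date suffix."""
    return bool(_PINNED.search(model_id))

def assert_production_models_pinned(allowlist: set) -> None:
    """Fail configuration loading if a production allowlist contains aliases."""
    unpinned = sorted(m for m in allowlist if not is_pinned(m))
    if unpinned:
        raise ValueError(
            f"Unpinned model IDs in production allowlist: {unpinned}"
        )
```

Running this check when configuration loads turns "someone added gpt-4o to prod" into a startup failure instead of a silent model upgrade at the provider's next release.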

The OpenAI model documentation and Anthropic model documentation both list versioned model IDs alongside aliases.

Key rotation without downtime

Key rotation should be a routine operation, not an emergency procedure. The blue/green pattern makes it one:

  1. Generate a new key (KEY_B) from the provider dashboard or API.

  2. Store KEY_B alongside KEY_A in your secrets manager, under a new version.

  3. Deploy an updated configuration that reads KEY_B. At this point both keys are valid; the provider honors both until KEY_A is revoked.

  4. Monitor for a burn-in period (typically 15–30 minutes for production traffic) to confirm KEY_B is operational. Check that requests are succeeding and that logs show the expected scope attribution.

  5. Revoke KEY_A at the provider.

import boto3
import json

def rotate_scope_key(scope: str, new_api_key: str, secret_name: str) -> None:
    """
    Blue/green key rotation for a scope using AWS Secrets Manager.

    The new key is stored alongside the existing key before the old one
    is revoked. Application code reads the current version from Secrets Manager;
    no redeployment is required — only a config refresh.

    Call this function after generating the new key at the provider.
    Revoke the old key at the provider after confirming the new key is live.
    """
    client = boto3.client("secretsmanager")

    # Read current secret state
    current = json.loads(
        client.get_secret_value(SecretId=secret_name)["SecretString"]
    )

    # Stage the new key as "pending" — old key remains active
    current[f"{scope}_pending"] = new_api_key
    client.put_secret_value(
        SecretId=secret_name,
        SecretString=json.dumps(current),
    )

    # After deployment and burn-in, promote and clean up:
    # current[scope] = new_api_key
    # del current[f"{scope}_pending"]
    # client.put_secret_value(SecretId=secret_name, SecretString=json.dumps(current))
    # Then revoke the old key at the provider.

    print(f"Staged new key for scope '{scope}'. "
          f"Deploy, monitor, then promote and revoke the old key.")


The pattern above stages the new key alongside the current one in the same secret, so the new key is available before the old one is revoked. Applications that refresh their configuration from Secrets Manager on startup (or via a config polling loop) pick up the new key without redeployment.

For applications that load keys at startup and cache them in memory, you need either a config refresh mechanism or a rolling deployment to avoid a window where some instances are using the old key and some the new. The right choice depends on your deployment model.
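For the config-refresh case, a small TTL wrapper around the secret fetch gives all instances a bounded window within which they converge on the new key. A sketch; the class name and interface are illustrative, and `fetch` stands in for a closure around your secrets manager call:

```python
import time

class RefreshingSecret:
    """Cache a fetched secret and re-fetch once it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds: float = 300.0):
        self._fetch = fetch          # e.g. a closure around get_secret_value
        self._ttl = ttl_seconds
        self._value = None
        self._fetched_at = float("-inf")  # forces a fetch on first get()

    def get(self):
        now = time.monotonic()
        if now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With a five-minute TTL, every instance is on the new key within five minutes of promotion, which bounds how long you must wait before revoking the old key at the provider.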

One operational note: AWS Secrets Manager charges per secret and per API call. For teams with many scopes, AWS SSM Parameter Store is a lower-cost alternative for non-rotation use cases, with SecureString parameters providing at-rest encryption.

Attribution and audit readiness

Under HIPAA, you must be able to demonstrate which systems accessed PHI and when. With a scoped key architecture and the logging wrapper above, each request carries:

  • Which scope initiated it (scope)

  • Which environment it ran in (environment)

  • Which application it came from (application)

  • Which user or service initiated it (user_id)

  • Which model was used (model)

  • Timestamps and request identifiers

This is enough to answer the questions an auditor or incident investigator will actually ask: "Show me all AI activity involving patient records between March 15 and March 22." "Which system was responsible for the spike in requests on Tuesday at 2am?" "Did the development environment ever make requests using a production-level model?"
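Answering those questions against the JSONL audit log written by the wrapper above is a filter over scope and timestamp. A minimal sketch; the field names match the log entries emitted by scoped_completion:

```python
import json
from datetime import datetime, timezone

def query_audit_log(lines, scope=None, start=None, end=None):
    """Yield audit entries matching a scope and an inclusive time window.

    'lines' is an iterable of JSONL strings; start/end are timezone-aware
    datetimes, matching the UTC timestamps the wrapper writes.
    """
    for line in lines:
        entry = json.loads(line)
        ts = datetime.fromisoformat(entry["timestamp"])
        if scope is not None and entry["scope"] != scope:
            continue
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        yield entry
```

At real volume this becomes a query against wherever the logs land (CloudWatch Logs Insights, a SIEM, a warehouse table), but the shape of the question is the same: scope, window, fields.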

Without scoped keys and scope-level logging, these questions either can't be answered or require reconstructing an answer from incomplete provider logs, which often don't have the application-level granularity HIPAA investigations require.

Attribution is also the first thing you need when a security incident happens. The question "which patient records were potentially exposed?" requires knowing which requests went to which model under which BAA. A flat key architecture makes that reconstruction slow, manual, and often incomplete.

Build vs. buy: when to use a proxy layer

The scoped key implementation in this chapter is a reasonable starting point for a small team. It has real limitations as your usage grows.

Provider-native key management gives you isolation but not enforcement. Model access controls live in application code and can be bypassed. Budget controls are limited: OpenAI offers project-level budget alerts, but they're soft limits that notify rather than stop requests; Anthropic's spend limits operate at the workspace level, not per key. Log management is your responsibility end to end. And if you are using multiple providers (OpenAI for some models, Anthropic for others, AWS Bedrock for a third), you are managing parallel key hierarchies with no unified governance layer.

A proxy layer solves these problems by sitting between your application code and the provider. Your application sends a request with a scoped key; the proxy enforces the allowlist, applies the budget check, logs the request and response, and forwards to the appropriate provider. The compliance logic is centralized and infrastructure-level, not distributed across application code.
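The per-request decision a proxy makes can be sketched as a single gate over the scope's policy. This is illustrative: real gateways also meter actual spend asynchronously rather than trusting a per-request cost estimate, and the function name is hypothetical:

```python
def proxy_gate(allowed_models, budget_usd, spent_usd, model, est_cost_usd):
    """Return 'forward' or a rejection reason, checked before the provider call.

    allowed_models of None means the scope is unrestricted (e.g. dev).
    """
    if allowed_models is not None and model not in allowed_models:
        return f"reject: model '{model}' not in allowlist"
    if spent_usd + est_cost_usd > budget_usd:
        return "reject: monthly budget exceeded"
    return "forward"
```

The point of the sketch is where the check runs: in the proxy, against policy the application can't modify, rather than in application code where a developer can route around it.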

The open-source option is LiteLLM, which provides unified routing, basic key management, and budget controls across providers. It doesn't provide BAA coverage; you're still responsible for the compliance infrastructure around it. Self-hosting LiteLLM correctly for PHI workloads is non-trivial; it handles routing and you handle everything else.

Portkey offers a managed gateway with HIPAA support on enterprise plans, including BAA signing and PII anonymization. The enterprise requirement means a sales process and pricing negotiation.

Aptible AI Gateway is purpose-built for this use case. Scoped keys, model access controls, cost limits, and audit logging are configured at the infrastructure level rather than in application code. BAA coverage for all traffic through the gateway comes standard, with no enterprise negotiation required. For teams using Aptible for their application infrastructure already, the gateway fits within the existing compliance posture.

Which option is right depends on your team's situation:

| Situation | Recommendation |
| --- | --- |
| Early-stage, one LLM provider, small team | DIY scoped keys as described in this chapter. Add a proxy layer when you add a second provider or when key governance becomes a recurring maintenance burden. |
| Multiple LLM providers, HIPAA workloads, no compliance infrastructure | Managed gateway with BAA coverage (Portkey Enterprise or Aptible AI Gateway). The engineering time to build and maintain equivalent infrastructure exceeds the cost. |
| Existing LiteLLM infrastructure, no PHI in AI calls | Keep LiteLLM. If PHI enters the picture, layer compliant logging and encrypt storage appropriately, or move to a HIPAA-covered gateway. |
| PHI in AI calls, need audit trail, want minimal operational overhead | Managed gateway. The combination of BAA, logging, and key management in one place is what a managed product provides. |

FAQs

How many scopes do I actually need?

Start with three: production, staging, and development. This gives you the most important isolation (production vs. everything else) without the operational overhead of managing dozens of keys. Add per-application scopes when you have multiple distinct AI features with different risk profiles or compliance requirements, or when attribution requirements make the unified production scope too coarse.

Can I use the same key for multiple providers?

No. API keys are provider-specific. But the scope concept is provider-agnostic; a scope named prod-summarization can have one key for OpenAI and one for Anthropic. The application code references the scope; the underlying key is looked up based on which provider is being used for that request.
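A hedged sketch of that lookup, assuming environment-variable names that encode both scope and provider (the registry and names are illustrative):

```python
import os

# Illustrative (scope, provider) -> env var registry.
SCOPE_PROVIDER_KEYS = {
    ("prod-summarization", "openai"): "OPENAI_KEY_PROD_SUMMARIZATION",
    ("prod-summarization", "anthropic"): "ANTHROPIC_KEY_PROD_SUMMARIZATION",
}

def api_key_for(scope: str, provider: str) -> str:
    """Resolve the provider-specific key for a scope, failing loudly otherwise."""
    try:
        env_var = SCOPE_PROVIDER_KEYS[(scope, provider)]
    except KeyError:
        raise ValueError(
            f"No key registered for scope '{scope}' with provider '{provider}'"
        )
    key = os.environ.get(env_var)
    if not key:
        raise EnvironmentError(f"Environment variable {env_var} is not set")
    return key
```

Application code asks for (scope, provider) and never sees which key backs it, which keeps routing decisions (which provider serves this request) separate from credential management.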

Does scoped key architecture satisfy HIPAA's access control requirements?

Scope-based key management directly supports 45 CFR 164.312(a)(1), which requires assigning unique identifiers to users and tracking activity. Scoped keys provide system-level attribution; they should be complemented by user-level attribution in your audit logs (the user_id field in the logging patterns above).

Scoped keys alone don't satisfy the full access control requirement: they handle system attribution, not user identity management. Your application's authentication layer handles the latter.

What happens if a key is leaked?

Rotate it using the blue/green pattern above and revoke the old key at the provider. Because the leaked key is scoped, the exposure is limited to that scope's request history, not the organization's entire LLM usage. This is the operational argument for scoped keys: when a key incident happens, the response is contained and well-defined rather than requiring an org-wide key rotation that affects every system simultaneously.

After rotating, audit the scope's request logs for the period the key was potentially exposed. What PHI was in the prompts? Which models were used? Was there any anomalous activity (request volume spikes, unusual models, off-hours requests)? The answer to these questions shapes whether you have a HIPAA reportable incident. If you have good logs, you can answer them.

Should I store keys in environment variables or a secrets manager?

Environment variables work for early-stage teams and are simpler to manage. The limitations: rotation requires a redeployment or restart, secrets are visible in process environments and can leak through diagnostic tools or error messages, and there's no audit trail of who accessed which secret.

A secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager) adds rotation without redeployment, fine-grained access control, and an access audit trail. For healthcare applications handling PHI, the audit trail for secret access is worth having. Move to a secrets manager before your first enterprise customer security review.

Next steps

Scoped keys are the foundation, but they only provide attribution value if the audit logs they generate are stored correctly and monitored. The next chapter covers how to build LLM audit logging that satisfies HIPAA and works as a security tool: encrypted storage, anomaly detection, and how to know when your logging has silently stopped working.