HIPAA AI Security
Audit logging for healthcare AI: compliance baseline vs. security operations
By Mat Steinlin, Head of Information Security
Last updated: April 2026
One company we worked with had diligently logged every LLM interaction for months. When a customer security review asked for records covering a three-week period from the prior quarter, they couldn't produce them. Their logging infrastructure had a silent failure (a misconfigured log drain, unnoticed), and the logs from that window didn't exist. The fix was straightforward. The disclosure to their compliance team was not.
This is the thing about audit logging that most teams get wrong: they build logging and assume it's working. They treat it as a one-time infrastructure task, not an ongoing operational concern. And they implement what HIPAA requires without thinking about what makes logs useful when something actually goes wrong.
This chapter covers both dimensions. The compliance requirements are the floor; the security operations layer is what makes logging worth having.
For HIPAA retention requirements, retention periods, and what auditors check, see Audit log retention. For the basic compliance logging implementation, see HIPAA-Compliant AI. This chapter builds on those foundations.
What HIPAA requires vs. what security requires
45 CFR 164.312(b) requires "hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information." For LLM interactions, that means logging activity involving PHI: the prompts, the responses, timestamps, user attribution, and which model was used.
That is the compliance minimum. HIPAA does not specify log format, anomaly detection, alerting thresholds, log analysis tooling, or what to do when logging fails. Those are your decisions.
Security logging goes further than compliance logging in two directions. First, it captures metadata beyond the HIPAA-required fields: operational signals that are not required for a compliance audit but are necessary for detecting and investigating incidents. Second, it treats the logging infrastructure itself as a system that needs monitoring, not just a passive recorder.
The practical difference: a team with compliance-only logging can satisfy an auditor asking "did this request occur?" They often can't answer "why did API costs spike 300% overnight?" or "which system was responsible for the anomalous request volume on March 15?" Security logging closes that gap.
What to log for LLM interactions
The compliance minimum
The HIPAA-required fields for LLM audit logging are covered in HIPAA-Compliant AI: prompt content, response, timestamp, user or system attribution, and model used. These fields establish an auditable record that PHI was handled.
Security metadata worth adding
The fields below are not required by HIPAA. They are required for making logs actionable when something goes wrong.
Request duration (duration_ms): Unusually long LLM requests are an anomaly signal. A request that normally takes 800ms taking 45 seconds can indicate prompt injection causing the model to generate an unexpectedly large response, a misconfigured retry policy, or upstream rate limiting that your application is silently retrying. Without duration data, you can't distinguish these cases.
Token counts (input_tokens, output_tokens): Token counts are both a cost signal and an abuse signal. A sudden spike in average input tokens suggests someone is sending unusually large context, which could be a bug, an adversarial prompt embedding a large payload, or a developer testing with production-scale data in a development environment. Token count anomalies are often the first signal of a problem, appearing before cost alerts trigger.
Response status (status, finish_reason): Track whether each request succeeded, was rate-limited, was refused by the model's safety filters, or errored. A sudden increase in safety filter refusals on a particular scope can indicate prompt injection probing: someone testing inputs systematically to find what the model will respond to.
Scope and key attribution (scope, key_id): Covered in depth in API key management. Without scope attribution in your logs, you cannot answer "which system made these requests" during an investigation. These fields are required for the logs to be useful for anything beyond basic compliance.
De-identification flag (deidentified): A boolean indicating whether PHI de-identification was applied before the request was sent. This becomes relevant during an incident investigation: if de-identification was active, the prompts in your logs contain tokens rather than raw PHI, which changes the scope of your breach notification analysis.
Source identifier (source_ip, service_name): The application or service that initiated the request, and where it came from. For internal tools and service-to-service calls, a service name is more useful than an IP. For patient-facing features where requests originate from end users, the application name plus user attribution is typically sufficient.
The full logging structure:
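A sketch in Python, combining the compliance minimum with the security metadata above. The dataclass and `log_llm_request` helper are illustrative, not a prescribed schema; `writer` is whatever storage backend you use.

```python
import uuid
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class LLMAuditRecord:
    # Compliance minimum (45 CFR 164.312(b))
    request_id: str      # correlates this entry with general application logs
    timestamp: str       # ISO 8601, UTC
    user_id: str         # user or system attribution
    model: str
    prompt: str          # raw PHI, or tokens if de-identification was applied
    response: str
    # Security metadata (not HIPAA-required; needed for investigations)
    scope: str
    key_id: str
    status: str          # e.g. "success", "rate_limited", "error"
    finish_reason: str   # e.g. "stop", "content_filter", "length"
    duration_ms: int
    input_tokens: int
    output_tokens: int
    deidentified: bool   # True if PHI was tokenized before the request left
    service_name: str    # originating service, for service-to-service calls
    source_ip: str

def log_llm_request(writer, **fields) -> str:
    """Build an audit record, hand it to a storage writer, and return the
    request_id so the caller can reference it (PHI-free) elsewhere."""
    record = LLMAuditRecord(
        request_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        **fields,
    )
    writer(asdict(record))
    return record.request_id
```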
One note on the request_id return value: your general application logs should include this identifier (without any PHI) so that during an investigation you can correlate a specific event in your application timeline to the corresponding PHI-containing audit log entry. This keeps PHI out of general logs while maintaining traceability.
Log storage: encryption and access control
Logs contain PHI. This is where many teams create a compliance gap without realizing it: they build diligent logging, then store logs in plaintext S3, an unencrypted Elasticsearch cluster, or a shared log aggregator that doesn't have appropriate access controls. The audit trail exists, but the storage fails the Security Rule's encryption and access control requirements.
Two requirements apply:
Encryption at rest. All PHI must be encrypted at rest under 45 CFR 164.312(a)(2)(iv). For S3, this means server-side encryption. AWS S3 offers several server-side encryption options (SSE-S3, SSE-KMS, SSE-C); for HIPAA workloads, SSE-KMS using a customer-managed key is the right choice: it encrypts data using a key you control in AWS KMS, and access to the KMS key can be audited independently.
Access controls. PHI audit logs shouldn't be accessible to everyone who has access to your general application infrastructure. S3 bucket policies, IAM roles, and KMS key policies should collectively restrict log access to the systems and individuals with a legitimate operational need.
Configuring a HIPAA-appropriate S3 log bucket in Python using boto3:
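A sketch under stated assumptions: the bucket name and KMS key ARN are placeholders, and your account will likely layer bucket policies and IAM roles on top of this baseline.

```python
def kms_encryption_config(kms_key_arn: str) -> dict:
    """Default-encryption rule: SSE-KMS with a customer-managed key."""
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_arn,
            },
            "BucketKeyEnabled": True,  # reduces per-request KMS costs
        }]
    }

def create_audit_log_bucket(bucket: str, kms_key_arn: str) -> None:
    import boto3  # imported here so the pure helper above has no AWS dependency
    s3 = boto3.client("s3")
    s3.create_bucket(Bucket=bucket)
    # Encrypt every object at rest with the customer-managed KMS key
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=kms_encryption_config(kms_key_arn),
    )
    # Block all public access paths
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    # Versioning protects the audit trail against overwrite and deletion
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
```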
For log ingestion, a Python logging handler that writes directly to S3:
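A minimal handler sketch. The S3 client is injected so the handler can be exercised with a stub; the key layout and bucket name are illustrative. Note that write failures are surfaced through handleError rather than silently dropped, which matters for the silent failure problem discussed later in this chapter.

```python
import json
import logging
from datetime import datetime, timezone

class S3AuditHandler(logging.Handler):
    """Writes each audit record as one SSE-KMS-encrypted S3 object."""

    def __init__(self, s3_client, bucket: str, kms_key_arn: str):
        super().__init__()
        self.s3 = s3_client
        self.bucket = bucket
        self.kms_key_arn = kms_key_arn

    def emit(self, record: logging.LogRecord) -> None:
        try:
            entry = (record.msg if isinstance(record.msg, dict)
                     else {"message": record.getMessage()})
            key = "llm-audit/{}/{}.json".format(
                datetime.now(timezone.utc).strftime("%Y/%m/%d"),
                entry.get("request_id", record.created),
            )
            self.s3.put_object(
                Bucket=self.bucket,
                Key=key,
                Body=json.dumps(entry).encode("utf-8"),
                ServerSideEncryption="aws:kms",  # per-object SSE-KMS
                SSEKMSKeyId=self.kms_key_arn,
                ContentType="application/json",
            )
        except Exception:
            # A dropped PHI audit write is a compliance event; handleError
            # prints to stderr by default, so wire it into your alerting
            self.handleError(record)
```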
For AWS CloudWatch Logs, encryption with a customer-managed KMS key can be enabled on the log group directly. CloudWatch is better for operational access (fast query with CloudWatch Insights); S3 is better for long-term retention at scale.
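For example, a sketch using the boto3 CloudWatch Logs client (injected here for testability; the group name and retention window are illustrative):

```python
def create_encrypted_log_group(logs_client, name: str, kms_key_arn: str,
                               retention_days: int = 30) -> None:
    """Create a CloudWatch log group encrypted with a customer-managed KMS key.

    The KMS key policy must grant the CloudWatch Logs service permission
    to use the key, or creation fails."""
    logs_client.create_log_group(logGroupName=name, kmsKeyId=kms_key_arn)
    # Short-term operational tier: expire entries after the operational window
    logs_client.put_retention_policy(logGroupName=name,
                                     retentionInDays=retention_days)
```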
Short-term vs. long-term retention
LLM audit logs serve two distinct purposes with different access patterns and retention requirements. Conflating them in a single storage tier creates either a cost problem (retaining six years of data in a fast-access store) or an operational problem (investigators cannot quickly query logs that are in cold storage).
Short-term operational logs (7–30 days): Fast access for debugging, configuration verification, and prompt optimization. Developers legitimately need to see recent requests to understand model behavior, verify de-identification is working, or diagnose unexpected outputs. CloudWatch Logs or a similar queryable store works well here. Access controls should still be strict (these logs contain PHI), but the access pattern is more frequent.
Long-term compliance logs (6+ years per HIPAA; verify your retention policy): Archival access for audits and breach investigations. The access pattern is infrequent and typically batch: "retrieve all requests from scope X between dates Y and Z." S3 with Glacier tiering is cost-effective. Fast query is less important than durability, encryption, and assured retention.
The routing pattern: your logging infrastructure writes to both tiers. Short-term logs are queryable immediately. Long-term logs are archived via a drain from the short-term store or written directly to S3.
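As a minimal sketch, the direct dual-write variant can be expressed as a composed writer; the tier implementations are whatever your stack provides (a CloudWatch handler, the S3 handler above, etc.):

```python
def dual_tier_writer(short_term_write, long_term_write):
    """Compose a writer that routes each audit record to both retention tiers.

    short_term_write: fast queryable store (e.g. CloudWatch Logs, 7-30 days).
    long_term_write: encrypted archive (e.g. S3 with Glacier tiering, 6+ years).
    Failures in either tier propagate rather than being silently swallowed."""
    def write(record: dict) -> None:
        short_term_write(record)
        long_term_write(record)
    return write
```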
For full retention requirements, including what HIPAA specifies vs. what individual state laws may require, see Audit log retention.
Using logs as an anomaly detection signal
Most teams build logging for audits and investigations: things that happen after a problem is discovered. The same logs can surface problems before they become incidents, if you define what anomalous looks like.
Five signals worth monitoring in LLM audit logs:
Volume spikes on a single scope. A production scope that normally handles 200 requests per hour suddenly handling 8,000 is either a runaway loop, an adversarial usage pattern, or a misconfigured client retrying aggressively. Any of these is worth alerting on. The threshold depends on your baseline; start by alerting when any scope exceeds 5x its rolling hourly average.
Unexpected models in production logs. If your production allowlist covers three specific models and your logs show a fourth, something bypassed your controls. This means either an unreviewed code change or an infrastructure misconfiguration. Either way, it should not appear silently in a compliance log two weeks later.
Sudden increase in safety refusals. A spike in finish_reason: content_filter responses on a patient-facing feature can indicate systematic prompt injection probing: someone sending crafted inputs to find what the model will respond to. One or two refusals per day is normal. A hundred in an hour is not.
Off-hours access by production scopes. LLM activity at 3am from a patient-facing feature that has no legitimate overnight traffic is worth investigating. This signal generates false positives for global teams and background jobs, so calibrate carefully, but for clearly bounded use cases it is a reliable anomaly indicator.
Token count outliers. Individual requests with input token counts far above your application's normal distribution can indicate unusually large prompts being injected or a developer inadvertently sending a full document as context.
A basic threshold alerting implementation using a sliding window:
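A sketch of the two checks the chapter recommends starting with (volume spikes and model violations). The thresholds, warm-up period, and alert strings are illustrative starting points, not tuned values.

```python
import time
from collections import defaultdict, deque

class AuditLogMonitor:
    """Sliding-window anomaly checks over audit log entries."""

    def __init__(self, allowed_models, window_s=3600, baseline_hours=24,
                 spike_factor=5.0, min_baseline_s=3 * 3600):
        self.allowed_models = set(allowed_models)
        self.window_s = window_s
        self.baseline_s = baseline_hours * 3600
        self.spike_factor = spike_factor
        self.min_baseline_s = min_baseline_s  # warm-up before spike checks
        self.events = defaultdict(deque)      # scope -> request timestamps

    def observe(self, scope, model, ts=None):
        """Record one audit log entry; return any alerts it triggers."""
        ts = time.time() if ts is None else ts
        alerts = []
        if model not in self.allowed_models:
            alerts.append(f"model_violation: {model} on scope {scope}")
        q = self.events[scope]
        q.append(ts)
        while q and q[0] < ts - self.baseline_s:  # drop events past the horizon
            q.popleft()
        recent = sum(1 for t in q if t >= ts - self.window_s)
        baseline = len(q) - recent                # events before current window
        baseline_span_s = (ts - self.window_s) - q[0]
        if baseline > 0 and baseline_span_s >= self.min_baseline_s:
            hourly_rate = baseline / (baseline_span_s / 3600)
            if recent > self.spike_factor * hourly_rate:
                alerts.append(
                    f"volume_spike: {scope} at {recent}/hr "
                    f"vs baseline {hourly_rate:.1f}/hr"
                )
        return alerts
```

The warm-up guard (min_baseline_s) matters: without a few hours of baseline, the rolling average is meaningless and the check fires on normal morning ramp-up.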
These checks are intentionally simple. The goal is detecting obvious problems fast, not building a machine learning anomaly detection pipeline. Start with volume spikes and model violations; add refusal rate monitoring once you have a baseline for what normal looks like on your specific features.
The silent failure problem
The incident described at the start of this chapter (three weeks of missing logs discovered during a security review) isn't unusual. Logging infrastructure fails silently more often than it fails loudly: the application keeps running, requests keep succeeding, and the only indication something is wrong is the absence of records, which you only notice when you need them.
Silent failure modes in LLM logging:
A logging handler throws an exception on write, catches it silently, and drops the record
An S3 upload fails due to a permissions change after a rotation and the error is swallowed
A disk fills up on a log aggregator host and new records are silently dropped
A configuration deployment updates the log destination without updating the encryption key reference, causing writes to fail
A library update changes a field name in the response object the logging wrapper depends on, causing null values to be logged without an error
The OWASP Logging Cheat Sheet recommends treating logging failures as significant events, not background noise. For PHI audit logging, a logging failure is a compliance event.
How to verify your logging is working
The verification approach: write a synthetic test request on a schedule, then confirm the corresponding log entry appears in your storage backend. If it does not appear within a defined window, alert.
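A sketch of the canary check. The write_entry and find_entry callables wrap your own stack (for example, the logging handler and an S3 lookup); both names, and the timing defaults, are illustrative.

```python
import time
import uuid

def run_logging_canary(write_entry, find_entry,
                       timeout_s=60.0, poll_s=5.0) -> bool:
    """Write a synthetic, PHI-free audit entry, then confirm it appears in
    the storage backend. Returns False if the caller should alert."""
    request_id = f"canary-{uuid.uuid4()}"
    write_entry({
        "request_id": request_id,
        "scope": "logging-canary",
        "prompt": "[synthetic canary request - contains no PHI]",
        "deidentified": True,
    })
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if find_entry(request_id):
            return True
        time.sleep(poll_s)
    return False
```

Run this on a schedule (a cron job or scheduled Lambda works) and page on a False result.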
This canary approach catches the failure modes that silently drop records. It doesn't catch misconfigured encryption (the record writes but may not be encrypted correctly); verify encryption configuration separately when making infrastructure changes.
How to alert on logging failures
At minimum: alert on the canary check failure. For more comprehensive coverage:
Alert if the logging handler's error count exceeds zero in a rolling window (configure your log aggregator to track handler exceptions)
Alert if the volume of audit log entries falls below the expected minimum for production scopes during business hours
Alert on S3 write errors from your logging handler (surface these explicitly rather than swallowing them)
Log drain setup and long-term storage options
A log drain exports log data from your short-term store to long-term archival. This is a distinct operation from the initial log write. Your application writes to a fast-access store; the drain moves records to archival storage automatically.
S3 via Amazon Kinesis Data Firehose
For teams on AWS, Kinesis Data Firehose is the standard pattern for buffered S3 delivery with encryption:
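A sketch of creating a Direct PUT delivery stream with boto3. The ARNs, prefix, and buffering values are placeholders; the IAM role must grant Firehose write access to the bucket and use of the KMS key.

```python
def s3_destination_config(role_arn: str, bucket_arn: str,
                          kms_key_arn: str) -> dict:
    """Extended S3 destination: buffered, compressed, KMS-encrypted delivery."""
    return {
        "RoleARN": role_arn,          # IAM role Firehose assumes for delivery
        "BucketARN": bucket_arn,
        "Prefix": "llm-audit/",
        "CompressionFormat": "GZIP",
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "EncryptionConfiguration": {
            "KMSEncryptionConfig": {"AWSKMSKeyARN": kms_key_arn}
        },
    }

def create_audit_log_drain(stream_name: str, role_arn: str,
                           bucket_arn: str, kms_key_arn: str) -> None:
    import boto3
    firehose = boto3.client("firehose")
    firehose.create_delivery_stream(
        DeliveryStreamName=stream_name,
        DeliveryStreamType="DirectPut",  # the application writes records directly
        ExtendedS3DestinationConfiguration=s3_destination_config(
            role_arn, bucket_arn, kms_key_arn
        ),
    )
```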
Langfuse
Langfuse is an open-source LLM observability platform that provides a queryable interface over your audit log data, useful for prompt debugging, cost analysis, and short-term operational access. It is not a long-term compliance archive on its own; combine it with S3 for retention.
Langfuse uses OpenTelemetry for log ingestion, which means you can configure it as a drain destination without changing your logging implementation, as long as your logs emit in OTEL format. Beta support for Langfuse as a drain destination is available in Aptible AI Gateway.
SIEM integration
For larger teams with an existing SIEM (Splunk, Sumo Logic, Elastic SIEM), routing LLM audit logs into the SIEM provides unified visibility across your security data. The primary consideration is cost: LLM logs, especially prompt and response content, have high per-record size. Filter carefully to avoid ingesting more than you need for security monitoring; route full logs to S3 and send a reduced field set to the SIEM.
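The split-routing idea can be sketched as a field filter; the field names follow the logging structure described earlier in this chapter, and the exact set you forward is your call.

```python
# Fields safe to route to the SIEM: operational metadata only,
# no prompt or response content
SIEM_FIELDS = {
    "request_id", "timestamp", "scope", "key_id", "model", "status",
    "finish_reason", "duration_ms", "input_tokens", "output_tokens",
    "deidentified", "service_name", "source_ip",
}

def route_record(record: dict, s3_write, siem_write) -> None:
    """Full record (including PHI content) to S3; reduced set to the SIEM."""
    s3_write(record)
    siem_write({k: v for k, v in record.items() if k in SIEM_FIELDS})
```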
FAQs
Do I need to log every LLM request or only the ones that involve PHI?
Audit logs serve two compliance functions. The first is the audit control standard under 45 CFR 164.312(b), which requires recording and examining activity in systems that contain or use PHI. The second is the access control standard under 45 CFR 164.312(a): logs are how you demonstrate that only authorized personnel accessed PHI. If a request does not involve PHI (a purely synthetic or test request with no patient data), logging it is not strictly required under either standard.
In practice, filtering reliably by PHI presence at log time is difficult. A model summarizing patient notes might receive a prompt that includes a session ID but no direct PHI; the response, however, contains PHI-derived content. Log everything that flows through your PHI-handling systems. The storage cost is low relative to the investigation and compliance risk of a logging gap.
How long do we need to keep LLM audit logs?
HIPAA requires documentation to be retained for six years from the date of its creation or the date it was last in effect, whichever is later. For audit logs, that means six years from when the log was created. Some states impose longer retention requirements; verify applicable state law for your patient population. For the full retention framework, see Audit log retention.
Can we store LLM audit logs in the same place as our application logs?
Not if your application logs aren't encrypted at rest and appropriately access-controlled. LLM audit logs contain PHI. They need encryption and access controls appropriate for PHI. If your general application log store already meets those standards, co-location is technically acceptable, but the access controls must treat the PHI audit logs as a restricted subset; not everyone with access to application logs should have access to logs containing patient data.
What if a developer needs to see logs for debugging?
Give developers access to short-term operational logs for scopes they own, with PHI fields visible only to authorized roles. In practice: developers can see request metadata (scope, model, duration, status, token counts) and a sanitized version of the prompt, with full PHI access gated behind an explicit access request that is itself logged. Limiting PHI access to what is necessary for a legitimate purpose is not just a best practice: it’s required under the HIPAA Privacy Rule's Minimum Necessary standard (45 CFR 164.502(b)). Gating full log PHI access behind an explicit, logged request is a direct implementation of that requirement.
The de-identification flag in the logging schema above is relevant here. If de-identification was applied, the prompt in the log contains tokens rather than PHI; a developer can debug the model interaction without seeing raw patient data. This is one operational argument for implementing de-identification even when a BAA is in place.
What happens if our logging fails during a request?
Decide in advance whether to allow or block requests when logging is unavailable, and document that decision. The right choice depends on your use case: for patient-facing features where clinical workflows depend on a timely response, blocking requests because logging is unavailable may cause more harm than completing the request with an alerting gap. For internal tooling with no immediate clinical dependency, blocking requests when logging is unavailable is the more conservative choice.
More consequential than the allow-or-block decision is what a logging gap means after the fact. A gap discovered during a security review or audit may trigger breach notification obligations under the HIPAA Breach Notification Rule (45 CFR Part 164 Subpart D). The Rule requires notification when unauthorized PHI access cannot be ruled out, and a missing log record cannot establish that PHI was not improperly accessed during that window. The four-factor risk assessment still applies, but without logs, the analysis defaults toward notification.
Whatever your allow-or-block policy, make it explicit and document it. An undocumented policy that defaults to allowing requests is not a defensible audit position. "We had a documented policy to allow requests when logging was unavailable for availability reasons, with active alerting on logging failures to detect and close gaps promptly" is.
Next steps
Logging provides the audit trail. De-identification reduces what's in that trail, limiting PHI exposure at the provider and reducing blast radius if your log storage is ever compromised. The two controls work together.
PHI de-identification as a security control: how to reduce what PHI reaches the model and ends up in your logs
API key management and scope isolation: the scope attribution that makes logs useful for investigation
Shadow AI in healthcare: what happens outside your sanctioned logging infrastructure