
Division of Responsibility

This document builds on the Division of Responsibility between Aptible and you, focusing on use cases related to Reliability and Disaster Recovery. The list is not exhaustive; the examples are meant to illustrate how responsibility is divided.

Host-level responsibilities

Aptible

Aptible monitors host health. If a host becomes unhealthy, impacted containers will be moved to a healthy host. This extends to AWS-scheduled hardware maintenance.

You

Aptible is solely responsible for the host. You have no responsibilities related to host management.

Database-level responsibilities

Aptible

  • Aptible restarts containers that have exited (see: Container Recovery).
  • Aptible restarts containers that have run out of memory (see: Memory Management).
  • Aptible monitors for Database containers that are stuck in restart loops and attempts to resolve the root cause of the restart loop.
    • Common causes include the Database running out of disk space or memory, or incorrect or invalid settings. The on-call engineer will reach out to the Ops Alert contact with information about the root cause and the action taken.
  • Roughly once a day, a member of Aptible's SRE team receives a list of Databases using more than 98% of their disk space. This notification does not page the on-call engineer, so any action we take is best effort and at the discretion of the responding SRE. Most commonly, the responding SRE will scale the Database and notify the Ops Alert contact. Depending on usage patterns and growth rate, the responding SRE may instead reach out to the Ops Alert contact before taking any action.
    • Aptible hopes to automate this process soon. With this automation, any Database exceeding 99% disk utilization will be scaled up and the Ops Alert contact will be notified.
  • Aptible ensures that database replicas are distributed across availability zones.
    • This is not always possible. For example, when recovering a primary or replica after an outage, the fastest path to recovery may be to temporarily run both the primary and the replica in the same availability zone. In these cases, our SRE team is notified and will reach out to schedule a time to migrate the Database to a new availability zone.
  • Aptible automatically takes backups of Databases once a day. Backups are created via point-in-time snapshots of the Database's disk. As a result, taking a backup causes no performance degradation. The resulting backup is not stored on the primary volume.
  • If enabled as part of the retention policy, Aptible copies Database backups to another region as long as another geographically appropriate region is available.

You

  • Aptible does not monitor performance, resource consumption, latency, network connectivity, or any other metrics for Databases other than the metrics explicitly outlined above.
  • Aptible does not monitor Database replica health or replication lag.
  • While Aptible can support cross-region replication, Aptible does not proactively run Databases in another region.
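
Because Aptible does not monitor replica health or replication lag, you may want to run your own check. The following is a minimal Python sketch, not an Aptible-provided tool. The function name `check_replica`, the 30-second default threshold, and the status strings are illustrative assumptions. The query constant applies only to PostgreSQL replicas, and you would supply your own function that runs it with your database driver.

```python
from typing import Callable, Optional

# PostgreSQL-only query, meant to be run on the replica. It estimates replay
# lag in seconds and yields NULL if no transaction has been replayed yet.
# Caveat: when the primary is idle, this value grows even though the replica
# is fully caught up, so treat it as an upper bound.
POSTGRES_LAG_QUERY = (
    "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"
)


def check_replica(
    fetch_lag_seconds: Callable[[], Optional[float]],
    max_lag_seconds: float = 30.0,  # assumed threshold; tune for your workload
) -> str:
    """Classify a replica as 'ok', 'lagging', 'unknown', or 'unreachable'.

    `fetch_lag_seconds` is a hypothetical caller-supplied function that runs
    a lag query (for example POSTGRES_LAG_QUERY) and returns the result.
    """
    try:
        lag = fetch_lag_seconds()
    except Exception:
        # Any failure to query the replica is reported as unreachable.
        return "unreachable"
    if lag is None:
        return "unknown"
    return "lagging" if lag > max_lag_seconds else "ok"
```

A scheduled job could call `check_replica` every minute and forward any status other than "ok" to your own alerting system.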

App-level responsibilities

Aptible

  • Aptible automatically restarts containers that have exited (see: Container Recovery).
  • Aptible restarts containers that have run out of memory (see: Memory Management).
  • Aptible monitors App host disk utilization. At 99% utilization, we restart Apps on impacted instances in order to free up disk space. Restarted Apps start with a fresh filesystem.

You

  • If a container is not correctly designed to exit on failure, Aptible does not restart it, and has no monitoring that will catch that failure condition. You are responsible for ensuring your container correctly exits (see: "Cases where Container Recovery will not work" in Container Recovery).
  • Aptible does not monitor for App containers stuck in restart loops.
  • Aptible does not proactively run your Apps in another region, nor do we retain a copy of the code or Docker Images that would be required to fail your Apps over to another region. In the event of a regional outage, Aptible would need to work with you to restore your Apps in a new region.
  • Aptible does not monitor performance, resource consumption, latency, network connectivity, or any other metrics for Apps other than the metrics explicitly outlined above.
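
Since Container Recovery only acts on containers that have exited, a long-running process should terminate with a non-zero status when it hits an error it cannot recover from, rather than logging the error and idling. The following is a minimal Python sketch of that pattern. The function `run` and its parameters are hypothetical names chosen for illustration, and the sketch treats every unhandled exception as fatal, which may be stricter than your App needs.

```python
import sys
from typing import Callable, Optional


def run(step: Callable[[], None], max_steps: Optional[int] = None) -> int:
    """Call `step` repeatedly and return a process exit code.

    Returns 1 as soon as `step` raises, so the caller can exit and let the
    platform restart the container. `max_steps` exists only to make the loop
    finite for testing; leave it as None for a real worker.
    """
    done = 0
    while max_steps is None or done < max_steps:
        try:
            step()
        except Exception as exc:
            # Do not swallow the error and keep looping: report it and stop.
            print(f"fatal error, exiting: {exc!r}", file=sys.stderr)
            return 1
        done += 1
    return 0


# In the container's entrypoint script, pass the result to sys.exit so the
# container actually terminates with that status, for example:
#     sys.exit(run(process_next_job))
# where process_next_job is your own (hypothetical) unit of work.
```

The key design choice is that the loop's failure path ends the process instead of retrying indefinitely inside it, which keeps the failure visible to the platform.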

VPN-level responsibilities

Aptible

  • Aptible provides connectivity between resources in an Aptible customer's Dedicated Stack and resources in a customer-specified peer network. Aptible is responsible for the configuration and setup of the Aptible VPN peer (see: Site-to-site VPN Tunnels).

You

  • Aptible does not coordinate the configuration of the non-Aptible peer.
  • Aptible does not monitor the connectivity between resources across the VPN Tunnel. This is the responsibility of the customer and/or their partner network operator.
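
Because connectivity across the VPN Tunnel is not monitored by Aptible, one option is to probe a peer-network service from inside your Stack. The following is a minimal Python sketch using only the standard library. The function name `tcp_reachable` and the 3-second default timeout are assumptions for illustration, and a successful TCP connection shows only that the port accepts connections, not that the service behind it is healthy.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running a check like this on a schedule from an App in the Dedicated Stack, against a host and port in the peer network, gives you an early signal when the tunnel or the remote service stops responding.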