HTTP Health Checks Failed
Cause
When your App has one or more HTTP(S) Endpoints, Aptible automatically performs Health Checks during your deploy to make sure your Containers are properly responding to HTTP traffic.
If your containers are not responding to HTTP traffic, the health check fails.
These health checks are called Release Health Checks.
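To reproduce this kind of check locally, you can poll your app until it answers over HTTP. The following is a minimal sketch, not Aptible's implementation: the function name `wait_for_http`, the polling interval, and the choice to treat any HTTP response (including 4xx/5xx) as "responding" are all assumptions made for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_seconds=180.0, interval_seconds=1.0):
    """Poll `url` until it returns any HTTP response or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # 2xx/3xx: the app answered over HTTP
        except urllib.error.HTTPError:
            # 4xx/5xx is still an HTTP response (assumption: counts as responding)
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, timed out, or not valid HTTP: retry
            time.sleep(interval_seconds)
    return False
```

Running `wait_for_http("http://localhost:8000/")` against a locally started container (port 8000 is only an example) gives a rough idea of whether the app starts accepting HTTP traffic within the window.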
Resolution
There are several reasons why the health check might fail, each with its own fix:
App crashes on start-up
If your app crashes immediately upon start-up, it’s not healthy. In this case, Aptible will indicate that your Containers exited and report their Container Command and exit code.
You’ll need to identify why your Containers are exiting immediately. There are usually two possible causes:
- There’s a bug, and your container is crashing. If this is the case, it should be obvious from the logs. To proceed, fix the issue and try again.
- Your container is starting a program that immediately daemonizes. In this case, your container will appear to have exited from Aptible’s perspective. To proceed, make sure the program you’re starting stays in the foreground and does not daemonize, then try again.
App listens on incorrect host
If your app is listening on localhost (a.k.a. 127.0.0.1), then Aptible cannot connect to it, so the health check won’t pass.
Indeed, your app is running in Containers, so if the app is listening on 127.0.0.1, then it’s only routable from within those Containers, and notably, it’s not routable from the Endpoint.
To solve this issue, you need to make sure your app is listening on all interfaces. Most application servers let you do so by binding to 0.0.0.0.
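As an illustration of binding to all interfaces, here is a minimal sketch using Python's standard-library HTTP server. The `Handler` class, the `make_server` helper, and port 8000 are hypothetical names and values chosen for the example; your own application server will have its own host or bind setting.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Answer every GET with a small plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="0.0.0.0", port=8000):
    """Create a server bound to all interfaces rather than to 127.0.0.1."""
    return HTTPServer((host, port), Handler)
```

Calling `make_server().serve_forever()` blocks, so the process also stays in the foreground. Passing `host="127.0.0.1"` instead would reproduce the problem described above.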
App listens on the incorrect port
If your Containers are listening on a given port, but the Endpoint is trying to connect to a different port, the health check can’t pass.
There are two possible scenarios here:
- Your Image does not expose the port your app is listening on.
- Your Image exposes multiple ports, but your Endpoint and your app are using different ports.
In either case, to solve this problem, you should make sure that:
- The port your app is listening on is exposed by your image. For example, if your app listens on port 8000, your Dockerfile must include the following directive: EXPOSE 8000.
- Your Endpoint is using the same port as your app. By default, Aptible HTTP(S) Endpoints automatically select the lexicographically lowest port exposed by your image (e.g. if your image exposes ports 443 and 80, then the default is 443), but you can select the port Aptible should use when creating the Endpoint and modify it at any time.
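Lexicographic ordering compares ports as strings, character by character, which is why 443 sorts before 80. The sketch below only illustrates that ordering rule; `default_port` is a hypothetical helper, not Aptible's actual selection code.

```python
def default_port(exposed_ports):
    """Return the lexicographically lowest port, comparing ports as strings."""
    return min(str(port) for port in exposed_ports)


print(default_port([443, 80]))             # 443  ("4" sorts before "8")
print(default_port([3000, 8080, 10000]))   # 10000 ("1" sorts before "3" and "8")
```

If the result is not the port your app listens on, set the Endpoint's port explicitly rather than relying on the default.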
App takes too long to come up
It’s possible that your app Containers are simply taking longer to finish booting up and start accepting traffic than Aptible is willing to wait.
By default, Aptible waits for up to 3 minutes for your app to respond. However, you can increase that timeout by setting the RELEASE_HEALTHCHECK_TIMEOUT Configuration variable on your app.
There is one particular error case worth mentioning here:
Gunicorn and [CRITICAL] WORKER TIMEOUT
When starting a Python app using Gunicorn as your application server, the health check might fail with a repeated set of [CRITICAL] WORKER TIMEOUT errors.
These errors are generated by Gunicorn when your worker processes fail to boot within Gunicorn’s timeout. When that happens, Gunicorn terminates the worker processes, then starts over.
By default, Gunicorn’s timeout is 30 seconds. This means that if your app needs, say, 35 seconds to boot, Gunicorn will repeatedly time out and then restart it from scratch.
As a result, even though Aptible gives you 3 minutes to boot up (configurable with RELEASE_HEALTHCHECK_TIMEOUT), an app that needs 35 seconds to boot will time out on the Release Health Check because Gunicorn is repeatedly killing and then restarting it.
It’s common for an app to need more than 30 seconds to boot and therefore hit this timeout. You might also have configured a lower timeout (via the --timeout option).
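The arithmetic behind this restart loop can be sketched with a small model. It is a simplification built only on the numbers above (30-second worker timeout, 3-minute release window); the `simulate_boot` function is hypothetical and ignores Gunicorn's own start-up and restart overhead.

```python
def simulate_boot(boot_seconds, worker_timeout=30, release_timeout=180):
    """Model a worker that is killed and restarted whenever its boot
    exceeds worker_timeout. Returns (booted, restarts)."""
    elapsed = 0
    restarts = 0
    # Keep trying while one more attempt still fits in the release window
    while elapsed + min(boot_seconds, worker_timeout) <= release_timeout:
        if boot_seconds <= worker_timeout:
            return True, restarts  # this attempt finishes booting
        elapsed += worker_timeout  # worker killed at the timeout
        restarts += 1              # and started again from scratch
    return False, restarts


print(simulate_boot(35))                     # (False, 6)
print(simulate_boot(35, worker_timeout=60))  # (True, 0)
```

In this model, a 35-second boot never completes under a 30-second worker timeout, no matter how long the release window is, while the same app boots on the first attempt once the worker timeout exceeds its boot time.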
There are two recommended strategies to address this problem:
- If you are using a synchronous worker in Gunicorn (the default), use Gunicorn’s --preload flag. This option will cause Gunicorn to load your app before starting worker processes. As a result, when the worker processes are started, they don’t need to load your app, and they can immediately start listening for requests instead (which won’t time out).
- If you are using an asynchronous worker in Gunicorn, increase your timeout using Gunicorn’s --timeout flag.
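If you prefer a configuration file over command-line flags, Gunicorn can read the same settings from a Python file. The sketch below assumes a file named gunicorn.conf.py and uses example values (port 8000, a 60-second timeout) that are not from this guide; in practice you would keep only the setting that matches your worker type.

```python
# gunicorn.conf.py: Gunicorn configuration files are plain Python.
# All values here are illustrative assumptions; tune them for your app.

bind = "0.0.0.0:8000"  # listen on all interfaces, on the port your image exposes
preload_app = True     # config-file equivalent of the --preload flag
timeout = 60           # config-file equivalent of the --timeout flag, in seconds
```

You can point Gunicorn at the file with its -c option, for example `gunicorn -c gunicorn.conf.py myapp:app`, where `myapp:app` stands in for your own application module.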
📘 If neither of the options listed above satisfies you, you can also reduce your worker count using Gunicorn’s --workers flag, or scale up your Container to make more resources available to them.
We don’t recommend these options to address boot-up timeouts because they affect your app beyond the boot-up stage, respectively by reducing the number of available workers and increasing your bill.
That said, you should definitely consider making changes to your worker count or Container size if your app is performing poorly or Metrics are reporting you’re undersized: just don’t do it only for the sake of making the Release Health Check pass.
App is not expecting HTTP traffic
HTTP(S) Endpoints expect your app to be listening for HTTP traffic. If you need to expose an app that’s not expecting HTTP traffic, you shouldn’t be using an HTTP(S) Endpoint.
Instead, you should consider TLS Endpoints and TCP Endpoints.