Bob Fairhead
Changelog
September 28, 2020

Improved scheduling and resiliency for application containers

Deploy’s application scheduler can now automatically provision additional host capacity when it detects that placing an application’s containers on existing hosts would overtax their resources or concentrate the containers on a single host or availability zone. This change improves the performance and reliability of applications at no additional cost to our users.

Previously, our system would attempt to place all containers for an application as quickly as possible, avoiding any delay in releasing new or restarted applications. In some cases, it would then flag the allocation, and our Reliability Team would asynchronously review it and take action as necessary to ensure the reliability of the applications. In the vast majority of cases over the past few years, this behavior has worked well and delivered the expected results.

However, as part of our continual work on the hosting platform, we identified this as an area to improve, in order to avoid the rare edge cases where an asynchronous response could pose an issue. The most frequent such case was a significant scale-up in the number or size of containers for a given application. Now that we have refined our detection mechanism, Deploy proactively adds capacity while scheduling containers to mitigate the risk of such scaling actions.
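To make the idea concrete, here is a minimal sketch of the kind of check a scheduler could run before placing containers. It is an illustration only, not Deploy's actual implementation: the `Host` type, the `needs_more_capacity` function, the memory-only resource model, and the `max_share_per_host` and `min_zones` thresholds are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical host model: only free memory is tracked in this sketch.
    name: str
    zone: str
    free_memory_mb: int


def needs_more_capacity(hosts, container_count, container_memory_mb,
                        max_share_per_host=0.5, min_zones=2):
    """Simulate a greedy placement on existing hosts.

    Returns True if the placement would run out of room, put too large a
    share of the containers on one host, or use too few availability zones.
    The thresholds are illustrative assumptions.
    """
    free = {h.name: h.free_memory_mb for h in hosts}
    zone_of = {h.name: h.zone for h in hosts}
    placed = Counter()  # containers of this application per host

    for _ in range(container_count):
        candidates = [n for n, f in free.items() if f >= container_memory_mb]
        if not candidates:
            return True  # existing hosts would be overtaxed
        # Prefer the host with the fewest containers, then the most free memory.
        best = min(candidates, key=lambda n: (placed[n], -free[n]))
        free[best] -= container_memory_mb
        placed[best] += 1

    if container_count > 1:
        if max(placed.values()) / container_count > max_share_per_host:
            return True  # too concentrated on a single host
        zones_used = {zone_of[n] for n in placed}
        if len(zones_used) < min(min_zones, container_count):
            return True  # too concentrated in a single availability zone
    return False
```

In this sketch, a large scale-up that no longer fits on the existing hosts, or that would land entirely in one zone, returns `True`, which is the point at which a scheduler would bring new hosts online before completing the operation.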

The main change in Deploy’s behavior from a customer’s perspective will be an occasional delay in completing deployment or scaling operations on applications. The delay should be short, often 10-15 minutes, while new host capacity is brought online. This is similar to the delays that can occur when provisioning Databases. A delay of this kind does not indicate a failure: the operation is still proceeding correctly. If any errors do occur, they will be displayed in the dashboard or CLI.