Kubernetes Challenges: Container Orchestration and Scaling
Kubernetes (K8s) often catches users off guard with its complexity. Harnessing its full potential requires close attention to detail and a keen understanding of its core concepts. If you don’t have both, K8s can fall apart quicker than a wet paper straw.
J.R.R. Tolkien once said, “It does not do to leave a live dragon out of your calculations if you live near one.” Similarly, while Kubernetes offers enormous power and flexibility for container orchestration, you’ll find that even the most experienced professionals have trouble navigating its configuration and services.
Platform practitioners who manage Kubernetes-based platforms often experience significant frustration as they encounter a multitude of challenges, ranging from intricate configuration requirements and unpredictable expenses to limitations in scalability, concerns surrounding secrets management, and a steep learning curve.
Kubernetes is an efficient platform for managing containerized applications. Dev teams use it to quickly deploy and orchestrate applications across a cluster of machines, automating many tasks that would otherwise be time-consuming and error-prone. For example, developers can deploy and manage containerized applications, automating tasks like load balancing, scaling, and self-healing, which streamlines application deployment, ensures high availability, and simplifies infrastructure management.
A critical aspect of its orchestration capabilities is dynamic scaling, which efficiently allocates resources and seamlessly handles fluctuations in workload demand. Instead of manually monitoring usage and bandwidth, K8s manages it nearly autonomously.
Magical as K8s may seem at first, there are plenty of challenges to overcome before a deployment is production-ready.
Understanding these challenges is essential for anyone who wants to deploy Kubernetes in production. Kubernetes is a complex system, and there are many things that can go wrong if it is not properly configured. Here are some of the most frustrating challenges that Kubernetes users face:
Configuring Components is Complex
In a recent survey conducted by Civo, 54% of cloud developers revealed that Kubernetes complexity is slowing down their organization's use of containers.
One of the main sources of this complexity is writing and managing YAML/JSON manifests. Manifests are configuration files that outline how to use resources within a cluster. Writing them is time-consuming, prone to misinterpretation, and immensely frustrating. There are so many object types in K8s, each with its own specific requirements and specifications, that users may find themselves overwhelmed when working with manifests.
Object types represent different resources, entities, or components within the Kubernetes system, such as pods, services, deployments, and config maps. A piece of code manages each of these object types, and the life cycles of all these objects work in tandem to orchestrate the system as a whole.
Coordinating all these components requires esoteric knowledge. Depending on the application, you may need to specify additional parameters for certain object types, especially for a StatefulSet (the workload API object used to manage stateful applications). None of this is well documented, and even if it were, the docs assume a high degree of K8s knowledge to begin with.
Working with a StatefulSet is ideal for managing stateful applications with unique network identities, persistent storage, and ordered deployment and scaling. StatefulSets sometimes require many additional parameters. These parameters, tailored to your application's requirements, typically include:
- A Persistent Volume Claim (PVC) to ensure persistent storage
- A Pod Disruption Budget (PDB) to maintain availability during maintenance
- Init Containers to execute custom initialization tasks
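To make this concrete, the parameters above can be sketched in a minimal StatefulSet manifest plus a companion Pod Disruption Budget. This is an illustrative sketch only; the names (`web`, `www`, the images, and the sizes) are hypothetical placeholders, not values from the original article:

```yaml
# Illustrative StatefulSet sketch; all names, images, and sizes are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web            # headless Service giving each pod a stable network identity
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      initContainers:
        - name: init-permissions          # custom initialization before the main container runs
          image: busybox:1.36
          command: ["sh", "-c", "chown -R 1000:1000 /data"]
          volumeMounts:
            - name: www
              mountPath: /data
      containers:
        - name: web
          image: nginx:1.25
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:                   # generates one PVC per replica for persistent storage
    - metadata:
        name: www
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
---
# Pod Disruption Budget: keep at least 2 replicas available during voluntary maintenance.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```

Note how each bullet maps to a field: the PVC comes from `volumeClaimTemplates`, availability from the PodDisruptionBudget, and initialization from `initContainers`.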
However, configuring an application is not a “one and done” task; it typically needs a dedicated DevOps team willing to regularly scan Kubernetes clusters and verify their configuration. This process involves validating pod resource limits and security policies to ensure smooth operation. Kubernetes administrators also need to evaluate, select, install, and manage third-party plug-ins and extensions from a vast and dizzying array of options.
Depending on the application, K8s administrators may need to install additional components such as:
- Ingress controllers like Nginx for load balancing
- Service mesh plugins such as Linkerd for advanced traffic management
- Monitoring plugins like Prometheus to identify performance bottlenecks
- Storage solutions like Rook for dynamic provisioning
- Security-focused plugins like Falco
- Authentication plugins like Dex for robust authentication and authorization mechanisms
With Kubernetes, complexity begets more complexity. As you add more clusters and components, the configurations get more intricate, thanks to the expanded requirements and the challenge of keeping everything consistent in a distributed system. Each new Kubernetes cluster has its own set of resource needs, including storage strategies, networking protocols, and compute nodes.
As Kubernetes scales, maintaining consistent networking policies across multiple pods becomes harder. One illustrative challenge: once a network policy selects a specific pod, traffic to and from that pod must explicitly match a network policy rule, or it will be blocked. While there are no inherent restrictions on pod-to-pod communication within a Kubernetes namespace, administrators must carefully design network policy rules to ensure uninterrupted traffic flow and adequate network security.
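As a sketch of this behavior, the NetworkPolicy below (pod labels and port are hypothetical, chosen for illustration) selects pods labeled `app: api`; from that point on, all ingress to those pods is denied except what the rule explicitly allows:

```yaml
# Once this policy selects pods labeled app: api, all other ingress to them is
# blocked; only traffic from app: frontend pods on port 8080 is allowed.
# Labels and ports are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Forgetting to add an allow rule for a legitimate caller is exactly the kind of traffic-blocking pitfall described above.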
In addition, the dynamic nature of the Kubernetes ecosystem introduces more complexity. With regular updates, the introduction of new features, and community-driven enhancements, staying current with best practices and emerging trends becomes essential for effectively navigating the evolving Kubernetes landscape.
As Capital One's cloud guru Bernard Golden once stated, "While Kubernetes-based applications may be easy to run, Kubernetes itself is no picnic to operate."
Expensive or Unpredictable Costs
One of the biggest problems with K8s is uncontrollable costs. In a recent Cloud Native Computing Foundation (CNCF) FinOps for Kubernetes survey, 68% of respondents said Kubernetes costs are increasing, with half experiencing an increase of more than 20% per year.
Overprovisioning tends to go unchecked when an enterprise fails to monitor spending carefully and loses control over the costs involved. In the CNCF survey, 24% of respondents did not monitor Kubernetes spending at all, while 44% relied on monthly estimates. Only a relative minority employed predictive Kubernetes cost monitoring processes.
When configured correctly, applications are responsive, even under heavy loads and traffic spikes. Improper configuration of K8s can lead to excessive scaling of an application, resulting in over-provisioning.
Over-allocation hurts performance by hoarding shared cluster resources. When deployments request more CPU and memory than they actually need, application performance can degrade. A web application configured to run hundreds of instances may overwhelm available resources, resulting in slow response times, increased latency, and even system crashes well beyond the problematic container.
Dynamic scaling => dynamic costs
Since these environments are dynamic, K8s engineers can unintentionally overprovision their applications if they fail to account for fluctuations in usage. For example, a K8s engineer might configure a web application deployment to use 16 CPU cores and 64GB of memory, even though the application actually requires only 4 CPU cores and 16GB of memory. Webb Brown, CEO of Kubernetes cost monitoring firm Kubecost, recently pointed out that teams typically begin with cost efficiency as low as 20%.
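That gap shows up directly in the container's resource stanza. The fragment below is a hypothetical sketch (container name and image are placeholders) of the 16-core request just described, annotated with what the observed usage would actually warrant:

```yaml
# Illustrative over-provisioned container spec; name and image are hypothetical.
containers:
  - name: web
    image: example/web:1.0
    resources:
      requests:
        cpu: "16"          # requested: 16 cores / 64Gi...
        memory: 64Gi
      # ...while observed usage is ~4 cores / 16Gi. A rightsized request
      # would look like:
      #   requests:
      #     cpu: "4"
      #     memory: 16Gi
```

Because the scheduler reserves capacity based on requests, the unused 12 cores and 48Gi are paid for whether or not the application ever touches them.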
Teams often waste this extra capacity and don’t factor it into their overall cost calculations. Software company Flexera recently reported that teams typically exceed their cloud budgets by an average of 13% and waste approximately 32% of their cloud expenditure due to issues like overprovisioning that leaves cloud resources unused.
Kubernetes engineers rarely use usage reports to determine whether they can cut spending while keeping a comfortable buffer. Consequently, they face a choice between wasting money and introducing risk, and overprovisioning becomes the default option.
Attempting to gain control over this spending can be incredibly frustrating. As one Chief Technology Officer acknowledged, “I spent the last two years developing with Kubernetes and containers. In one moment, I’d rave about how it handled mundane infrastructure management tasks. In the next, it frustrated me to comical degrees because it was almost impossible to control my own environment.”
This lack of control is primarily due to Kubernetes' proactive creation and disposal of container instances to meet demand, resulting in unpredictable resource usage. This volatility will challenge anyone seeking to track usage levels and allocate overhead expenses.
Moreover, in multi-cloud environments, users may receive separate bills from different service providers for their clusters, making it hard to even identify specific containers.
For example, suppose you have a Kubernetes cluster running on AWS and another on Azure. With such an arrangement, you will receive separate bills from AWS and Azure for the resources you use in each cluster. Tracking your overall spending becomes challenging, as well as identifying which containers use the most resources. Additionally, each cloud provider uses its own naming convention for containers, complicating identification.
Autoscaling is a prominent feature of cloud native architectures, and it often involves using on-demand containers (if a container exists, you pay for it) to meet workload demands. To make the most of your resources, it's essential to embrace rightsizing in Kubernetes, as it allows you to optimize costs effectively.
Rightsizing is about finding the perfect fit for container sizes, aligning them with the pods' actual CPU and memory needs so you make the most of your resources without sacrificing performance. Adopting this approach can significantly cut expenses, since you won't pay for unused resources, making your Kubernetes deployment much more cost-efficient. However, achieving this goal may pose some challenges.
When demand spikes occur, scalability limitations impact the application's performance if resources are scaled up too late (e.g., Out of Memory errors, CPU throttling, or increased latency). Failure to rightsize leads to inefficiencies and bottlenecks. On the other hand, if a user overprovisions resources, autoscaling may increase the cost of running the application, potentially outweighing any performance benefits. Thus, it's not wise to leave resource provisions up to chance.
Monitoring creates the right balance of resources
Instead, organizations need to closely monitor their applications to achieve the right balance in resource allocation.
Given this need, Kubernetes offers an array of scaling tools to facilitate resource requests:
- A Vertical Pod Autoscaler (VPA) that automatically allocates more or less CPU and memory to existing pods. The VPA analyzes historical usage patterns over time, identifying containers that over-request resources and reducing their requests and limits. It also scales up requests and limits for under-requesting workloads, ensuring they have adequate resources to meet their performance requirements.
- A Horizontal Pod Autoscaler (HPA) to dynamically manage pod scaling, adjusting the number of pods based on observed CPU and memory utilization. Thus, HPA adds more pods when the demand load goes up and removes pods when the demand load goes down.
- A Cluster Autoscaler (CA) to dynamically adjust cluster sizes in K8s. CAs automatically add or remove worker nodes based on specific conditions, such as insufficient resources for running pods or extended periods of underutilized nodes. It's important to note that the CA's functionality relies on integration with a cloud provider.
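Of the three, the HPA is the most commonly configured. A minimal sketch is shown below; the target Deployment name `web` and the replica and utilization numbers are hypothetical:

```yaml
# Minimal HPA sketch; the Deployment name and thresholds are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70% of requests
```

Note that the utilization target is measured against the pods' CPU *requests*, which is one more reason accurate requests matter.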
Kubernetes users should know the drawbacks and difficulties of these three autoscaling techniques; understanding them can prevent inadvertent overscaling of applications. For instance, when calculating scalability, neither the HPA nor the VPA factors in network, storage, or input/output operations per second. As a result, applications risk slowing down or even grinding to a halt. In addition, the VPA cannot update resource limits on running pods; applying new limits requires evicting the pods and creating new ones.
Kubernetes Cluster Autoscaler (CA) presents its own challenges. When making scaling decisions, it only considers a pod's resource requests rather than its actual usage. Consequently, the CA will fail to detect any unused resources the user may have requested, creating inefficient and wasteful clusters.
The infrastructure the pods run on also needs to scale, making multiple mechanisms necessary. Relying on the HPA or CA alone may not suffice; to manage scalability effectively, all three mechanisms may need to work in tandem. Neglecting these tools can lead to heightened frustration and significant costs.
Secrets Management Issues
In Kubernetes, Secrets are not really secret. Although Kubernetes employs the Secret object to store your sensitive data, the data is represented in base64-encoded strings. Consequently, anyone with access to the cluster or the code can decode the K8s Secret.
By default, Kubernetes stores these unencrypted Secrets in the API server's underlying data store (etcd). Since anyone with API access can retrieve a Secret, anyone with access to etcd can do so as well. In addition, anyone who has permission to create a Pod in a Kubernetes namespace can use that same access to read any Secret stored there, even if they only have indirect access.
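A sketch makes the point plain. In the hypothetical Secret below (the name and value are placeholders), the `data` field is merely base64-encoded, not encrypted:

```yaml
# A Secret only base64-encodes its data; anyone who can read the object
# (or etcd) can decode it. Name and value are illustrative.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  password: cGFzc3dvcmQ=   # base64 for "password" -- encoding, not encryption
```

Anyone with read access can recover the plaintext with a one-liner such as `kubectl get secret db-credentials -o jsonpath='{.data.password}' | base64 --decode`.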
Thus, Kubernetes users should either employ a third-party tool to remedy this situation or thoroughly reconsider the storage of any sensitive data within the platform.
After all, hacking attacks are hardly rare occurrences. Potential attackers constantly scan the internet for exposed Kubernetes components protected by lax access controls, such as API servers. In 2022, a report by The Shadowserver Foundation revealed that over 380,000 K8s API server instances are exposed to the internet every day. In addition, configuring cluster access and authorization roles is highly complex in K8s.
Kubernetes users should be aware that adding components to the Kubernetes environment increases the overall attack surface, including the exposure of secrets. One such component, the Kubernetes Dashboard, presents a web-based interface for managing and visualizing the cluster. Improper configuration or security vulnerabilities in the Dashboard can introduce risks, particularly when it's accessible on the public internet without robust authentication and authorization measures. Unauthorized access to the Dashboard grants attackers the ability to view sensitive information, manipulate resources, and potentially gain access to stored secrets within the cluster.
Introducing new components may also require additional configuration steps and integration with secrets management systems.
Alas, Kubernetes’ security risks appear to be a growing concern for businesses. Earlier this year, Red Hat surveyed 600 DevOps and security professionals about the state of K8s security and found that 67% of respondents have experienced delays or slowdowns in application deployment due to security concerns. In addition, just over a third of respondents reported revenue loss or customer attrition resulting from a container or Kubernetes security incident.
Steep Learning Curve
Navigating Kubernetes can be daunting. Taking a few hours to learn Kubernetes won’t get anyone over its steep learning curve. It's a complex tool solving a complex problem. In the aforementioned survey by Civo, 57% of respondents reported Kubernetes' steep learning curve as their top challenge.
What makes working with Kubernetes so difficult? Users must familiarize themselves with the software's three essential components: the containers, the kubectl command-line tool, and the Kubernetes manifests.
Among these three components, mastering the art of writing YAML files is the most challenging. While the YAML format is relatively straightforward, the key lies in meticulously structuring and configuring each YAML file to guarantee flawless functionality within the Kubernetes environment. Attentiveness to details such as indentation, syntax, and accurate placement of key-value pairs becomes imperative for seamless deployment. Additionally, YAML files must be thoroughly validated and error-free to prevent deployment issues and unexpected behavior.
Understanding the underlying logic behind K8s objects is even more demanding. Users must grasp the concepts of pods, services, and deployments, comprehending their functionality and interconnectedness within the Kubernetes ecosystem. They should also comprehend how these objects interact and collaborate with each other.
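To illustrate that interconnectedness, here is a minimal hypothetical sketch (all names and images are placeholders) of how the objects interlock: a Deployment manages pods through a label selector, and a Service routes traffic to those same pods by matching the same label:

```yaml
# A Deployment creates and manages pods labeled app: hello; all names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello            # must match the pod template labels below
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:1.25
          ports:
            - containerPort: 80
---
# The Service finds the Deployment's pods by matching the same label.
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 80
```

A single mismatched label silently breaks the link between the two objects, which is exactly the kind of indentation- and key-level detail the learning curve demands.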
Kubernetes engineers typically undergo specialized training to work effectively with the platform, since setting up a K8s cluster demands a deep understanding of its architecture and components. This process involves provisioning the infrastructure, including installing and configuring crucial components like the API server, etcd, and kubelet agent. It also involves setting up networking and cluster networking solutions and deploying and configuring storage options. Finally, Kubernetes engineers must be able to add and configure additional tools like monitoring systems and logging solutions.
While Kubernetes offers advanced container orchestration capabilities, the complexity involved in mastering the platform can be daunting and time-consuming. Due to Kubernetes' steep learning curve, enterprises must carefully consider the trade-offs involved in adopting the software. Since the benefits of using K8s may not outweigh the investment required in training and resources, it is essential to consider alternative solutions.
Cloud Platform: A Viable Alternative
To recap, here’s a list of the issues we’ve discussed in this article. This list is by no means comprehensive as there are many other challenges to working with Kubernetes, but it’s enough to strike fear into even seasoned infrastructure engineers:
- The difficulty of configuring components
- Multiple layers of interdependent complexity
- Unpredictable costs and waste of resources
- Scalability limitations and rightsizing issues
- Vulnerability of secrets and mitigation strategies
- Steep learning curve and difficult documentation
Despite Kubernetes' continued popularity, the container orchestration platform still provokes considerable frustration. Kubernetes users face numerous challenges, including complex configuration demands, expensive or unpredictable costs, scalability limitations, secrets management concerns, and a steep learning curve. Nevertheless, many enterprises persevere in adopting K8s due to its unmatched ability to orchestrate containerized applications.
Cloud platforms simplify deployment by abstracting the underlying infrastructure, allowing users to focus on application development and deployment rather than managing Kubernetes clusters. Furthermore, cloud platforms use Kubernetes' scaling capabilities, offering simple scaling options and auto-scaling features to improve performance and resource utilization.
There is a growing demand in the market for an alternative to Kubernetes that supports globally-scalable application delivery. Cloud platform solutions like Aptible are becoming increasingly popular. Aptible allows developers to quickly scale their applications without worrying about infrastructure requirements. As time progresses, we expect more businesses to show interest in cloud platform products as they recognize their many benefits.