There is no doubt that Kubernetes is the de facto container cluster management platform. Looking back, Kubernetes won the container orchestration war over many other solutions not because it abstracted out compute (CRI), storage (CSI), and networking (CNI) and built an ecosystem around those interfaces, but because its declarative, desired-state management model makes application lifecycle management for container-based applications far easier for DevOps teams.
What does that mean? Developers can write a simple YAML file describing an application's topology in terms of pods, services, and so on, and Kubernetes will use that description to deploy the application and place its container services. "This is not a big deal," you might say; "I can easily write some scripts to automate container deployment, and other solutions such as Docker Compose can do that too." You're right, but the eureka moment comes when you start dealing with day-2 operations. If a host suffers a hardware failure and all container services running on that node die, within seconds Kubernetes detects that the application's current deployment status has deviated from the desired state and automatically relaunches containers to match it again. In a conventional automation system, admins would have to set up many monitoring and alerting rules, plus automation scripts, to achieve this kind of self-healing.

Then comes the upgrade cycle. Instead of writing scripts to handle updates, migrations, and orchestration across multiple steps and conditions, all you have to do is update the YAML with the latest versions of your application services and configurations, and voila: with one command, Kubernetes reloads the YAML and automatically handles all your container service updates. This makes DevOps life much better, and you will finally have time to enjoy a cookie rather than becoming a late-night script-debugging warrior.
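To make the desired-state idea concrete, here is a minimal, illustrative Deployment manifest (the names `web` and the image tag are examples, not from the original text):

```yaml
# deployment.yaml — a minimal example of declarative desired state
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # desired state: three replicas, always
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25   # bump this tag and re-apply to upgrade
```

Applying it with `kubectl apply -f deployment.yaml` hands the desired state to Kubernetes; if a node dies and takes a pod with it, the controller reschedules a replacement to restore `replicas: 3`, and editing the image tag followed by another `kubectl apply` triggers a rolling update with no custom scripting.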
Kubernetes has done an excellent and elegant job of application lifecycle management. When it comes to Kubernetes' own cluster lifecycle management, however, the cluster admin is back to dealing with deployment, updates, monitoring/alerting, self-healing, auto-scaling, and more. First-generation tools such as Kops, Kubespray, and Rancher all focused on automation, but with limited capabilities. Kubernetes is complicated, with many components involved in a multi-node distributed environment, so it inevitably requires sophisticated orchestration and monitoring to handle the required lifecycle management tasks. Now imagine a fleet of 100+ clusters, some in different cloud environments: none of the first-generation cluster management tools can handle management at that scale without significant additional effort layered on top.
So we at Spectro Cloud asked ourselves: since Kubernetes does such a great job of application lifecycle management using the declarative desired-state model, why not let Kubernetes manage other Kubernetes clusters and their infrastructure's lifecycle the same way it manages applications? So there you have it: second-generation technology for Kubernetes cluster management. You describe the entire infrastructure stack for the Kubernetes clusters themselves (in something called a *cluster profile* in Spectro Cloud parlance) and apply the desired-state model to it. Compared with first-generation, automation-based Kubernetes cluster management, the desired-state approach has several unique advantages:
- Templatized deployment: Because a cluster profile is a declarative, full-stack model of Kubernetes and its infrastructure, it can be treated as a template or blueprint and easily deployed to one or more cloud environments. This is significantly less work than the automation approach, as cloud nuances and the variety of integrations (e.g., load balancing, logging, monitoring, security, service mesh) are all handled behind the scenes. The cluster admin just defines the cluster profile, without writing complex deployment automation scripts. This also brings consistency across clusters.
- Run-time resiliency: Each cluster's status is constantly monitored and compared against its cluster profile. If anything deviates from the desired-state definition, e.g., a master node goes down, the system automatically relaunches the node to match the desired state again. No additional monitoring rules or automation scripts are needed to achieve self-healing, auto-correcting behavior.
- Easy upgrade/rollback: The cluster profile is the single source of truth. To upgrade, the cluster admin just updates the YAML describing the cluster stack, whether that means a new base OS, a new Kubernetes version, new CSI/CNI integrations, or adding, deleting, or updating any other add-on integration. The system automatically knows an update is pending for every cluster deployed from that profile, and the admin decides when to apply it, either on demand or via a policy that upgrades at a scheduled time.
- GitOps friendliness: Because cluster profiles and configurations are all YAML based, GitOps-style management is easy to support: check cluster profiles into a Git repository, take advantage of the standard PR process, and easily diff between revisions, view history, and perform rollbacks.
- Foundation for multi-cluster workloads: When container workloads go to production, the ops admin inevitably ends up dealing with multiple clusters, whether for DR/HA or truly multi-site, hyper-scale deployment. All of this relies on consistency across Kubernetes clusters, and a robust desired-state management system helps with workload placement and lifecycle management across clusters as well.
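To illustrate the idea behind the advantages above, here is a conceptual sketch of what a full-stack cluster profile could look like. The field and pack names are purely illustrative assumptions for this article, not Spectro Cloud's actual schema:

```yaml
# Conceptual sketch of a "cluster profile" — the full infrastructure stack
# declared as layers, from base OS up to optional add-ons.
# Field names and versions here are hypothetical.
kind: ClusterProfile
metadata:
  name: prod-aws-baseline
spec:
  layers:
    - type: os
      pack: ubuntu-20.04          # base OS image layer
    - type: kubernetes
      pack: kubernetes-1.21       # Kubernetes distribution/version
    - type: cni
      pack: calico-3.19           # pod networking
    - type: csi
      pack: aws-ebs-csi-1.2       # storage integration
    - type: addon
      pack: prometheus-operator   # optional add-on integrations
```

Because the whole stack is one declarative document, the same profile can be stamped out across many clusters, diffed in Git, and edited in place to drive upgrades.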
It is pretty clear that desired-state-based Kubernetes management is a big leap forward. Kubernetes' open source Cluster API project lays the foundation for enabling Kubernetes to manage Kubernetes clusters in a desired-state fashion. However, although Cluster API does an awesome job with cluster deployment and updates, it is a building block, not an end-to-end solution for desired-state-based Kubernetes management. Cluster API deals only with the core integrations (OS and Kubernetes), not the related add-on integrations, and several steps are required before it can kick in. For example, the user needs to prepare a base OS image with all required Kubernetes components, and also needs to pre-generate the certificates needed for deploying Kubernetes. Many enterprise requirements, such as IPAM support, underlay SDN support (not to be confused with a CNI overlay network), and external load balancer support, are still missing. At Spectro Cloud, we are fully committed to Cluster API, contributing enhancements back to the project while also building an end-to-end, enterprise-grade, desired-state-based Kubernetes management platform, available both as SaaS and as an on-prem installable solution.
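For a sense of what the Cluster API building block looks like, here is a minimal `Cluster` object from its core API group. The cluster name is an example, and the referenced provider resources (the kubeadm control plane and an AWS infrastructure cluster) would be defined separately:

```yaml
# A minimal Cluster API "Cluster" resource. It only wires together the
# control plane and infrastructure provider objects — OS image
# preparation, certificates, and add-ons are outside its scope.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-cluster-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: workload-cluster-1-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: workload-cluster-1
```

Note how much is delegated: everything provider- or add-on-specific lives in other objects, which is exactly why an end-to-end platform needs to fill in the layers around it.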
Tired of writing automation scripts for everything? It is time to move to desired-state-based management: no complex scripting needed!