The Challenge: Managing Software Stacks at Scale
As a system administrator, you know that deploying and maintaining Linux distributions can be painful.
Even with the advent of Infrastructure-as-Code (IaC) tools such as Terraform, Linux systems often drift into different states (“snowflakes”) through incremental updates.
The Container OS: Built for Containers and Kubernetes
Today we need our Linux distributions to be optimized for containerized, cloud native workloads running on Kubernetes. That means keeping the base operating system stable and predictable, so the cluster on top can operate with the same reliability, performance and security, and trimming the kernel, dependencies and services down to exactly what Kubernetes needs.
You can avoid this effort by instead adopting a container OS that’s optimized for Kubernetes. Container OSes are often deployed where resources are limited, especially in edge computing environments. This is why they’re typically lightweight and paired with lightweight (under 100 megabytes) Kubernetes distributions like K3S.
Let’s Compare Six Major Container OSes
In this article we’ll help you find the best operating system for Kubernetes. We’ll cover:
- CoreOS, the pioneer cloud native OS
- Flatcar Container Linux, the successor
- K3OS, the lightweight
- Bottlerocket, the Amazonian
- Talos, the CNCF-certified installer
- Kairos, the factory
CoreOS, the Pioneer Cloud Native OS
Arguably the first container OS was CoreOS. The CoreOS team made its first release in 2013, even before Kubernetes was created. CoreOS Linux was built with many security features, such as automatic updates and a read-only file system. This type of operating system is considered “immutable.” CoreOS also included a vulnerability scanner and a container firewall.
In 2018, Red Hat acquired CoreOS, leading the Kinvolk team to create Flatcar Container Linux as a drop-in open source replacement.
Flatcar Container Linux, the Successor
Like CoreOS Linux, Flatcar is immutable. It’s configurable during the initial boot process, and any further modifications are not persistent. Only user data in specific directories persists across reboots.
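To give an idea of what that first-boot configuration looks like, it is typically written as a Butane file and transpiled to Ignition JSON that the machine consumes on its initial boot. Here is a minimal sketch assuming the flatcar Butane variant; the SSH key and update strategy are placeholders:

```yaml
variant: flatcar
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA...   # placeholder public key
storage:
  files:
    # coordinate post-update reboots across the cluster via locksmith
    - path: /etc/flatcar/update.conf
      mode: 0644
      contents:
        inline: |
          REBOOT_STRATEGY=etcd-lock
```

Transpile it with the butane tool and pass the resulting Ignition file to the instance as user data; after that first boot, the configuration is effectively baked in.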
As well as being immutable and easy to use, Flatcar comes with features such as automatic system updates and an active/passive (A/B) partition scheme that makes upgrades and rollbacks safe at scale.
It is also worth mentioning that ISO images are available for download when deploying to bare-metal servers, making Flatcar a good fit for Kubernetes edge use cases.
However, it is missing out-of-the-box automation to build Kubernetes clusters and does not provide any Kubernetes-native framework to manage the cluster life cycle. In our view it falls into the DIY bucket: management of the container OS is included, but the Kubernetes layer is completely disconnected.
For anyone looking to manage a serious Kubernetes deployment, we need to keep looking.
K3OS, the Lightweight
K3OS is a lightweight immutable OS designed by Rancher Labs specifically for K3S clusters. It contains only the fundamental components needed to power Kubernetes clusters. The small footprint of k3OS and K3S brings many benefits, such as a reduced attack surface, shorter boot times and a more streamlined filesystem, which has made it a candidate for edge use cases.
Kubernetes Native
In addition, you can easily upgrade it using the Kubernetes API once the cluster is running. The integration of k3OS with the Rancher system upgrade controller extends the Kubernetes API to provide a Kubernetes-native way of upgrading nodes: k3OS nodes automatically upgrade themselves to the latest GitHub release by leveraging the capabilities of their own Kubernetes cluster.
However, customizing the k3OS image and automating its configuration along with the Kubernetes cluster deployment is complex. Also, there is no Kubernetes native way to generate these images or easily deploy them to public clouds.
A Dead Project?
While these features may have been interesting next steps for the project, the latest k3OS release was published in October 2021, and GitHub issues have gone unanswered since. With k3OS’s development apparently dead, users are now looking at alternatives, and it’s hard to recommend k3OS for any production environment today.
What’s next?
Bottlerocket, the Amazonian
Bottlerocket is another open source Linux-based operating system specifically designed for securely running containerized workloads on Kubernetes clusters.
Great for AWS Loyalists
Amazon Web Services (AWS) created Bottlerocket in 2020 and has integrated it with a variety of AWS services, such as Elastic Kubernetes Service (EKS), Elastic Container Service (ECS), Fargate, EC2 instances, Graviton2 and Amazon Inspector. So while you can run Bottlerocket in various environments, it is primarily aimed at the AWS public cloud.
While it’s easy to deploy in the AWS cloud or in VMware vSphere, provisioning Bottlerocket on bare-metal servers or in edge environments is a lot more difficult. It’s also an inconvenience that in VMware vSphere, Bottlerocket can only run as a worker node, which means an existing control-plane node must already be in place and configured separately.
The system is solely configurable via API, with secure out-of-band access methods. The Bottlerocket guest can only be accessed via the admin or control containers, extra components that run in a separate containerd instance. There’s no SSH server or even a shell.
Kube Native
The Bottlerocket Kubernetes operator takes care of system updates, while images are secured by TUF (The Update Framework). This is a completely Kubernetes native workflow that follows the same principles as the Rancher system upgrade controller. The existing image is replaced by the new one, with rollback available should a failure occur during the boot process.
In addition, Bottlerocket supports multiple “variants,” corresponding to a set of supported integrations and features, such as Kubernetes, ECS, Nvidia GPU and many more.
It’s also possible to build your own Bottlerocket images from curated variants rather than downloading the prebuilt artifacts. This requires Rust and Docker BuildKit. Finally, it is worth noting that, at the time of writing, there’s no variant that includes K3S.
Talos, the CNCF-Certified Installer
Talos is a minimalist Linux distribution designed from the ground up to run Kubernetes.
Its main purpose is to bring Kubernetes principles to the operating system layer. It introduces a declarative way of managing both the OS and the Kubernetes components live, allowing for a streamlined and efficient way of handling operations across the life cycle of the entire system. It was first released (as a pre-release) in 2018 by Sidero Labs and is entirely open source.
The CAPI-Friendly Container OS
Talos completely removes SSH and console access in favor of API-driven management. You can deploy Talos in any hyperscaler cloud, on bare-metal servers and on virtualized systems. The tool also provides an easy way to deploy a local Talos cluster on the Docker runtime by executing the command talosctl cluster create.
It also includes a Cluster API (CAPI) provider, the Cluster API Bootstrap Provider Talos or CABPT. Its role is to generate bootstrap configurations for machines and reconcile the updated resources with CAPI. There is a separate provider for the control plane configuration, the Cluster API Control Plane Provider Talos (CACPPT).
Powerful CLI for Life Cycle Management
The talosctl command-line interface allows you to interact with the Talos nodes and the Kubernetes cluster without requiring any terminal or SSH connection. It leverages the Talos API along with Kubernetes CRDs, enabling frictionless life cycle management for all the Kubernetes infrastructure components.
Talos offers you an array of operations through the talosctl CLI in conjunction with declarative inputs. For example, you can upgrade your entire cluster in an orchestrated fashion by running talosctl upgrade-k8s --to 1.26.1, where 1.26.1 is the updated Kubernetes version.
Great Security Capabilities
Talos is very efficient at building secured Kubernetes clusters in a jiffy. Talos supports disk encryption, NVIDIA GPU and Fabric Manager, and allows you to manage the life cycle of your public key infrastructure (PKI). Disk encryption is useful when running Talos at the edge. It protects the data in case of a lost or stolen disk. However, it is not designed to protect against attacks where physical access to the machine, including the drive, is available.
It also provides built-in management features to facilitate cluster life cycle management.
For example, it deploys highly available Kubernetes clusters without any external load balancer by relying on a floating virtual IP and allows for secure connection via Wireguard peer discovery on the public internet. In case of a node failure, one of the remaining control-plane nodes takes ownership of the VIP.
Also, the etcd cluster is automatically bootstrapped during cluster initialization, and scaling the control plane up and down is easy.
Lightweight, but Opinionated
The system footprint is very small, with an 80MB SquashFS image size. This drastically reduces the attack surface of the cluster. However, the opinionated approach of Talos also means that it has some drawbacks and limitations:
- It doesn’t support K3S, although the reduced OS size compensates for the total footprint difference.
- Image customization is limited to kernel modules and root filesystem content.
- As the kernel footprint is reduced, so is the list of supported hardware and specific kernel functions.
- Some aspects of the management of the system are more complex than traditional Kubernetes environments.
As a result, Talos is well-suited for specific scenarios where the trade-off between flexibility and secure off-the-shelf distribution is acceptable.
Kairos, the Factory
Over the last couple of years, an interesting pattern has emerged in Kubernetes. It entails using Kubernetes as an extensible API to add automation capabilities.
For example, Cluster API allows for the deployment of Kubernetes clusters by making cluster logical components first-class citizens in Kubernetes. So, from an existing Kubernetes cluster, you can bootstrap a new Kubernetes cluster and delegate its management once it has been deployed.
Kairos operates on the same principles. It allows you to build and customize immutable operating system images and Kubernetes clusters by extending the Kubernetes API. It delivers these capabilities via a custom controller watching for Kairos custom resources. The controller takes appropriate actions based on the CRUD operations performed on these resources.
A Factory for Building Your Choice of OS
Kairos offers more than just a container-specialized OS. Kairos acts as a Kubernetes “factory” producing K3S clusters underpinned by the immutable OS of your choice. As opposed to the other solutions described previously, Kairos is a meta-distribution, meaning that it has the ability to transform any existing Linux distribution into an immutable operating system.
Rather than being opinionated, it gives the flexibility to use your operating system of choice. Once you have chosen the distribution, Kairos releases an immutable artifact that you can deploy as a complete operating system underpinning Kubernetes clusters.
OCI Registries Simplify ISO Image Build
The only requirement is an OCI-compliant container image of that system. Kairos relies on Open Container Initiative (OCI) container registries to build the full-fledged OS from a container image. This simplifies the OS build and update processes, as it is achieved by using Dockerfiles and a container runtime. Alternatively, Kairos also delivers pre-built images for every release.
Kairos delivers the resulting artifact via an ISO image that is crafted via multiple options: Kubernetes custom resource definitions (CRDs), Preboot Execution Environment (PXE) boot or manually by mounting the ISO image on the target server.
Another key feature of Kairos is AuroraBoot. It allows you to bootstrap Kairos images directly from the network by teaming up with your Dynamic Host Configuration Protocol (DHCP) server. Currently, Aurora runs as a Docker container, but will shortly be available as a Kubernetes pod. With Aurora, all you need is a configuration file specifying the container image you want to deploy as your Kubernetes cluster OS, along with the cloud-init configuration.
Self-Coordinated Cluster Bootstrapping
Kairos natively supports Kubernetes. More specifically, it delivers K3S clusters, making it a perfect choice for Kubernetes edge use cases.
In that context, Kairos also has the ability to self-coordinate the bootstrap of Kubernetes clusters without the need for any central server. It means that it can deploy a high availability (HA) Kubernetes cluster on demand with no other setting required than the desired number of control-plane nodes and the virtual IP used by kube-vip. Combine this approach with Aurora, and you can automatically deploy an HA cluster via the network in a flash. The election process defines the role of every node and is completely distributed via a shared ledger.
As a consequence, whether you are looking to deploy a single-node cluster or a large HA cluster with multiple control-plane nodes, the process is identical. This drastically reduces cluster build time, while also allowing for better scalability.
High Security
In terms of security, Kairos optionally provides disk encryption using the local Trusted Platform Module (TPM) chip. Multiple scenarios are supported:
- Encryption keys can be stored within the TPM chip
- After encryption with the TPM key pair, an external server can be used to store encrypted passphrases for user-data partitions
- A key management (KMS) server can be used to store passphrases and return them to the nodes after a TPM challenge.
Finally, Kairos streamlines feature configuration using a cloud-config format, compatible with cloud-init. It provides templatization capabilities for dynamic configuration, simplifying automation and integration with CI/CD pipelines.
Kairos was initially created for Kubernetes edge operations, but it's also a great option for running Kubernetes clusters on bare-metal or virtual servers in the data center.
Its versatility and selection of base Linux distributions make Kairos an ideal solution for enterprise customers who are bound by certain vendors and operating systems, but still want to take advantage of a container-specialized OS, immutability and automation at scale.
Conclusion
Why You Need A Container OS
Kubernetes is a complex distributed system, and you cannot afford to build your clusters on top of a poor foundation. Although Kubernetes is often considered the “cloud operating system,” it is not an operating system per se. It needs an OS that delivers strong and stable support through immutability and specialization.
Immutability enforces the declarative, desired-state approach that empowers all modern infrastructure tools, eliminating snowflakes and leading to more predictable performance and higher scalability.
For edge use cases, most operations are performed remotely, with little or no qualified staff locally present. So, features like atomic updates, easy rollback, limited writable filesystem and extra security are key. All are made possible by adopting immutable container OSes.
Does Choice of OS Matter to You?
Among the solutions we compared in this article, only Kairos allows you to turn any Linux operating system into an immutable OS. This may be the preferred option if you want to keep using your favorite distribution.
Alternatively, you can choose from a few curated operating systems that provide immutability out of the box. Most of the solutions we’ve described are opinionated, with their benefits and drawbacks.
Don’t Forget the Management Plane for Multicluster at Scale
Container-specialized immutable operating systems are only one piece of the puzzle. As you deploy multiple clusters across different locations, especially at the edge, you also need a central management plane to help with standardization, ease of deployment and operations.
Palette Edge from Spectro Cloud is built on top of Kairos and adds central management capabilities. It provides an extra layer of abstraction with Cluster Profiles, which let you create standardized Kubernetes cluster configurations and deploy them anywhere by associating them with the desired edge machines.
But don’t take my word for it! You can try Palette Edge for free and compare it to the other solutions mentioned in this article or check out the docs.