Spanning clusters in hybrid environments: a pipe dream?
Many organizations have Kubernetes use cases where a hybrid cloud architecture would be desirable. A hybrid architecture combines public cloud and on-prem infrastructure in a single cluster that spans multiple environments. Instead of running multiple independent clusters, you have a single control plane running in one environment, managing nodes in others.
What can this unlock?
Improved resource utilization for specialized on-prem resources
Applications that need low-latency processing close to their data sources or those that require specific hardware configurations only available on-premises (think: AI/ML) can now be seamlessly orchestrated by the same control plane that manages cloud-native workloads.
Why would you want to dedicate valuable on-premises hardware to cluster management and “plumbing” when you can offload that work to the cloud and dedicate every drop of resource capacity in your datacenters to running application workloads?
Cloud bursting for spiky workloads
Hybrid architectures offer an efficient pathway to cloud bursting — where on-premises applications temporarily scale to the cloud to handle spikes in demand. For applications that experience fluctuating workloads or peak traffic at specific times, cloud bursting allows hybrid clusters to scale dynamically without the need to over-provision on-premises resources.
This elasticity provides organizations with a more cost-effective and flexible approach to managing high-demand workloads, ensuring that resources are available when needed without maintaining excessive capacity on-premises.
Site-to-site disaster recovery
With a hybrid architecture, multiple edge sites can be registered with a single control plane. If one of those sites goes down, the Kubernetes scheduler can seamlessly reschedule workloads to another site, providing a disaster recovery (DR) capability. A hybrid cluster with a unified control plane enables a degree of workload mobility that's inherently unavailable in standard Kubernetes clusters, whose nodes are all colocated within a single physical site.
Reduced overhead and complexity
Managing Kubernetes across different environments traditionally requires separate clusters and toolsets for on-premises, edge, and cloud deployments. This fragmentation can result in siloed infrastructure, inconsistent policies, and increased operational overhead.
With a hybrid, spanned architecture, you no longer need to maintain a control plane per edge location. Instead you would manage the on-prem nodes through the same toolsets and processes you use to manage cloud-based resources.
Consistent security and observability
Hybrid architectures enable a consistent security model across both on-premises and cloud environments, reducing the risk of security gaps, particularly in regulated industries that require strict access controls and compliance with data residency requirements. Integrated observability simplifies monitoring and logging across hybrid environments, enabling teams to track application performance and detect issues consistently across on-premises and cloud deployments. With centralized monitoring, organizations gain comprehensive visibility into their hybrid clusters, allowing for faster troubleshooting and reduced downtime.
So why isn’t everyone doing it?
Hybrid deployment models are particularly valuable for edge computing and telecommunications, where local processing is crucial but the sheer number of distributed sites demands centralized management. Other use cases include financial services, machine learning, and media streaming. Workloads that are latency-sensitive, require specialized hardware, or are bound by strict data governance regulations can be brought under a single operational, security, and tooling umbrella alongside (and connected to) cloud-native applications.
There’s only one downside. Until now, stretching Kubernetes clusters between public cloud and private networks, including data centers and edge locations, has been a prohibitively complex endeavor.
EKS Hybrid Nodes makes the dream real
AWS’s latest innovation, EKS Hybrid Nodes, enables customers to leverage on-premises and edge infrastructure as worker nodes in EKS clusters.
Hybrid nodes are bare metal and/or virtualized hosts running outside of AWS that register with a centralized EKS control plane as worker nodes, resulting in a single EKS cluster that spans the Amazon cloud and one or more private locations.
Hybrid nodes may be split across multiple sites, each utilizing a distinct private network. Multiple nodes within the same hybrid site can be considered a single hybrid node pool.
AWS hosts and manages the Kubernetes control plane, which reduces the operational effort required to run hybrid Kubernetes applications and ensures that valuable on-premises hardware is reserved for application workloads.
An EKS Hybrid cluster, connected via a site-to-site VPN architecture using AWS Transit Gateway
Ingredients for your first hybrid cluster
To fully understand hybrid clusters, their configuration, and capabilities, we have to cover a few concepts that are a little different from your conventional EKS cluster — particularly when it comes to the networking requirements.
Setting up the EKS cluster
Users can create clusters with hybrid node support using the AWS CLI, CloudFormation templates, or eksctl.
Two additional pieces of metadata — remote node networks and remote pod networks — are required when deploying a hybrid EKS cluster. Both are defined as comma-separated lists of CIDRs for hybrid nodes and their pods, respectively.
Therefore, knowledge of the network topology of each hybrid site is required prior to deployment.
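As a minimal sketch, an eksctl cluster config with hybrid node support might look like the following. The cluster name and CIDRs are illustrative placeholders; consult the eksctl documentation for the full remoteNetworkConfig schema.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: hybrid-demo        # illustrative cluster name
  region: us-west-2
remoteNetworkConfig:
  remoteNodeNetworks:
    - cidrs: ["10.200.0.0/16"]   # CIDR of the on-prem hybrid nodes (illustrative)
  remotePodNetworks:
    - cidrs: ["10.201.0.0/16"]   # CIDR of the pods on those nodes (illustrative)
```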
Authentication
Hybrid nodes use either AWS IAM Roles Anywhere or AWS Systems Manager (SSM) to securely authenticate with the EKS control plane.
In either case, an IAM role that has been pre-registered with the EKS cluster is assumed and used by the kubelet running on the hybrid node. Registration of the IAM role is performed as usual with EKS: by editing the aws-auth ConfigMap.
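As a sketch, the corresponding aws-auth entry might look like this; the account ID and role name below are placeholders for the Hybrid Nodes IAM role you created:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Placeholder ARN: substitute your own Hybrid Nodes IAM role
    - rolearn: arn:aws:iam::111122223333:role/EKSHybridNodesRole
      username: system:node:{{SessionName}}
      groups:
        - system:bootstrappers
        - system:nodes
```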
Networking and connectivity
As might be expected with hybrid nodes, much of the complexity lies in the networking… it’s always the network :)
L3 connectivity is required between on-premises infrastructure and the AWS VPC leveraged by the EKS cluster. Solutions such as AWS Site-to-Site VPN and AWS Direct Connect enable this.
EKS's VPC CNI plugin is unsupported on hybrid nodes, but can still handle networking for cloud-based worker nodes. Alternative CNI plugins such as Calico or Cilium are recommended for hybrid nodes, as they provide IP address management (IPAM) and optional BGP configuration for nodes outside of AWS.
Irrespective of your chosen CNI, the CNI’s DaemonSet must be updated with affinity rules targeting only hybrid nodes. A label, eks.amazonaws.com/compute-type: hybrid, is automatically applied to each hybrid node for that purpose.
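For instance, if you deploy Cilium or Calico via Helm, the affinity you inject into the agent DaemonSet might look like this (a sketch; the exact Helm values path varies by chart):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Schedule the CNI agent only onto hybrid nodes
            - key: eks.amazonaws.com/compute-type
              operator: In
              values:
                - hybrid
```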
Hybrid Nodes use IPv4 addressing and require routes between the EKS cluster's VPC and on-premises networks. For solutions using a VPN architecture, it is recommended to have a dedicated VPN server per hybrid node pool.
It is the user's responsibility to ensure that a route exists in the route table of the EKS cluster's VPC mapping traffic destined for each hybrid pool's node and pod CIDRs to that pool's VPN server (or the correct private subnet CIDR if using Direct Connect).
Each VPN connection in AWS should have two static routes associated with its hybrid node pool:
- Node CIDR of the hybrid nodes in the pool
- Pod CIDR of the pods running on the hybrid nodes
If route propagation is enabled for the Virtual Private Gateway and/or Transit Gateway, these routes will be added to the VPC’s route table and will satisfy the route creation mentioned above.
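If you instead manage the VPN connection with CloudFormation, the two static routes for one pool might be sketched like this. The logical resource name HybridPoolVpnConnection and the CIDRs are assumptions for illustration:

```yaml
Resources:
  HybridNodeCidrRoute:
    Type: AWS::EC2::VPNConnectionRoute
    Properties:
      # Assumes a VPN connection defined elsewhere in the template
      VpnConnectionId: !Ref HybridPoolVpnConnection
      DestinationCidrBlock: 10.200.0.0/16   # the pool's node CIDR (illustrative)
  HybridPodCidrRoute:
    Type: AWS::EC2::VPNConnectionRoute
    Properties:
      VpnConnectionId: !Ref HybridPoolVpnConnection
      DestinationCidrBlock: 10.201.0.0/16   # the pool's pod CIDR (illustrative)
```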
On-premises VPN server configuration
Each hybrid node pool should ideally have a dedicated VPN server. These servers terminate the on-premises side of a site-to-site VPN tunnel between the AWS cloud and the private network operating at each hybrid site. Each VPN server will have two IPsec Phase 1 (P1) tunnels, each with two Phase 2 (P2) security associations configured:
- Hybrid node subnet → EKS VPC CIDR
- Hybrid node pod CIDR → EKS VPC CIDR
The EKS API server must be able to route directly to any IP in the remote pod CIDRs specified during installation. Two concrete examples of when this network path is exercised are: 1) pod log requests initiated via kubectl and 2) webhooks running on hybrid nodes.
Based on the routes in the EKS VPC's route table, traffic will transit through the appropriate hybrid node pool's VPN tunnel to the private network. Therefore, any on-premises VPN server must be configured to route requests to specific nodes depending on which slice of the overall CIDR the packet's destination IP falls under. This can be automated by configuring Cilium to operate in BGP mode and creating a CiliumBGPPeeringPolicy resource. Alternatively, static routes for each node's specific pod CIDR can be manually configured on the VPN server.
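A sketch of such a policy is below. The ASNs and peer address are assumptions for illustration; your site's VPN server or router would peer at the other end.

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: hybrid-pool-bgp
spec:
  nodeSelector:
    matchLabels:
      eks.amazonaws.com/compute-type: hybrid   # apply only to hybrid nodes
  virtualRouters:
    - localASN: 64512          # illustrative private ASN for the nodes
      exportPodCIDR: true      # advertise each node's pod CIDR to the peer
      neighbors:
        - peerAddress: 10.200.0.1/32   # the site's VPN server or router (illustrative)
          peerASN: 64513               # illustrative peer ASN
```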
If the VPN server is not the primary router for the network, it must either be configured to broadcast routes via BGP to the rest of the network, or the primary router must have static routes configured to redirect traffic destined for the EKS VPC CIDR to the VPN server. Static routes for the hybrid nodes' pod CIDRs may also need to be configured on the primary router if BGP is not configured on the VPN server.
Setting up a hybrid node and joining it to the EKS cluster
AWS provides a CLI tool, nodeadm, to enable hybrid node lifecycle operations such as installation, configuration, registration, and upgrades.
Kubernetes versions 1.26 to 1.30 are supported, with node operating systems limited to Amazon Linux 2023, Ubuntu (20.04, 22.04, and 24.04), and RHEL (8 and 9).
The provisioning flow looks something like this:
Bootstrapping a hybrid node using nodeadm
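Concretely, nodeadm consumes a YAML NodeConfig describing the cluster and credential provider. A minimal sketch for SSM-based authentication might look like this; the cluster name and activation values are placeholders:

```yaml
# Used by: nodeadm install <kubernetes-version> --credential-provider ssm
# then:    nodeadm init -c file:///path/to/nodeConfig.yaml
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: hybrid-demo          # your EKS cluster name (placeholder)
    region: us-west-2
  hybrid:
    ssm:
      activationCode: <activation-code>   # from an SSM hybrid activation
      activationId: <activation-id>
```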
Once nodeadm init has completed successfully, a new worker node will appear in your EKS cluster and become ready as soon as a CNI agent lands on it and works its magic. Easy, right?
When the rubber hits the road
So you've got your first hybrid cluster set up, navigated the networking requirements, and joined your first hybrid node to it successfully — congratulations!
Naturally at this point, you’ll be thinking about how this scales into production. There are two big areas you need to think about:
Bootstrapping infrastructure for registration: EKS Hybrid Nodes follows a “bring your own infrastructure” approach, where you are responsible for provisioning and managing the bare metal servers or virtual machines intended for use as hybrid nodes. You need to bake nodeadm into your OS images, somehow procure node-specific configuration files, and automate nodeadm’s execution. How will you repeat the registration process consistently and securely?
Orchestrating the end-to-end lifecycle management of hybrid nodes: Everything after day 1 is also left up to you. How will you orchestrate rolling, immutable upgrades for your hybrid nodes, or manage rotating the PKI assets leveraged by IAM Roles Anywhere?
Answering these questions is where infrastructure partners such as Spectro Cloud come in. When we were invited to participate in the EKS Hybrid Nodes beta program, we jumped at the chance. We knew immediately that Palette could augment and extend the core Hybrid Nodes feature.
Our ability to repeatably automate the infrastructure lifecycle simplifies the process of bootstrapping hybrid nodes and orchestrating hybrid EKS clusters at scale, helping organizations get on the fast track to the value of hybrid architectures.
Palette and EKS Hybrid: step by step
With a few straightforward steps, you can import your hybrid EKS cluster to Palette and leverage our Edge Kubernetes capabilities to seamlessly manage the end-to-end lifecycle of its hybrid node pools.
The process is as follows:
- Import the hybrid EKS cluster to Palette
- Activate hybrid mode and provide authentication details for its hybrid nodes
- Provision one or more Edge Hosts
- Define a Cluster Profile for your hybrid nodes
- Create a hybrid node pool for your EKS cluster, leveraging your Edge Hosts
Step 1: import your EKS hybrid cluster to Palette
Importing an EKS hybrid cluster to Palette is a matter of applying a single manifest, obtained via API or the Palette user interface. Clusters can also be automatically imported using the Palette CLI.
Import your hybrid EKS cluster
Step 2: Activate hybrid cluster mode
Once imported, navigate to Hybrid Configuration within the cluster’s Settings. There you’ll activate hybrid mode and provide credentials for your chosen authentication provider.
Enable hybrid mode and provide authentication details
Step 3: Provision your on-prem hosts
Now you’re ready to provision some edge hosts using on-premises infrastructure. Edge hosts are simply machines that have been flashed with an installer ISO created using Spectro Cloud’s EdgeForge workflow. This includes anything from beefy data center servers to small form factor edge devices. The installer ISO can be customized to accommodate various site-specific network and hardware configurations and reused for every hybrid node at that site.
Once the machines are booted from the installer ISO and the installation completes, they will power off. Following that, you simply remove the installation medium and reboot. At that point, Spectro Cloud’s edge agent will register the device with Palette and you’ll be ready to add it to a hybrid node pool.
Multiple Edge Hosts have been provisioned
Step 4: Define a Cluster Profile for your hybrid nodes
Before you can create a hybrid node pool, you must define a Cluster Profile that encodes the desired state of your hybrid nodes. Palette Cluster Profiles serve as reusable blueprints for declaratively configuring a Kubernetes cluster, from the nodes' OS to the Kubernetes distribution all the way to add-ons, rather than configuring each node manually, one by one.
A hybrid Cluster Profile consists of three layers: OS, Kubernetes, and Network.
- The OS layer includes a reference to the provider image built during the EdgeForge workflow, plus any optional advanced customizations expressed in cloud-init syntax (see the sketch after this list).
- The Kubernetes layer is used to select the version of EKS-D kubelet to be provisioned on the node via nodeadm.
- The Network layer is merely a placeholder in the case of hybrid nodes, since the CNI deployed on the EKS cluster itself will handle network operations for each hybrid node.
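As a purely hypothetical example of an OS-layer customization in the cloud-init style used by Palette's provider images (the stage name, file path, and values are all assumptions):

```yaml
stages:
  boot:
    - name: Pin site-specific NTP servers
      # Writes a config pointing at an illustrative on-prem NTP server
      files:
        - path: /etc/systemd/timesyncd.conf
          permissions: 0644
          content: |
            [Time]
            NTP=10.200.0.10
```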
Step 5: Create a node pool
Now you’re ready to add a new hybrid node pool to your EKS cluster. It’s as simple as choosing a name, selecting a Cluster Profile, and picking one or more Edge Hosts. A custom VPN Server IP may optionally be assigned to each Edge Host.
By default, we assume that the network will understand how to route requests destined for the EKS cluster's VPC CIDR via the VPN server. If that isn't the case, you can manually provide the VPN server's IP address for each host, and a static route will be configured on the host by Spectro Cloud's edge agent during cluster creation.
Add your Edge Hosts to a new hybrid node pool within the EKS cluster
A single pane of glass for hybrid node operations
Once you’ve created one or more hybrid node pools, various day 2 operations are supported out of the box, such as node addition and deletion, version upgrades, and even advanced OS-level configuration via customizations of the OS layer in a node pool’s Cluster Profile.
Each hybrid node pool’s health, status, and update availability is visible via a single pane of glass. A dedicated event stream for each pool is also provided for monitoring and debugging purposes.
View, modify, and upgrade hybrid node pools via a single pane of glass
By aggregating multiple hybrid nodes into a pool whose configuration is associated with a specific site, we unlock the ability to perform seamless fleet management at scale.
If the Cluster Profile utilized by a hybrid node pool is updated — say by bumping the Kubernetes version — the node pool will immediately become eligible for a rolling repave. Once you opt into triggering the repave, each node in the cluster will download the provider image containing the latest updates and reboot in a controlled manner, applying any OS-level customizations in the process.
And of course, if you have multiple EKS Hybrid clusters, other EKS clusters, or clusters in other clouds, on-prem and edge environments… Palette gives you a single place to manage them all at scale.
The final word — and your next steps
Here at Spectro we’re all about choice. Wherever you want to run your clusters, and whatever architecture fits your needs, we do our best to support it. With the launch of EKS Hybrid Nodes, AWS is unlocking a true hybrid cloud, stretched cluster model for Kubernetes, which is a whole new tool in your toolbox as a cloud platform team.
In this blog we dove into how Palette and EKS Hybrid Nodes are better together, with Palette streamlining the process of bootstrapping your on-prem devices and enabling you to manage the lifecycle of your nodes at enterprise scale. We’re excited to see how things develop!
So what’s next? If you haven’t already, check out the launch of Hybrid Nodes at re:Invent, and dig in to the EKS best practices guide to learn more. If you’re intrigued to see how Palette can help you manage your EKS clusters, hybrid or not, check out our AWS solution page or get in touch to book a demo and find out about our EKS Quickstart service. Thanks for reading!