Published
December 13, 2024

Cloud repatriation: why on-prem Kubernetes isn't going away

Dmitry Shevrin
Infrastructure Specialist

What is ‘cloud repatriation’, and why are people talking about it?

Over the past 20 years, there’s been a headlong rush to migrate enterprise workloads from data centers to the public cloud hyperscalers, as CIOs pursued the dream of infinite elastic scalability, global reach, and cost efficiencies. 

Cloud repatriation is a newer countertrend: bringing infrastructure and applications back from the cloud to run on-premises in a private cloud, principally to reduce costs.

CIOs have seen how easy it is to rack up huge cloud bills, and quarter after quarter the hyperscalers all report record profits. IT leaders are disillusioned with the costs of cloud infrastructure (especially the hidden ones).

[Image: the homelab community's view of cloud services vs. desktops on the floor]

Indeed, the IT community has some loud voices sharing compelling stories about how much money their companies saved by running their workloads back in the on-premises data center. 

Others are looking for more control, security and compliance than the public cloud can offer, and need to be sure they satisfy every government and regulatory requirement.

The good news is that Kubernetes functions almost the same in the cloud and on-premises, which simplifies migration in both directions. But is cloud repatriation the right thing to do? Let’s unpack.

Cloud services vs on-premises

First of all, on-premises infrastructure has never actually gone anywhere. Your cloud compute resources are still ultimately operating from data centers; they’re just controlled by someone else. 

What’s more, ‘going back on-prem’ doesn’t mean the same thing to everyone. Many alternatives to the ‘big three’ hyperscaler services exist: 

  • Traditional owned and operated centralized data centers (whether currently active, mothballed, or yet to be provisioned from scratch)
  • Owned and operated compute infrastructure in branch offices, regional sites or edge locations
  • Specialist or regional cloud vendors like Hetzner and DigitalOcean, plus bare-metal clouds
  • Alternative data center ownership and delivery options, including colocation, managed service providers and outsourcing

Depending on which of these services you pick, you’ll get a different economic model and a different level of effort on the part of your in-house teams, which can change the equation.

With all that said, there are some generalizations we can make, so let’s compare how on-prem differs from public cloud computing:

| Comparison point | On-premises | Public cloud |
| --- | --- | --- |
| Ease of use | Takes some initial effort | Effortless at the beginning |
| Cost | Upfront costs, plus ongoing maintenance | Pay-as-you-go, but costs can add up |
| Kubernetes availability | Anything you want | Often dictated by the cloud provider |
| Latency | Lower, as it's usually local; expanding into new regions requires new local DCs | Depends on cloud region; easy expansion into new regions |
| Data sovereignty | Easy to achieve: a local DC satisfies the criteria by definition | Depends on cloud region: only if a suitable region is available locally |
| Control and responsibility | Full | Shared with the public cloud provider |
| Skills needed to operate | OS-based skills required | Cloud-related skills required |

Ease of use

Ease and speed of use have always been key selling points of the public cloud, from IaaS to SaaS.

Compared to traditional on-prem data centers, you don’t need to procure hardware, rack and stack, provision the OS, or set up cabling for the network infrastructure. You head to the cloud hyperscaler’s dashboard and click click click, and you have a configured environment in which to deploy your applications.

But this is not a totally fair comparison. There are many companies out there offering you a whole spectrum of on-prem options, from colocation in their managed facilities, through to on-demand managed boxes. With these services, you don’t have to worry about cooling, power supplies, disks and cabling (unless you want to), yet you keep all the control.

Cost

The main motivation driving cloud repatriation is cost. Hyperscaler fees add up, from compute and storage to ingress and egress charges. Many of these costs can be managed and optimized with care and effort: architecting and placing applications thoughtfully, and choosing the right instance sizes and policies. Check out this blog and this one for some tips.

If your company is working with one of the major hyperscalers, you probably have some committed spend and a signed multi-year agreement binding you to consume a certain amount of services. 

But if you can escape the contract terms, and if your usage patterns fit, there are potentially huge savings to be made by going on-prem. You’ve probably read about 37signals’ experience.

Of course, just as the motivation for jumping to cloud was to swap capex for opex, moving workloads back on-prem is likely to include upfront costs that have to be budgeted for — unless you go for a fully managed service or already have your own facilities. Potentially you’ll need to ramp up headcount, too.
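To make that capex-versus-opex trade-off concrete, here's a minimal back-of-the-envelope sketch in Python. Every figure below is a hypothetical placeholder, not a benchmark; plug in your own bills, quotes and salary numbers.

```python
# Back-of-the-envelope break-even between cloud PAYG and on-prem ownership.
# All numbers are invented placeholders; substitute your own.

CLOUD_MONTHLY = 42_000            # hypothetical steady-state hyperscaler bill ($/month)
HARDWARE_CAPEX = 350_000          # hypothetical upfront spend on servers and networking
ONPREM_MONTHLY = 12_000           # hypothetical colo space, power, support ($/month)
EXTRA_HEADCOUNT_MONTHLY = 15_000  # hypothetical additional ops salaries ($/month)

monthly_saving = CLOUD_MONTHLY - (ONPREM_MONTHLY + EXTRA_HEADCOUNT_MONTHLY)
if monthly_saving <= 0:
    print("On-prem never breaks even with these inputs.")
else:
    months = HARDWARE_CAPEX / monthly_saving
    print(f"Break-even after ~{months:.0f} months")
    # Compare that against the hardware's useful life (3-5 years is typical)
    # before concluding that repatriation pays off.
```

With these invented numbers the hardware pays for itself in about two years, comfortably within a typical refresh cycle; with a smaller cloud bill or a bigger team, the answer flips.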

Latency, performance and scalability

Application performance is of course a product of many different factors.

On the public cloud side, a global hyperscaler gives you the option to replicate your application to different regions for proximity to users, resulting in lower latency. Plus you have on-demand scaling to larger, beefier compute instances if traffic load increases. 

To replicate this in your own data centers would require significant capital investment — entering a new region, for example, would mean setting up a new facility. Doubling capacity would mean ordering new boxes. This may be trivial if you’re renting boxes on demand from a compute provider, but if you’re truly in house, it can introduce a lot of lead time.

But large and dynamic services like Bluesky have moved away from the cloud and scaled effectively on an on-prem architecture: the cloud is not the only place you can run a demanding, spiky workload! Bluesky of course has a degree of decentralization, which brings us to edge computing.

The philosophy of public cloud is around centralization: that’s how it enables scale and cost efficiencies. But many workloads today are super latency sensitive and are best run at the edge — the most extreme on-prem modality.

On-prem is also often the best performance choice for specialist workloads, such as gaming, HPC or AI/ML training. While the cloud hyperscalers offer GPU instances and bare metal instances to eliminate VM overhead, sweating GPU assets 24/7 in an owned bare metal data center offers better bang for the buck. This is one of the use cases for hybrid cloud, and in particular Amazon’s new EKS Hybrid Nodes feature.
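As a toy illustration of the 'sweating assets' point: the effective hourly cost of an owned GPU depends almost entirely on how busy you keep it. All the numbers below are invented for the sketch; real prices vary widely.

```python
# Effective hourly cost of an owned GPU vs renting one in the cloud.
# Every figure is an invented placeholder.

CLOUD_GPU_HOURLY = 2.50            # hypothetical on-demand price for one GPU ($/hour)
OWNED_GPU_TOTAL = 30_000           # hypothetical purchase + power + hosting over its life
GPU_LIFETIME_HOURS = 3 * 365 * 24  # assume a three-year useful life

for utilization in (0.25, 0.50, 1.00):
    owned_hourly = OWNED_GPU_TOTAL / (GPU_LIFETIME_HOURS * utilization)
    winner = "owned" if owned_hourly < CLOUD_GPU_HOURLY else "cloud"
    print(f"{utilization:4.0%} busy: owned works out at ${owned_hourly:.2f}/h -> {winner} wins")
```

With these placeholders, owned hardware only wins once the GPUs are busy roughly half the time; idle GPUs on-prem are the expensive kind.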

Kubernetes choice

The cloud hyperscalers don’t just run the boxes for you: they offer a complete ecosystem of service components, from storage and databases to identity management, security, load balancers, and everything else. This is convenient, but it can also look and feel like lock-in, leaving you exposed to vendor price increases, feature deprecations, or product sunsets.

The cloud hyperscalers all have a degree of lock-in even when it comes to Kubernetes, particularly for their managed Kubernetes offerings, GKE, EKS and AKS. But when you adopt Kubernetes, you’ve already taken a big step towards a different trajectory. 

Kubernetes is an open system based on open standards. If you work directly with it, especially using tools based on Cluster API, you're free to run your workloads wherever you see fit. Say spot instances are cheaper at Google. Or storage is cheaper at AWS. Or managed Kubernetes is cheaper at Microsoft. You're free to chase whichever promotions and prices suit you!
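As a toy sketch of that freedom: because the workload spec itself is portable, 'choosing a provider' can collapse into choosing which kubeconfig context to deploy to. The prices and context names below are invented for illustration.

```python
# Pick the cheapest cluster to deploy to this month. Prices and context
# names are hypothetical; check the providers' real pricing pages.

SPOT_PRICE_PER_VCPU_HOUR = {
    "gke-cluster": 0.011,
    "eks-cluster": 0.013,
    "aks-cluster": 0.012,
    "onprem-cluster": 0.009,  # amortized cost of owned hardware
}

def cheapest_context(prices: dict[str, float]) -> str:
    """Return the kubeconfig context with the lowest hourly price."""
    return min(prices, key=prices.get)

target = cheapest_context(SPOT_PRICE_PER_VCPU_HOUR)
print(f"kubectl --context {target} apply -f app.yaml")
```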

Data sovereignty

Despite the strong compliance profile of the hyperscalers, some industries simply don’t have the luxury of going into the nebulous cloud. For example, our customers in the pharma industry are obliged to keep their patients’ data out of the cloud, or within country borders. Other industries, like gambling, have to keep customer data within state boundaries. 

We're seeing more and more legal requirements mandating data placement across various jurisdictions. Yes, there have been attempts to achieve compliance through Safe Harbor and similar initiatives, but with the ever-changing nature of compliance requirements, cloud deployments have proven challenging for many of our customers. 

Control

If you've dealt with Amazon or its peers, you're familiar with the Shared Responsibility Model. In short: AWS secures the infrastructure, and customers secure their data and applications. In real-world terms, that means you still have to invest considerable time securing your applications before you have a fully secure environment. Think about your applications and data, and estimate where an intruder is more likely to get in: through the infrastructure or through the application itself. Cloud providers protect the infrastructure layer, but you are responsible for your applications either way. If those are already penetration tested and verified, securing the infrastructure layer is the only thing left to achieve.

“With great power comes great responsibility”

Cloud hyperscalers and infrastructure providers operate to agreed SLAs. However, chances are you're running custom applications on top of that infrastructure. Think of your estate as a whole, applications included: when your last outage happened, was the infrastructure or the application at fault?

Consider also the stability of your applications: do you believe they're more robust than public cloud infrastructure? Depending on the answer, allocate your time and resources to infrastructure or applications accordingly. If your applications are reasonably solid, protecting your infrastructure is a bounded amount of work. Yes, you'll be assuming full responsibility, but there's a fair chance you're carrying much of that responsibility in the cloud anyway. Perhaps it's a chance to take the whole power into your own hands?

Skills

Maintaining your own on-prem infrastructure and software stack requires some skills, but chances are you and your team already possess them. And let's be honest: you're probably running a huge application stack on top of your infrastructure already: middleware, databases, storage. The technologies in the hardware stack have been in use for years and are fairly easy to grasp compared with the cloud-native stack.

Let's imagine you've found a specialist who understands AWS well and is willing to stay with your company. Now imagine your company acquires another that uses Azure, or is itself acquired by a company that uses Google Cloud. Suddenly you're tasked with managing a heterogeneous infrastructure, which means finding specialists with the corresponding Azure or Google knowledge as well.

Compare that challenge with simply finding talent to manage on-premises infrastructure, or delegating it to a service provider: you solve the problem once and you're covered whatever happens!

The real world is not black and white: it’s hybrid

As we’ve discussed, there are a lot of ‘ifs’ and ‘buts’ about cloud repatriation. The ‘right’ answer for one application might not be the same for others; and the ‘right’ answer today might be the wrong one tomorrow. 

The public cloud has a place. Traditional data centers have a place. So does edge computing.

Kubernetes can be the bridge between those locations, making applications more portable. 

We're seeing an increase in hybrid architectures too. For example, one of the manufacturers we work with was deploying edge devices across the world and processing their data in the cloud. At some point it found that sending data to the cloud for analysis cost too much, and decided to move processing to the edge, which it did successfully with Kubernetes powered by Palette. 

Even then it kept using the public cloud, as a way to:

  • Test edge deployments without the risk of losing access to devices installed in remote locations.
  • Access additional capacity quickly in times of strong demand. 

Kubernetes fits both of these goals perfectly, enabling the manufacturer to keep bleeding-edge technology in use and offer application owners the latest and greatest platform, while running on locally owned hardware.

How Palette helps: deploy Kubernetes wherever makes sense!

Spectro Cloud's Palette simplifies deployment and management of a Kubernetes estate in any environment, be it one of the major hyperscalers or your own premises. The beauty of our Cluster Profiles is that you can easily reuse them between environments. In other words, once you've configured your application stack using Palette's profiles, you can apply it to any Kubernetes cluster, whether in the cloud or in your data center, simplifying migration between clouds and from them back to earth.

The only catch is that you should use the same technology stack to control both your on-premises deployment and your cloud estate. We've seen the most success when using Cluster API to manage both parts. If you haven't heard of CAPI, it's an open source initiative that lets you define Kubernetes clusters as Kubernetes objects. It has providers for all the major hyperscalers (AWS, Azure, Google), making it a strong choice for your cloud workloads.

What about managing Kubernetes on-premises using modern methods like Cluster API? Fortunately, that's also quite easy: there's CAPV, the Cluster API provider for vSphere, and here at Spectro Cloud we wrote and open sourced a MAAS provider for Cluster API, contributing to the bare metal community.
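To make 'clusters as Kubernetes objects' concrete, here's a trimmed sketch, rendered from Python purely for illustration, of what a Cluster API Cluster declaration looks like. A real Cluster also needs a controlPlaneRef, network settings and the referenced provider objects; the point is that only the infrastructureRef changes when you retarget from a hyperscaler to, say, your own vSphere estate.

```python
import json

def cluster_manifest(name: str, infra_kind: str, infra_api_version: str) -> dict:
    """A trimmed Cluster API 'Cluster' object: the top-level declaration is
    identical across environments; only the infrastructureRef differs."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "infrastructureRef": {
                "apiVersion": infra_api_version,
                "kind": infra_kind,
                "name": name,
            },
        },
    }

# The same cluster declaration, retargeted from AWS to vSphere:
print(json.dumps(cluster_manifest(
    "shop", "AWSCluster", "infrastructure.cluster.x-k8s.io/v1beta2"), indent=2))
print(json.dumps(cluster_manifest(
    "shop", "VSphereCluster", "infrastructure.cluster.x-k8s.io/v1beta1"), indent=2))
```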

We built our Palette management platform on top of the open source goodness of Cluster API (with a lot of extensions, of course). If you’re keen to benefit from the multi-environment portability that CAPI unlocks, Palette is the quickest way to get started. Get in touch for a demo and we can show you how.

Tags:
Cloud
Bare Metal
Concepts