Enhancing edge K8s security and operations with Palette 4.4

Simplifying edge security — so hot right now

In Kubernetes as in all things IT, security is always top of mind. No surprises there.

In our latest 2024 State of Production Kubernetes research, 76% of Kubernetes users said their adoption has been inhibited by concerns about Kubernetes security.

When it comes to edge K8s specifically, security has been the undisputed number one challenge reported by adopters for three years running.

The second-ranking challenge? Concerns about performing day 2 operations at the edge, closely followed by fears about the cost of field engineering visits.

These challenges are only becoming more intense as edge Kubernetes deployments grow. The pressure is on to make edge in production safe and sustainable.

Fighting the good fight with Palette 4.4

With Palette 4.4, just released, we’ve made a leap forward in helping you secure and manage your edge deployments.

We're introducing trusted boot and full disk encryption (FDE) for greater security, and expanding our Local UI feature for simplifying operations in environments with connectivity challenges.

Watch the video for a summary, or keep reading for the details.

Trusted boot and FDE: the keys to tamper-resistant edge

Edge computing devices are often deployed in remote or unmonitored locations such as warehouses, industrial sites, and outdoor installations, or retail locations with high foot traffic, which makes them vulnerable to physical tampering and unauthorized access.

How can you protect the integrity and confidentiality of the data stored and processed by these devices?

Encryption sounds good, but not if the cure is worse than the disease

If you said ‘encryption’, you’re on the right track — but there are challenges with solutions like encryption if implemented piecemeal.

For example, disk encryption requires some form of key to unlock the data, whether it’s a password or a token like a USB key.

Imagine there’s been a power outage in the middle of the night at an unmanned industrial location miles from anywhere, and the edge cluster sits waiting for a PIN before it’ll boot. Meanwhile, critical applications are offline.

How long is it going to take Dave the Kubernetes engineer to drive over there with a keyboard and screen? Too long. So admins are often forced to accept security compromises in order to ensure operational continuity.

Secure boot is valuable, but limited

Another mechanism you might suggest to guard against tampering is secure boot. Again, good start, but it’s not enough.

Secure boot ensures that the firmware will only load software that is signed and trusted during the boot process. But it doesn’t help in scenarios where an attacker physically accesses the hardware to manipulate the boot process or replace critical components to bypass security checks. In such cases, the system’s trust chain can be compromised.

The brave new world of trusted boot

When you’re setting your requirements for edge Kubernetes security, you shouldn’t have to trade off safety versus operational continuity. Sum up the requirements and you’d end up with a list like this:

Headless and self-contained operation, no need for human interaction such as password entry, and no need for a connection back to any centralized service
Protection for sensitive business data, with resistance against hands-on edge hardware tampering
Zero impact to device lifecycle management

We believe that our new implementation of trusted boot in Palette Edge meets these requirements, without compromise.

So what is trusted boot?

Trusted boot, as we define it, is three things working very closely together, combined with Unified Kernel Images (UKI):

Secure boot: Ensures that the device firmware only loads signed software during the boot process.
Measured boot: Detects tampering with the system state by using a Trusted Platform Module (TPM) to record measurements of the boot process and compare them against expected values.
Encryption: Only decrypts business data if the boot measurements pass integrity checks, with encryption keys managed by the TPM.
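
To make the idea of "recording measurements" more concrete, here is a minimal Python sketch of the hash-chain "extend" operation that TPM platform configuration registers (PCRs) are based on. The stage names and contents are invented for illustration, a real TPM performs this in hardware, and this is not a description of how Palette Edge implements it.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR


def extend(pcr: bytes, data: bytes) -> bytes:
    """Return the new PCR value: SHA-256(old PCR || SHA-256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def measure_boot(stages: list[bytes]) -> bytes:
    """Fold every boot stage into one PCR value, starting from all zeros."""
    pcr = bytes(PCR_SIZE)
    for stage in stages:
        pcr = extend(pcr, stage)
    return pcr


# Hypothetical boot stages, for illustration only.
good_boot = [b"firmware v1", b"kernel v1", b"initrd v1"]
tampered_boot = [b"firmware v1", b"kernel v1", b"initrd v1 + backdoor"]

expected = measure_boot(good_boot)
print(measure_boot(good_boot) == expected)      # identical stages reproduce the value
print(measure_boot(tampered_boot) == expected)  # any change yields a different value
```

Because each value depends on everything measured before it, a register can only be extended, never set directly, which is what makes the final value a trustworthy summary of the whole boot.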

Shining a light on the TPM

The Trusted Platform Module, or TPM, is at the heart of making trusted boot possible — so let’s explore it a little.

First, it’s a hardware component, typically implemented as a separate chip on the motherboard, though it can also be integrated into the system on a chip (SoC).

The TPM ensures the integrity of the system at boot time and safeguards sensitive information through cryptographic functions.

It’s an integral component of trusted boot because it’s the part that ties boot measurements to data encryption.

Inside the TPM, completely safe from tampering, lives a policy. The policy states that only “valid, signed measurements” are acceptable.

It doesn’t actually specify the exact measurements it’s looking for, because that would make upgrades (day 2 operations) incredibly difficult and brittle, opening the door for edge hosts to be easily bricked by accident.

Instead, when administrators build upgrade images, they precalculate measurements for the new image and then sign them with a special key. The policy stored inside the TPM includes knowledge about how to recognize signed measurements, and so those measurements are actually shipped alongside the upgrade images themselves.

After an upgrade, at boot, the system will take new measurements and store those securely in the TPM (so they too cannot be meddled with). The TPM can then check if those new measurements match the shipped, precalculated measurements.

Remember, those precalculated measurements were shipped with the upgrade image, but they were signed and the TPM knows how to recognize them.

If you’ve been wondering how the TPM was originally made aware of this policy, that is done during the initial installation of Palette Edge software, in a secure location such as a staging environment. From then on, the TPM knows how to trust any set of measurements.
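
The flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses a toy RSA key built from two well-known Mersenne primes (nowhere near secure), the function names are hypothetical, and a real TPM enforces its policy in hardware rather than in application code.

```python
import hashlib

# Toy RSA key from two known Mersenne primes. Far too small and too
# structured to be secure; it only illustrates the sign/verify flow.
P, Q = 2**89 - 1, 2**127 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(measurement: bytes) -> int:
    """Administrator side: sign the precalculated measurement."""
    return pow(digest_as_int(measurement), D, N)


def signature_is_valid(measurement: bytes, signature: int) -> bool:
    """TPM side: only the public key (N, E), set at install time, is needed."""
    return pow(signature, E, N) == digest_as_int(measurement)


def may_unlock(actual: bytes, shipped: bytes, signature: int) -> bool:
    """Release the disk key only if the shipped measurement is validly
    signed and equals what was actually measured during this boot."""
    return signature_is_valid(shipped, signature) and actual == shipped


# Hypothetical upgrade image and its shipped, signed measurement.
shipped = hashlib.sha256(b"upgrade image v2").digest()
sig = sign(shipped)
tampered = hashlib.sha256(b"upgrade image v2 + backdoor").digest()

print(may_unlock(shipped, shipped, sig))    # untouched image
print(may_unlock(tampered, shipped, sig))   # boot does not match the signed values
print(may_unlock(tampered, tampered, sig))  # attacker ships unsigned values
```

Note that the policy never names a specific measurement. It names the key that is allowed to vouch for measurements, which is why a new image can ship with new values without touching the TPM.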

If that’s a bit confusing, then we have an analogy for you: passport control.

In this analogy, a border control officer has been trained to verify passport documents. As part of the training, they learned to look for specific signatures, for example those shiny colorful watermarks. In doing so, they are checking that an authority has stamped this document as being valid, and that the contents of the document should be trusted.

The contents of the document include measurements such as height or eye color. The officer can compare those measurements against the person standing in front of them to see if they match. The border control officer has never seen this person before, and the person carried their own document with them, but the officer is able to establish a chain of trust by checking the authenticity of the document.

This is exactly how our trusted boot solution works. The TPM is like the border control officer. The authority signing the passport is like the administrator that signed the images, and the person trying to gain entry is like the booting image.

And the outcome is the same: if the measurements don’t match, you don’t get access.

UKI: the safest way to boot operating systems

There’s another key piece to the puzzle, and that’s the UKI, or unified kernel image.

A UKI is essentially a single file, a ‘fat binary’, that contains an entire operating system: the kernel, the boot configuration, and the initial files required to boot, such as drivers and firmware (often packaged as the initrd, or initial ramdisk). If you build your own custom UKI, it can also include any other content you want.

Because the entire OS is in a single UKI file, booted as one by the firmware, the entire OS can be measured by the TPM at the moment of boot to make sure it is trusted. That’s a huge security advantage when you understand the alternatives.

By comparison, a traditional boot technique, like the Grub bootloader, has a multi-stage boot process. A measured boot check can typically only measure the kernel and initial boot parameters — which means if other files later used by the OS have been tampered with, the security threat won’t be detected, and the disk will have been decrypted already.
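
The comparison above can be illustrated with a small Python sketch. It assumes, as described, that the multi-stage path measures only the kernel and boot parameters, while the UKI path measures everything in the image as one unit. The image contents and function names are hypothetical.

```python
import hashlib


def measure(parts: list[bytes]) -> str:
    """Hash the given parts, in order, into one hex digest."""
    h = hashlib.sha256()
    for part in parts:
        h.update(hashlib.sha256(part).digest())
    return h.hexdigest()


def measure_multi_stage(image: dict[str, bytes]) -> str:
    """Only the kernel and boot parameters are inputs to the measurement."""
    return measure([image["kernel"], image["cmdline"]])


def measure_uki(image: dict[str, bytes]) -> str:
    """Every component in the single image file is an input."""
    return measure([image[name] for name in sorted(image)])


# Hypothetical OS contents, for illustration only.
original = {
    "kernel": b"kernel v1",
    "cmdline": b"root=LABEL=os ro",
    "initrd": b"initrd v1",
}
tampered = {**original, "initrd": b"initrd v1 + backdoor"}

# The swapped initrd is invisible to the partial measurement...
print(measure_multi_stage(original) == measure_multi_stage(tampered))
# ...but changes the measurement that covers the whole image.
print(measure_uki(original) == measure_uki(tampered))
```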

With Palette Edge’s UKI-based trusted boot implementation, nothing can be tampered with without the TPM noticing it and preventing access to the encrypted data.

No-compromises security with simple lifecycle management

Our trusted boot solution meets all of the requirements we mentioned earlier:

Headless operation: booting the device is fully standalone, requiring no human intervention, no external hardware, and no network connectivity back to any centralized platform.
Tamper resistance: thanks to measured boot and UKI, tampering with a Palette Edge device is detected at boot, and the encrypted data stays locked.
Simple lifecycle management: whoops! We haven’t covered that …

Palette has always had a major focus on simple lifecycle management from day 0 through day 2. And we had lifecycle management at the front of our minds while building this new trusted boot security capability.

Using Palette’s EdgeForge workflow, it’s easy to securely and reliably manage the security keys that you need to sign images, generate measurements, and sign those measurements, from the same workflows you use to generate and manage edge OS images for your device fleet.

When you generate a signed image to use to upgrade a device, you continue to benefit from Palette’s immutable, atomic upgrades, with A/B failover in the event of an upgrade failing.
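
As a rough illustration of what A/B failover means in general, here is a minimal Python sketch. The `Device` class, the slot names, and the `boots_ok` health check are all hypothetical; this shows the pattern, not Palette's actual upgrade mechanism.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Device:
    slots: dict[str, str]  # slot name -> image version
    active: str            # slot the device currently boots from


def upgrade(device: Device, new_image: str,
            boots_ok: Callable[[str], bool]) -> bool:
    """Write new_image to the inactive slot and switch only if it boots."""
    passive = "B" if device.active == "A" else "A"
    device.slots[passive] = new_image  # the active slot is never modified
    if boots_ok(new_image):
        device.active = passive        # commit by flipping one pointer
        return True
    return False                       # fail over: keep booting the old slot


def healthy(image: str) -> bool:
    """Stand-in health check for this example."""
    return "broken" not in image


device = Device(slots={"A": "os-1.0", "B": ""}, active="A")

upgrade(device, "os-1.1-broken", healthy)
print(device.active, device.slots[device.active])  # still on the old image

upgrade(device, "os-1.1", healthy)
print(device.active, device.slots[device.active])  # now on the new image
```

The key property is that the running image is never edited in place, so a failed upgrade leaves a known-good system to fall back to.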

And every day you benefit from Palette’s enterprise-grade fleet management capabilities, for tasks like observability, configuration management, backups, and security scans, across your edge fleet.

Local UI enhancements — still no YAML

How can you manage hundreds or thousands of edge clusters when they are disconnected from the internet, whether by circumstance or design?

Well, as one respondent to our 2024 research told us, it’s simple: you blow the IT budget by sending K8s experts on roadtrips to sit and parse YAML on site:

“Deploying and managing clusters in remote locations with poor or no network is challenging, expensive and error prone, requiring highly-skilled engineers.”

We believe there’s another approach. You can’t necessarily avoid the road trip, but you can send a less expensive resource.

That’s the idea behind our Local UI feature, which we introduced in Palette 4.3. It addresses the issue of managing remote, disconnected or poorly connected clusters, where the only way to get data in and out is by physically moving it there, on a USB stick. Local UI basically lets you manage these types of clusters with the same familiar Palette interface… just without the network connection.

We put a lot of work into making the UX simple — no YAML required! That means field technicians are less likely to make errors, they can deploy faster, and they need less training and expertise.

Day 2 upgrade improvements

In Palette 4.4, we’ve upgraded Local UI’s day 2 operations, benefiting both platform engineers and field technicians. Platform engineers have now gained better design-time capabilities, such as cluster profile variable usage counts, deletion protection, previews, and more.

We’ve also added the ability for a field technician to see incoming changes to cluster profiles and cluster variables.

Imagine that the central platform engineer has introduced changes to a cluster profile that require review by the field technician on site. When loading the new content onto the cluster, the field technician is now shown a simple UI review screen where they can easily retain any site-specific overrides that were entered before, or accept new defaults coming from the platform engineer (again, no YAML!).
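
The logic behind such a review step might look something like the following Python sketch. The variable names, the data shapes, and both functions are hypothetical and are not taken from Palette; the point is only to show how changed defaults and site overrides can be reconciled.

```python
from typing import Optional


def review_changes(old_defaults: dict[str, str],
                   new_defaults: dict[str, str],
                   site_overrides: dict[str, str]) -> list[dict[str, Optional[str]]]:
    """List every variable whose default changed, with any site override."""
    rows = []
    for name, new_value in new_defaults.items():
        old_value = old_defaults.get(name)
        if new_value != old_value:
            rows.append({
                "name": name,
                "old_default": old_value,
                "new_default": new_value,
                "site_override": site_overrides.get(name),
            })
    return rows


def apply_review(new_defaults: dict[str, str],
                 site_overrides: dict[str, str],
                 keep: set[str]) -> dict[str, str]:
    """Start from the new defaults, then keep only the chosen overrides."""
    final = dict(new_defaults)
    for name in keep:
        if name in site_overrides and name in final:
            final[name] = site_overrides[name]
    return final


# Hypothetical cluster variables, for illustration only.
old = {"ntp_server": "10.0.0.1", "log_level": "info"}
new = {"ntp_server": "10.0.0.2", "log_level": "warn"}
site = {"ntp_server": "192.168.1.5"}

for row in review_changes(old, new, site):
    print(row)

# The technician keeps the local NTP server and accepts the new log level.
print(apply_review(new, site, keep={"ntp_server"}))
```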

Improved Local UI diagnostics

In 4.4 we’re adding enhanced diagnostics and troubleshooting, with a new Diagnostics area in the UI. From here, technicians can initiate connectivity checks using ping and traceroute (important during initial setup), and download bundles of logs for later analysis by platform engineers during troubleshooting.

Automating cluster certificate renewals

By default, Kubernetes cluster certificates expire after one year, and the cluster becomes inoperable when they do. At the edge, this invariably leads to service outages unless a trained field engineer visits each and every site to renew them.

Palette 4.4 now completely automates this process. It renews certificates 30 days before expiry, eliminating the need for those site visits. In case any field technician does happen to be visiting in this time frame, we also provide informational notices about the upcoming renewal, with details such as issue date, expiration date, auto-renewal date, and even an option to manually update, just in case.
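
The date arithmetic behind a notice like this is simple, and can be sketched in Python as follows. The 30-day window and one-year lifetime come from the text above; the issue date and function names are made up for the example.

```python
from datetime import date, timedelta

RENEW_BEFORE = timedelta(days=30)    # renewal window described above
CERT_LIFETIME = timedelta(days=365)  # one-year default lifetime


def renewal_date(expires_on: date) -> date:
    """Date on which automatic renewal would kick in."""
    return expires_on - RENEW_BEFORE


def renewal_due(expires_on: date, today: date) -> bool:
    """True once today falls inside the renewal window."""
    return today >= renewal_date(expires_on)


issued = date(2024, 7, 1)  # hypothetical issue date
expires = issued + CERT_LIFETIME

print("issued:      ", issued)
print("expires:     ", expires)
print("auto-renewal:", renewal_date(expires))
print("due today?   ", renewal_due(expires, date.today()))
```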

Learn more and give us your feedback

So there you go, two major new features in Palette 4.4 that you need to know about. But there’s more great stuff in this release than we can touch on in this blog. If you’d like to learn more, check out the release notes for full details, or join us on our community Slack if you have any questions, feedback or feature requests.

If you’re an existing Palette customer using our multi-tenant SaaS, you already have all these features today.

Not yet a customer? If you haven’t seen Palette in action for a while, it’s time to explore what’s new. Get in touch for an orientation demo and free access.

Thanks for reading, and we’ll see you soon for the next version of Palette!

Tags: Edge Computing, Security, Using Palette, Cloud, Research
