Technical debt is a term that refers to the negative impact of certain decisions made regarding technologies, architectures, and processes, and of shortcuts taken while developing software or complex information technology systems. Examples of technical debt include a lack of automation, the use of legacy programming languages, legacy software development practices, non “cloud native” architectures, and dependency on a single infrastructure vendor or public cloud platform.
As a solutions architect working in the fast-paced area of cloud computing and modern application development, I consult with very large enterprise companies on a regular basis to design solutions that aim to reduce or minimize the impact of technical debt. For the past five years I have been focused on helping organizations adopt modern application development methodologies and tools to stay relevant or gain competitive advantages in their business verticals. Containers and Kubernetes are an attempt to reduce or minimize the impact of the technical debt incurred from the past decades of technology practices.
Containers and Kubernetes are not just a shiny object or a technology fad. Kubernetes is an open source project that was born from the need to develop, test, and run containerized (typically microservice-based) applications at extreme scale while taking full advantage of highly automated computing platforms. The goal of Kubernetes is to abstract away the complexities of the infrastructure an application runs on, while offering developers a standardized environment to develop and run applications anywhere Kubernetes can be deployed.
Kubernetes is highly portable: it can run pretty much anywhere, from a laptop, to various public and “private cloud” infrastructure, to lightweight devices deployed at edge locations (e.g., a single commodity server or even a Raspberry Pi), sometimes beyond the reach of stable internet connectivity. This is all good, but Kubernetes itself is complex container scheduling software with a life cycle of its own that needs to be managed.
When a large enterprise company decides it would like to use Kubernetes, it should assess its maturity in a number of areas, including its software development life cycle, its private datacenter vs. public cloud technology stacks, and its infrastructure operations processes. So how can Kubernetes create technical debt? I would like to offer three main categories for consideration: talent and cultural technical debt, vendor lock-in technical debt, and “multi-everything” technical debt.
Technical debt due to talent and culture
The need for talent, in the form of individuals with the skills to develop, drive, and maintain a comprehensive container strategy, presents a large barrier to entry for many companies. By focusing only on “keeping the lights on,” many companies have already accumulated a lot of talent and cultural technical debt. Organizations without strategic initiatives to develop their own talent, by training people or investing in the next generation of technology trends and practices, are at a disadvantage from the start of their Kubernetes journey. This is a tough barrier to overcome, but with the rise of DevOps practices and culture incubation projects, many companies are making good progress in this area. The companies I have seen address their talent and culture challenges have embraced decentralized IT teams with varying degrees of autonomy. This often requires IT leadership to move away from layers of non-technical managers. Talented people need coaches who can help them develop their skills while providing opportunities to grow in their careers. The pandemic years have also brought the need to embrace a remote and dispersed workforce front and center. Companies should make talent retention a major focus, along with succession planning, to sustain their long-term technical goals. In this case, choosing Kubernetes does not itself create technical debt; rather, the company’s existing gaps in talent and culture have to be addressed, which can take time.
Technical debt due to vendor lock-in
[Image: 53-images/image8.png]
A natural response to long-term transformation initiatives is to look for “shortcuts” along the way. This often takes the form of vendors that specialize in a particular technology and offer some type of out-of-the-box “quick win” solution. This approach leads to the technical debt commonly referred to as “vendor lock-in”: a company partners with a vendor to provide a solution while accepting a level of confinement or limitation. One example is a public cloud’s managed Kubernetes service, which reduces how much of the Kubernetes lifecycle the company has to build and maintain itself, but also means giving up some control over, and ability to customize, the overall Kubernetes solution. Don’t get me wrong: managed Kubernetes offerings are a great place to start a Kubernetes journey, but depending on a company’s use cases, the complexity hidden at the start can come back to bite. Similarly, on-premises Kubernetes solutions may come with the promise of reduced operational complexity; the other side of that is a lack of flexibility, control, and customization. This can mean limitations on where you can run Kubernetes, or a lot of work to run it outside the vendor’s “preferred environment”, as well as constraints on the technology mix “around” the core infrastructure stack (OS, monitoring, authentication, etc.) and on any additional features required (especially around Day 2 operations). Typically, the “quick win” initially outweighs these limitations, but over time this technical debt can hold back the scaling and maturing of Kubernetes in the organization. In addition, vendor lock-in often leads to painful migrations or retooling efforts to overcome the limitations of a solution that has reached the end of its usefulness.
Technical debt due to “multi-everything”
At this point it is well understood that public clouds are here to stay, and companies that learn to harvest their value are the ones that will survive long-term. However, it is not just about the value of one cloud; the combined use of several (including on-premises infrastructure) can pay dividends in some cases. Kubernetes is a big driver behind application teams’ growing demand for a ubiquitous infrastructure platform that also spans data centers, co-lo sites, regional hubs and, increasingly, edge locations. But this “multi-everything” approach carries a technical debt risk related to what I like to call the “re-inventing fire” effect. An example would be a company that has multiple ways of deploying and managing Kubernetes depending on where it needs to operate, or even on the individual needs of application teams and business units. Consider an organization that consumes managed Kubernetes services from one or more public clouds, owns an on-premises data center running a commercial Kubernetes solution, and also maintains a “Do-It-Yourself” home-grown Kubernetes stack. Without a standardized process and tooling to manage Kubernetes across multiple operating environments and conditions, the result is multiple teams each creating and maintaining “snowflake” processes and tooling, “re-inventing fire” again and again. In my experience, “multi-everything” technical debt is the most prevalent form of technical debt today. Interestingly, it often arises from trying to reduce other forms of technical debt by giving teams freedom to operate and make decisions. Creating autonomous teams with a “fail fast” culture is a worthy goal; however, with technology ecosystems like Kubernetes, some standard procedures are needed to scale in a repeatable and secure manner.
Practical next steps
[Image: 53-images/image5.png]
- Spread the knowledge internally
Technical talent and cultural transformation may seem like monumental areas to address within your organization. Don’t feel overwhelmed: every person within a company can play a role in developing technical talent and defining a workplace culture. Simple steps can be taken, such as forming a meetup group within your IT organization that encourages people from different teams to come together to explore a common topic such as Kubernetes. Sponsoring this type of event during working hours, and having IT leadership prioritize employee participation by working events into a monthly or quarterly schedule, can get a group like this off the ground. Rotating the host role between teams spreads the work of organizing, researching, and running recurring meetups, so it doesn’t fall on one person or team. Foster an internal community culture of collaboration through wikis and messaging channels where people can share information specific to your organization’s own processes, lessons learned, and best practices.
- Be realistic and transparent about vendors and expectations
Avoiding vendor lock-in is a fine line that all IT organizations walk. The main advice here is to be diligent when making vendor decisions and to document the use cases that led to a particular vendor solution. A basic pros vs. cons assessment of a chosen vendor solution will help guide future decisions and provide a historical reference. The number one con to evaluate a solution by should be how much effort it would take to stop using it. In some cases the vendor roadmap may address the cons identified when forming the partnership or adopting a particular product. In other cases, new use cases or projects may push the limits of a vendor solution, both its current capabilities and its future roadmap. When this happens, the full history of why that solution was chosen is not a mystery, and there shouldn’t be any concern about exploring new solutions. Too often in large organizations, the business and technology sponsors of a particular vendor dig in their heels and defend a decision when they don’t need to. When business and technology requirements change and priorities are rearranged, the scope of the original vendor solution can change drastically. Being able to tell the complete story, through documentation of the conditions under which a vendor solution was chosen, should move people past any need to “save face” or to keep pursuing a vendor solution that is not suited to the requirements of new use cases or projects.
- Ask the five Ws of “multicloud” K8s
These days infrastructure solutions are expected to work everywhere. The key here is making architectural and process decisions through a multi-operating-environment lens. To avoid having teams duplicate efforts, look for areas where standardization can prevent a storm of snowflake processes and tools. When it comes to Kubernetes, there is a common set of challenges that I refer to as the who, what, when, where, and why of K8s:
- Who is going to provision a Kubernetes cluster vs. who is going to use it (are they the same person, is it a person, is it a pipeline, or is it both)?
- What should the Kubernetes cluster look like (OS, K8s distro, CNI, CSI, auth, security, ingress, load balancing, logging, monitoring, service mesh, middleware, etc.)?
- When will Kubernetes turn from a few clusters to a fleet of clusters?
- Where do you need to run Kubernetes (public cloud, private datacenter, on VMs, on bare metal, at the edge)?
- Why do you need Kubernetes clusters (development, testing, QA, production)?
Look for tools that let you address as many of the five Ws as possible with the least effort, while supporting a variety of processes.
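To make the five Ws a little more concrete, here is a minimal Python sketch, assuming an organization records each cluster’s answers in a simple profile and checks the “what” against one shared baseline to catch snowflake clusters early. The `ClusterProfile` fields, the component keys, and the `BASELINE` values are hypothetical placeholders, not the schema of any real tool and not a recommended stack; in practice a fleet-management or policy tool would play this role.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical organization-wide baseline: the "what" answered once, centrally.
# Keys and values are illustrative placeholders, not recommendations.
BASELINE: Dict[str, str] = {
    "cni": "standard-cni",
    "csi": "standard-csi",
    "auth": "corporate-oidc",
    "ingress": "standard-ingress",
    "logging": "central-logging",
    "monitoring": "central-monitoring",
}


@dataclass
class ClusterProfile:
    """One cluster described through the five Ws (field names are hypothetical)."""
    name: str
    owner: str            # who provisions it (a team or a pipeline)
    consumers: List[str]  # who uses it
    where: str            # e.g. "public-cloud", "datacenter", "edge"
    why: str              # e.g. "development", "qa", "production"
    components: Dict[str, str] = field(default_factory=dict)  # the "what"


def deviations(cluster: ClusterProfile, baseline: Dict[str, str]) -> Dict[str, str]:
    """Return baseline components the cluster is missing or has swapped out."""
    result = {}
    for component, expected in baseline.items():
        actual = cluster.components.get(component, "<missing>")
        if actual != expected:
            result[component] = actual
    return result


def snowflake_report(
    fleet: List[ClusterProfile], baseline: Dict[str, str]
) -> Dict[str, Dict[str, str]]:
    """Map each non-conforming cluster name to its deviations from the baseline."""
    report = {}
    for cluster in fleet:
        diff = deviations(cluster, baseline)
        if diff:
            report[cluster.name] = diff
    return report


if __name__ == "__main__":
    fleet = [
        ClusterProfile("prod-cloud-a", "platform-pipeline", ["payments"],
                       "public-cloud", "production", dict(BASELINE)),
        ClusterProfile("edge-store-17", "retail-ops", ["pos-app"], "edge",
                       "production", {**BASELINE, "ingress": "hand-rolled-proxy"}),
        ClusterProfile("dev-diy", "team-x", ["team-x"], "datacenter",
                       "development", {"cni": "standard-cni"}),
    ]
    # Only clusters that drift from the baseline are printed.
    for name, diff in snowflake_report(fleet, BASELINE).items():
        print(name, diff)
```

Running it lists only the edge cluster with its hand-rolled ingress and the DIY development cluster with its missing components, while the conforming cloud cluster stays silent. The design choice is to answer the “what” once, centrally, and treat every deviation as something a team has to justify, rather than letting each environment grow its own stack.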
Kubernetes and the open source ecosystem can truly transform the way organizations develop and deploy software, but that transformation does not happen at the flip of a switch. Beyond the technology itself, perhaps the most important and most often neglected dimension is the introspection it requires. I encourage you to look across your organization and consider whether these types of technical debt could exist within your current Kubernetes journey. Finally, take a look at this independent whitepaper on modern K8s requirements.