DevOps and Kubernetes: We’ve Been Doing It Wrong

Platform engineering as a replacement for DevOps has become a hot topic, with provocative critics stoking the controversy by pronouncing DevOps dead.

The underlying reason for these pronouncements is that the once-radical DevOps model is at odds with the cloud-native container management model to which it is now being applied. Let’s take a closer look.

A Misapplied Model

Container orchestration platforms and DevOps rose in popularity around the same time. DevOps was born because old centralized platforms like Java EE no longer worked for developers looking to leverage newer languages and development frameworks. When developers wanted to try new programming languages like PHP and Ruby, they weren’t able to run those applications on Java EE. This is the context that bred the “you build it, you run it” mentality that is the foundation of DevOps.

The original goals of DevOps were to shorten the innovation cycle, increase agility, and ship more software faster by using automation and removing the Dev-to-Ops wall. However, the emergence of container orchestration platforms turned this shared ownership model upside down.

DevOps encourages decentralization, while container orchestration platforms were designed to be centrally managed for maximum benefit. Container orchestration platforms have carefully designed APIs that separate the concerns of developers and operators.

The original idea behind container orchestration platforms was that a central team provides a secure and resilient platform that abstracts away the complexity of infrastructure so each product team could focus on shipping and improving their products as fast as possible, without having to worry about how to operate them reliably, securely, and efficiently.

This is completely at odds with DevOps. In a sense, DevOps is trying to achieve the opposite of what container orchestration platforms were designed to do. Companies that try to take a maximalist approach with DevOps and encourage every team to build and run their own infrastructure will struggle and won’t get the full value these platforms have to offer. They’re doing it wrong.

A Hotbed of Inefficiency

The mismatch between traditional DevOps and the new cloud-native container orchestration model breeds a host of problems:

Overwhelming complexity. DevOps teams can’t keep up with the amount of work required to manage a secure cloud-native platform. Kubernetes itself is already complex, and creating a production-grade Kubernetes-based platform requires many add-ons to cover functionality such as security, observability, service mesh, applications, and more, each bringing its own complexity. Because they have to tend to a complex infrastructure, developers are left with little time to develop applications. Projects are delayed or never make it to production.
Duplicate and disjointed effort. Every DevOps team comes up with its own way to deploy and manage infrastructure, essentially reinventing the wheel. This lack of consistency and standardization wastes resources and undermines the goals of achieving efficiency and reducing costs. This siloed approach also prevents teams from learning from one another. For example, if one team fixes an important issue, other teams don’t benefit from it.
Security and resiliency issues. Security and reliability engineering require specialized skills that many DevOps teams don’t possess. This leads to insecure and unstable infrastructure.
Manual coding errors. DevOps teams spend a good deal of time building and maintaining brittle custom scripts for infrastructure management. These scripts leave lots of room for human error and are expensive to maintain.

In this environment, cloud-native projects stall or fail, exacerbated by the complexity introduced by multi-cloud, hybrid, and edge environments, and compounded by complex workloads like artificial intelligence (AI) and machine learning.

These problems exist even when you’re using a managed cloud-provider Kubernetes service. These services provide mainly bare-bones Kubernetes and offer limited automation and fleet management capabilities. DevOps teams still need to add a multitude of Day-2 add-ons to create a production environment, and they must build their own automation for operational workflows.

Platform Engineering: A Better Path

Platform engineering is a new name for an old concept that has gained new relevance in the cloud-native era as a cure for cloud and cluster sprawl, wasted resources, and runaway costs.

Containers and WebAssembly (Wasm) provide a clean interface, enabling developers to select the language and framework of their choice (unlike Java EE), while enabling a central team to set platform standards and governance.

A clean Kubernetes container-management interface separates the concerns of Dev and Ops, yielding more efficiency and productivity. I know what many developers are going to say: “Platform teams are just going to take our toys away again, take forever to give us something, and will just slow us down and make our lives miserable.” But I think it’s different this time around. Why? Because there’s a simple contract, and the contract is that as long as it fits in a container, it can be deployed.
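
As a rough illustration of that contract, the sketch below models what a platform team might check before accepting a workload. It is a minimal Python sketch under stated assumptions, not any real platform's API: the `Workload` type, the `ALLOWED_REGISTRIES` set, and the `platform_accepts` function are hypothetical names introduced here for illustration. The point is that the check looks only at the container image and its resource requests, never at the language or framework inside the container.

```python
from dataclasses import dataclass

# Hypothetical platform standard: images must come from approved registries.
ALLOWED_REGISTRIES = {"registry.example.com"}


@dataclass(frozen=True)
class Workload:
    """What a product team hands to the platform (illustrative fields only)."""
    name: str
    image: str            # e.g. "registry.example.com/shop/cart:1.4.2"
    cpu_millicores: int
    memory_mib: int


def platform_accepts(workload: Workload) -> list[str]:
    """Return a list of contract violations; an empty list means deployable."""
    problems = []
    # The registry is everything before the first "/" in the image reference.
    registry = workload.image.split("/", 1)[0]
    if registry not in ALLOWED_REGISTRIES:
        problems.append(f"image registry '{registry}' is not approved")
    # Require an explicit tag on the final path segment of the image.
    if ":" not in workload.image.rsplit("/", 1)[-1]:
        problems.append("image must be pinned to an explicit tag")
    if workload.cpu_millicores <= 0 or workload.memory_mib <= 0:
        problems.append("resource requests must be positive")
    return problems
```

Nothing in `Workload` records whether the service is written in PHP, Ruby, or Java. In this sketch, that omission is the contract: the platform team governs the outside of the container, and the product team owns the inside.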

Cloud-provider Kubernetes services were designed for the DevOps approach. Teams that want to provide an internal developer platform (IDP) for their entire organization need a Kubernetes management platform with fleet management capabilities that cover every cluster, whether those clusters are provided by a cloud service like Amazon EKS or Microsoft AKS or are running somewhere outside the cloud.

Platform Engineering Best Practices

To reap the full benefits of platform engineering, certain processes need to be centralized, while others should be decentralized. The processes that should be centralized and standardized include:

Cluster lifecycle management. There is no value in having a dozen different ways to bring up a cluster. There is value in having a single way because it makes adding new infrastructure providers (like another cloud service) that much easier.
Security. Maintaining a secure Kubernetes environment requires a specialized skill set that is in short supply. Putting security experts on every team is not cost-effective. Centralization is more efficient, enabling shared services like databases (database as a service, or DBaaS) to be secured and properly managed.
Governance/policy management. The point of policy is to be consistent across environments.
Observability. Ops teams need a “god view” of all their environments. This is critical for debugging and optimization. You want all your clusters to run like your best cluster.
Continuous delivery infrastructure. Best achieved through declarative APIs and GitOps.
Cost management. Best achieved through FinOps and integrated monitoring and management tools.

Processes that are better off decentralized and left for developers to decide include:

Choice of programming language
Choice of development framework
Basically anything that goes into a container
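
The declarative, GitOps-style delivery mentioned in the centralized list can be sketched in a few lines. The following is a minimal Python sketch under stated assumptions, not the behavior of any particular GitOps tool: `desired` stands for the state declared in Git, each cluster's `actual` state is a plain dictionary mapping workload names to image references, and `plan_changes` and `reconcile_fleet` are hypothetical helpers named here for illustration.

```python
def plan_changes(desired: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Compare declared state (e.g. from Git) with observed cluster state.

    Keys are workload names; values are container image references.
    """
    return {
        "create": sorted(n for n in desired if n not in actual),
        "update": sorted(n for n in desired if n in actual and desired[n] != actual[n]),
        "delete": sorted(n for n in actual if n not in desired),
    }


def reconcile_fleet(
    desired: dict[str, str], clusters: dict[str, dict[str, str]]
) -> dict[str, dict[str, list[str]]]:
    """Plan the same declared state against every cluster in the fleet."""
    return {name: plan_changes(desired, actual) for name, actual in clusters.items()}


if __name__ == "__main__":
    desired = {"cart": "cart:2", "api": "api:1"}
    clusters = {"aws": {"cart": "cart:1", "old": "old:1"}, "edge": {}}
    for cluster, plan in reconcile_fleet(desired, clusters).items():
        print(cluster, plan)
```

The central team owns a single reconcile path for every cluster, which is where the consistency comes from. What the images contain (language, framework, libraries) never enters the comparison, so those choices stay with the product teams.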

A Golden Path to Innovation

Platform engineering provides the best of both worlds: a centralized platform for shared infrastructure and decentralized DevOps within product teams. The Kubernetes API and containers provide a robust interface that enables a division of labor and lets both sides focus on what they do best.

So is DevOps really dead? Not at all! DevOps concepts such as automation through continuous integration and continuous delivery (CI/CD), site reliability engineering (SRE), and DevSecOps are still considered best practices for product teams. But when it comes to providing a secure, resilient, and cost-effective platform on which multiple teams can deploy their apps, a platform engineering approach makes more sense than the shared ownership model that DevOps advocates.

My company D2iQ provides a Kubernetes management platform that embodies the principles described in this article and allows teams to instantly deploy a production-grade platform on any cloud, datacenter, or the edge. Learn more at https://d2iq.com/.

Originally published at https://www.devopsdigest.com.