How eBPF is shaping the future of Linux and platform engineering

When Docker burst onto the scene in 2013, Linux containers seemed like an overnight success. But the evolution to containers, microservices, and Kubernetes was actually decades in the making, built on kernel primitives in the Linux operating system. Docker used two of these primitives, cgroups and namespaces, as building blocks to create a lightweight, easy-to-use software packaging format. Google and other large-scale operators had used Linux containers for many years, but Docker made them easily accessible to mainstream developers.

And that’s what we’re seeing today with eBPF, another technology born out of Linux kernel primitives. Every major networking, observability, and security vendor now claims “eBPF-powered” offerings. eBPF tools like Cilium, Tetragon, and Falco are becoming entrenched in enterprise architectures and cloud service provider offerings alike. And according to one of its creators, eBPF-based breakthroughs are just beginning.

InfoWorld spoke with Daniel Borkmann—co-creator of eBPF and current eBPF co-maintainer for the Linux kernel—to learn more about the origins of the technology, why eBPF has emerged as the standard approach to programming and customizing the Linux kernel, and what that means for the future of Linux and platform engineering.

From Solaris student to Linux kernel maintainer

Daniel Borkmann’s path to eBPF began with a quest to understand the internals of Solaris, which was still being taught in the computer science curriculum at his university. A major hurdle, however, was the lack of source code showing “where the magic happens.” Borkmann found the theory in his operating systems classes highly interesting, but the light bulb really went on during late nights studying the Linux kernel source code, Git logs, and mailing lists. He began writing low-level user-space applications that interfaced with the kernel.

Soon Borkmann was exploring packet filters, tcpdump and libpcap, and how the network stack handles packets as they traverse its layers in both directions. He wrote a more efficient tcpdump clone in his spare time and started sending small code improvements to the Linux networking stack. At the start of his master’s studies, he landed his first paid job developing Linux kernel code for a local startup in Leipzig, Germany.

Borkmann submitted his first patch to the Linux kernel in 2010 as a “complete noob” (his words). It extended netpoll to allow multiple rx_hooks per interface, and it accidentally introduced a bug that could have caused a deadlock in the kernel; another contributor quickly discovered and fixed it. But he was hooked. Linux kernel development was a fascinating environment, and he knew it was his calling.

Borkmann moved to Zurich to complete his master’s thesis on a composable networking stack for the kernel. Drawing inspiration from FreeBSD’s netgraph, his experiment aimed to offload networking blocks onto an FPGA and to build composable graphs for packet processing. Along the way, though, he sometimes found academic papers dull, with too little long-term, real-world impact, and realized how much more rewarding it would be to contribute to the Linux kernel full-time. He came across a Linux contributor named Thomas Graf (the two would later co-create Cilium), noticed that Graf’s email address had a Swiss (.ch) domain, and spontaneously reached out. The result was an invitation to join the Linux kernel networking team at Red Hat.

Today Borkmann is among the top 1% of contributors to the Linux kernel.

Rethinking networking in the Linux OS

The origin story behind eBPF really begins in 2011, when software-defined networking (SDN) was gaining steam and Linux adoption was spiking. Linux subsystems needed to keep up with the new paradigm of microservices architecture and distributed applications, which run across clusters of Linux machines rather than on a single server and host operating system.

Borkmann’s kernel development work in the networking stack put him on the front lines of meeting SDN and cloud-native networking requirements. Linux needed newer abstractions, because many of its building blocks had been designed before the cloud-native era: cgroups (CPU and memory control), namespaces (net, mount, pid), SELinux, seccomp, Netfilter, Netlink, AppArmor, auditd, perf, and more. Borkmann saw Netfilter’s nftables being pushed as “next generation” Linux networking, alongside Open vSwitch (OVS), which at the time was the most progressive SDN project. He believed there was a better approach.

The Linux kernel was already being stretched to keep up with higher networking speeds, but it didn’t provide enough flexibility for programming new, custom functionality. Another constraint was the mandate to “never break user space”: the kernel must continue to support all of the software developed long before cloud-native applications arrived on the scene. Unfortunately, that “legacy baggage” pushed some networking innovation out of the kernel and into user space.

In short, the new cloud operating models brought much more automation, churn, and scale, and more demanding network performance requirements. But the self-contained subsystems in the Linux kernel had no convention for pushing, aggregating, and acting upon all of this new cloud context in the kernel.

In Linux networking, packet processing (parsing, manipulating, filtering, and forwarding packets) is the foundation that determines what’s possible. It is the mechanism by which the kernel routes, controls, and inspects network packets as they travel through the stack. Packet processing is to the kernel’s networking stack what the carburetor is to an engine, or the Flux Capacitor to Doc’s DeLorean.
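The parse-then-decide flow described above can be sketched in a few lines. This is a user-space Python model, not an eBPF program: the verdict constants borrow the names and values of the kernel’s XDP return codes, and the blocked-port policy is a hypothetical example. The explicit length check before every header access echoes what the eBPF verifier demands of real packet-processing programs, which must prove that each read stays inside the packet.

```python
import struct

# Names and values mirror the kernel's XDP return codes; here they are
# plain Python constants used for illustration only.
XDP_DROP = 1
XDP_PASS = 2

ETH_HLEN = 14
ETH_P_IP = 0x0800
IPPROTO_TCP = 6
BLOCKED_TCP_PORTS = {23}  # hypothetical policy: drop telnet traffic


def filter_packet(frame: bytes) -> int:
    """Parse Ethernet -> IPv4 -> TCP headers and return a pass/drop verdict."""
    if len(frame) < ETH_HLEN:
        return XDP_DROP  # too short to hold an Ethernet header
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS  # not IPv4: leave it to the regular stack
    if len(frame) < ETH_HLEN + 20:
        return XDP_DROP  # truncated IPv4 header
    ver_ihl = frame[ETH_HLEN]
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    if ver_ihl >> 4 != 4 or ihl < 20 or len(frame) < ETH_HLEN + ihl:
        return XDP_DROP  # malformed IPv4 header
    if frame[ETH_HLEN + 9] != IPPROTO_TCP:
        return XDP_PASS  # only TCP is subject to this policy
    l4 = ETH_HLEN + ihl
    if len(frame) < l4 + 4:
        return XDP_DROP  # not enough bytes for the TCP ports
    _src_port, dst_port = struct.unpack_from("!HH", frame, l4)
    return XDP_DROP if dst_port in BLOCKED_TCP_PORTS else XDP_PASS
```

In a real XDP program the same logic would be written in restricted C, compiled to eBPF, and attached to a network interface, where the verdict is returned before the kernel allocates its usual packet metadata.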

Application developers mostly write their applications in user space, using abstractions that shield them from the system calls made to the kernel. When an application needs to interact with hardware, whether writing to the screen, writing to a file, or sending a network packet, it has to ask the kernel for help. User space can’t do this directly (for various reasons, such as system security). The kernel provides the common, generic interface between user space applications and the hardware, and coordinates multiple user space processes running simultaneously.
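To make the user-space/kernel boundary concrete, here is a minimal sketch using Python’s os module, whose os.open, os.write, os.read, and os.close functions are thin wrappers around the corresponding POSIX system calls. Each call crosses into the kernel, which performs the actual file I/O on the process’s behalf. The helper names and the file name are arbitrary choices for illustration.

```python
import os
import tempfile


def write_via_kernel(path: str, data: bytes) -> int:
    """Write bytes using the open(2), write(2), and close(2) system calls."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # open(2)
    try:
        written = 0
        while written < len(data):
            # write(2) may accept fewer bytes than requested, so keep going
            written += os.write(fd, data[written:])
        return written
    finally:
        os.close(fd)  # close(2)


def read_via_kernel(path: str) -> bytes:
    """Read a whole file using the open(2), read(2), and close(2) system calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)  # read(2); empty bytes means end of file
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "hello.txt")
        n = write_via_kernel(path, b"hello, kernel\n")
        print(n, read_via_kernel(path))  # 14 b'hello, kernel\n'
```

Higher-level APIs such as Python’s built-in open() add buffering on top, but they end up issuing the same kinds of system calls.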

As computing evolved from virtualization to containers, many different approaches to packet processing competed for a place within the Linux kernel: iptables, nftables, OVS, Linux Traffic Control (TC), and more. eBPF won out as the preferred approach because it combines expressiveness with safety enforced by the verifier, while still executing programs at native speed. In other words, eBPF lets users program the kernel in ways that these alternatives cannot, without risking a kernel crash.

A more ‘programmable’ Linux kernel

While Borkmann was initially drawn to eBPF for the flexibility and performance it would bring to networking, it became obvious that the benefits of the new technology could extend far beyond just networking.

“Once eBPF brought in this base functionality where you can build stuff and deploy it immediately, it solved a huge problem,” said Borkmann. “You can write your orchestration programs with eBPF embedded in it, and deploy it no matter what the underlying kernel version is. And instead of paying a lot of money to a big vendor for core kernel ABI stability, now you can just use eBPF instead of needing a module to extend the kernel for a lot of different use cases.”

eBPF turned into a universal assembly language that allows users to load and safely run custom programs within the Linux kernel—a way to add all kinds of capabilities to the operating system at runtime. It is strictly typed, it has a stable instruction set, and its extensions are backwards-compatible.
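As a rough illustration of that “stable instruction set,” every eBPF instruction uses the same fixed 64-bit encoding: an 8-bit opcode, 4-bit destination and source register numbers, a 16-bit signed offset, and a 32-bit signed immediate. The sketch below hand-assembles the two-instruction program “r0 = 2; exit” using opcode values from the Linux kernel’s eBPF headers, assuming the little-endian layout. It only builds bytes; actually loading them would require a loader (such as libbpf) to pass them to the kernel through the bpf(2) system call’s BPF_PROG_LOAD command, along with a program type and other attributes.

```python
import struct

# Opcode building blocks from the eBPF instruction set.
BPF_ALU64 = 0x07  # 64-bit arithmetic class
BPF_JMP = 0x05    # jump class
BPF_MOV = 0xB0    # move operation
BPF_EXIT = 0x90   # exit operation
BPF_K = 0x00      # source operand is the 32-bit immediate


def encode_insn(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 8-byte eBPF instruction (little-endian layout)."""
    if not (0 <= dst <= 10 and 0 <= src <= 10):
        raise ValueError("eBPF has registers r0 through r10")
    # opcode:8 | src_reg:4 dst_reg:4 | offset:16 (signed) | imm:32 (signed)
    return struct.pack("<BBhi", opcode, (src << 4) | dst, off, imm)


def mov64_imm(dst: int, imm: int) -> bytes:
    """dst = imm (64-bit move of an immediate)."""
    return encode_insn(BPF_ALU64 | BPF_MOV | BPF_K, dst=dst, imm=imm)


def exit_insn() -> bytes:
    """Return to the caller; the program's result is whatever r0 holds."""
    return encode_insn(BPF_JMP | BPF_EXIT)


# r0 = 2; exit  (for an XDP program, a return value of 2 means XDP_PASS)
program = mov64_imm(0, 2) + exit_insn()

if __name__ == "__main__":
    print(program.hex())  # b7000000020000009500000000000000
```

Because this encoding is fixed and new instructions are only ever added, bytecode produced by an older compiler keeps working on newer kernels.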

“Think of eBPF as a new type of software which bridges the gap between a typical monolithic kernel and microkernel,” Borkmann explained. “It’s a safe extension of the kernel from your trusted user space. And the great thing about eBPF is that it’s as fast as regular kernel code given eBPF is not a sandbox but the program is fully understood by the verifier to determine whether it’s safe to run in a trusted environment, and then JITed [just-in-time compiled] to native code.”
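The real verifier is far more sophisticated, but a toy version can show the flavor of checks it performs before a program is allowed to run: every path must end in an exit, jumps must stay inside the program, registers must be written before they are read, the read-only frame pointer (r10) cannot be overwritten, and r0 must hold a return value at exit. This is a hypothetical sketch that understands only four opcodes and forbids backward jumps outright (early kernels rejected all loops; since Linux 5.3 the verifier accepts loops it can prove are bounded). It also skips checks the real verifier performs, such as rejecting unreachable instructions and tracking value ranges and pointer types.

```python
import struct

# Four opcodes from the eBPF instruction set; this toy understands nothing else.
MOV64_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K: dst = imm
MOV64_REG = 0xBF  # BPF_ALU64 | BPF_MOV | BPF_X: dst = src
JA = 0x05         # BPF_JMP | BPF_JA: unconditional jump by off
EXIT = 0x95       # BPF_JMP | BPF_EXIT: return r0


def insn(op: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Encode one 8-byte instruction (little-endian layout)."""
    return struct.pack("<BBhi", op, (src << 4) | dst, off, imm)


def toy_verify(prog: bytes) -> None:
    """Raise ValueError if prog fails a few verifier-style checks.

    With no conditional jumps there is exactly one path to walk; the real
    verifier explores every possible path through the program.
    """
    if not prog or len(prog) % 8:
        raise ValueError("program must be a non-empty multiple of 8 bytes")
    insns = [struct.unpack_from("<BBhi", prog, i) for i in range(0, len(prog), 8)]
    initialized = {1, 10}  # at entry: r1 = context pointer, r10 = frame pointer
    pc = 0
    while True:
        if pc >= len(insns):
            raise ValueError("execution can fall off the end of the program")
        op, regs, off, _imm = insns[pc]
        dst, src = regs & 0x0F, regs >> 4
        if op in (MOV64_IMM, MOV64_REG):
            if dst > 9:  # r10 is read-only; nothing above r10 exists
                raise ValueError(f"insn {pc}: r{dst} is not writable")
            if op == MOV64_REG and src not in initialized:
                raise ValueError(f"insn {pc}: r{src} is read before being written")
            initialized.add(dst)
            pc += 1
        elif op == JA:
            if off < 0:
                raise ValueError(f"insn {pc}: backward jump could form a loop")
            if pc + 1 + off >= len(insns):
                raise ValueError(f"insn {pc}: jump target out of range")
            pc += 1 + off
        elif op == EXIT:
            if 0 not in initialized:
                raise ValueError(f"insn {pc}: r0 must hold a return value before exit")
            return
        else:
            raise ValueError(f"insn {pc}: unknown opcode {op:#04x}")


if __name__ == "__main__":
    toy_verify(insn(MOV64_IMM, dst=0, imm=2) + insn(EXIT))  # accepted
    try:
        toy_verify(insn(EXIT))  # r0 was never set
    except ValueError as err:
        print("rejected:", err)
```

Only after a program passes the real verifier does the kernel JIT-compile it to native machine code, which is why verified eBPF can run at the speed of regular kernel code.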

Beyond being safe and fast, operating at native speed, eBPF is extremely flexible, allowing different users to use it in different ways. “The power of eBPF is really in that you can enable code from a user point of view only when you as a user have that use case or need to process something in a certain way,” Borkmann said. “It doesn’t penalize others. It’s not like something that’s hard-coded in the kernel that would make the critical path slower and slower: the performance death by a thousand cuts.”

“Prior to eBPF, most users consumed enterprise Linux distributions or just ran whatever kernel version that came installed on their device,” said Cilium co-creator Thomas Graf. “eBPF changed this fundamentally, as with the presence of the runtime, any idea could be turned into an eBPF program and loaded at runtime within days instead of years. This meant we could rebuild everything better. We had to decide what to rebuild first.”

Kernel engineering goes mainstream

Like Google Borg and other technologies born at hyperscalers, eBPF was initially adopted by only a handful of software engineering shops that possessed kernel development skills. Not many developers have the requisite low-level C programming skills to do kernel engineering and write eBPF programs.

But today that small group of experts is writing programs that touch millions of users. eBPF-based programs are the most exciting territory for platform engineering teams responsible for networking, security, and observability, and many people who use these programs don’t need to know anything about the underlying eBPF abstractions that make them possible. “Think of it as a silent platform revolution from cloud native,” Borkmann noted in a recent keynote at an eBPF workshop.

Here is a glimpse of the many applications in the vast eBPF landscape:

Cilium began as an eBPF-based implementation of the Container Network Interface (CNI), providing Layer 3 and Layer 4 connectivity between container workloads, and has evolved into the de facto network layer for most cloud service providers’ Kubernetes offerings. Among other features, Cilium implements distributed load balancing for traffic between Kubernetes pods and to external services, and it can fully replace kube-proxy, using efficient eBPF hash tables whose lookups stay fast as the number of services grows, rather than long chains of iptables rules. It also supports advanced functionality such as Layer 3 through Layer 7 policy enforcement, integrated ingress and egress gateways, bandwidth management, a service mesh in combination with Envoy, and deep network visibility.
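The core idea behind hash-table-based service load balancing can be sketched as a single lookup: find the service by (virtual IP, port, protocol), then pick a backend by hashing the flow. This is a hypothetical Python model, not Cilium’s implementation; the table contents, addresses, and function names are illustrative assumptions, and a real eBPF datapath keeps this state in BPF maps that a user-space agent syncs with Kubernetes Services.

```python
import ipaddress
import struct
import zlib
from typing import Dict, List, Optional, Tuple

ServiceKey = Tuple[str, int, str]  # (virtual IP, port, protocol)
Backend = Tuple[str, int]          # (pod IP, target port)

# Hypothetical service table: one hash lookup per packet, no matter how
# many services exist.
services: Dict[ServiceKey, List[Backend]] = {
    ("10.96.0.10", 53, "udp"): [("10.0.1.5", 53), ("10.0.2.7", 53)],
    ("10.96.12.1", 80, "tcp"): [("10.0.1.9", 8080), ("10.0.3.4", 8080), ("10.0.2.2", 8080)],
}


def flow_hash(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> int:
    """Deterministic hash of the flow's 5-tuple."""
    data = (
        ipaddress.ip_address(src_ip).packed + struct.pack("!H", src_port)
        + ipaddress.ip_address(dst_ip).packed + struct.pack("!H", dst_port)
        + proto.encode()
    )
    return zlib.crc32(data)


def select_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                   proto: str) -> Optional[Backend]:
    """Return the backend for this flow, or None if dst is not a service VIP."""
    backends = services.get((dst_ip, dst_port, proto))  # average O(1) lookup
    if not backends:
        return None  # not a service: forward the packet unchanged
    index = flow_hash(src_ip, src_port, dst_ip, dst_port, proto) % len(backends)
    return backends[index]


if __name__ == "__main__":
    print(select_backend("192.168.1.20", 40000, "10.96.12.1", 80, "tcp"))
```

Hashing the flow keeps every packet of a connection on the same backend. Plain modulo hashing reshuffles many flows when the backend list changes, which is why production eBPF load balancers pair lookups like this with connection tracking or consistent-hashing schemes such as Maglev.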

Tetragon is another open source, eBPF-based project that provides security observability and runtime enforcement. By exploiting eBPF’s low overhead, Tetragon lets platform teams tie network flows and other in-kernel events to Kubernetes objects (labels, pods, namespaces) and down to very specific processes and their process trees. In the wake of software supply chain attacks like the XZ Utils backdoor, Tetragon aims to give platform teams deeper ways to find where specific software is running in their environments and to take specific policy actions at the kernel level.
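The kind of correlation described above can be sketched as a simple data model: a table of processes keyed by PID, where each raw event is enriched with the pod it came from and the chain of parent processes that led to it. This is a hypothetical illustration; the Proc type, field names, and event shape are assumptions and do not reflect Tetragon’s actual event format or APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Proc:
    pid: int
    ppid: int
    binary: str
    pod: Optional[str] = None  # hypothetical "namespace/pod" label, if containerized


def ancestry(procs: Dict[int, Proc], pid: int) -> List[str]:
    """Walk parent links from pid up to the oldest known ancestor."""
    chain: List[str] = []
    seen = set()  # guards against cycles in stale or inconsistent data
    while pid in procs and pid not in seen:
        seen.add(pid)
        proc = procs[pid]
        chain.append(proc.binary)
        pid = proc.ppid
    return chain


def enrich(event: dict, procs: Dict[int, Proc]) -> dict:
    """Attach pod identity and process ancestry to a raw kernel-level event."""
    proc = procs.get(event["pid"])
    return {
        **event,
        "pod": proc.pod if proc else None,
        "ancestry": ancestry(procs, event["pid"]),
    }


if __name__ == "__main__":
    table = {
        1: Proc(1, 0, "/sbin/init"),
        200: Proc(200, 1, "/usr/bin/containerd-shim"),
        300: Proc(300, 200, "/bin/sh", pod="default/web"),
        301: Proc(301, 300, "/usr/bin/curl", pod="default/web"),
    }
    print(enrich({"pid": 301, "type": "connect", "dst": "203.0.113.7:443"}, table))
```

With ancestry attached, a policy or an analyst can distinguish, say, a shell spawned inside a web pod from the same binary run by a node administrator.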

Pixie is an observability tool that uses eBPF to “automatically capture telemetry data without the need for manual instrumentation.” It has become a popular building block for next-generation application performance management and monitoring vendors. A simple Google search for “observability AND eBPF” shows how much eBPF’s performance is enriching the telemetry data available to these tools. Inferring the real-time state of cloud-native systems has historically meant piling up monitoring data to be correlated after the fact. Collecting this telemetry closer to the kernel promises more consistency and lower resource usage.

Katran, a C++ library and eBPF program open-sourced by Meta (then Facebook), could challenge the status quo of proprietary Layer 3 and Layer 4 load balancers with a new approach built on in-kernel packet processing. Not everybody can write eBPF programs, but the programs being written target areas of enterprise infrastructure that have been relatively stagnant and are crying out for modernization for cloud-native use cases.

“The next decade of infrastructure software will be defined by platform engineers who can use eBPF and the projects that leverage it to create the right abstractions for higher-level platforms,” said Borkmann. “Pushing cloud-native context into the kernel was missing, and eBPF solved it.”

As we mark the 10-year anniversary of Kubernetes this month, we’re still in the early days of distributed applications, container orchestration, and platform engineering. Few may directly engineer eBPF at the kernel level, but millions will use eBPF-based programs. And if you’re running workloads on Kubernetes on one of the big public cloud provider platforms, it’s likely that you already are.

Copyright © 2024 IDG Communications, Inc.
