With the job of software development getting more and more complex, it could be time for dev and ops specialists to separate once again. But can that be done without repeating the mistakes of the past?
Devops emerged hand in hand with the rise of agile methodologies and cloud computing in the late 2000s, as software started to eat the world. A neat portmanteau of “development” and “operations,” devops sought to bring together the two previously separate groups responsible for building and deploying software. It also coincided with, or even inadvertently pushed forward, the need for software engineers to tighten their user feedback loops and push updates to production more frequently.
While many organizations grabbed this opportunity to bring together two sets of specialists to solve common problems at previously impossible speeds, others took the rise of devops as license for developers to take responsibility for operations tasks and sought to build a super team of semi-mythical full-stack developers.
“Devs don’t want to deal with operational concerns, for the most part,” tweeted Devops for Dummies author and head of community engagement at Amazon Web Services, Emily Freeman.
Freeman clearly hit a nerve, with hundreds of replies pouring in from developers who also did not want to do ops.
“I am a dev and I don’t want to deal with operation concerns,” Scott Pantall, a software engineer at the fast food company Chipotle, replied.
“Devs and ops should work closely while having differentiated roles. The empathy between teams is the real point,” Andrew Gracey, a developer evangelist at SUSE, weighed in.
While the concept of shifting more operational and security concerns “left” and into the domain of software developers clearly has its merits, it also has the potential to create a dangerous bottleneck.
“If you pull devs into too many different areas you end up shooting yourself in the foot. They are different skillsets,” James Brown, head of product for Kubernetes storage specialist Ondat, told InfoWorld.
Or as Nick Durkin, field CTO at Harness, put it, “People are beginning to realize we wouldn’t hire an electrician to do our plumbing.”
A ‘massive’ increase in responsibilities
While the stock of enterprise software developers has never been higher, the specialized expertise of technical operations has somewhat faded into the background, even as operators’ workloads have increased.
As devops engineer and former systems administrator Mathew Duggan wrote last year, while operators “still had all the responsibilities we had before, ensuring the application was available, monitored, secure, and compliant,” they have also been tasked with building and maintaining software delivery pipelines, “laying the groundwork for empowering development to get code out quickly and safely without us being involved.”
These expanding responsibilities required a mass retraining effort, in which cloud engineering and infrastructure-as-code skills became paramount.
“In my opinion the situation has never been more bleak,” Duggan wrote. “Development has been completely overwhelmed with a massive increase in the scope of their responsibilities (RIP QA) but also with unrealistic expectations by management as to speed.”
That pressure may be starting to tell.
“It’s incredibly challenging to build an organization that achieves this level of iterative harmony that lasts for a sustainable period,” wrote Tyler Jewell, managing director at Dell Technologies Capital, in a research note. “As systems grow in complexity and the end user feedback increases, it becomes increasingly difficult for a human to reason about the impact a change might have on the system.”
Recognizing the problem
The situation may not be as hopeless as Duggan and others believe, though it may require a significant realignment of engineering teams and their responsibilities.
“The intention is not to put the burden on the developer, it is to empower developers with the right information at the right time,” Harness’s Durkin said. “They don’t want to configure everything, but they do want the information from those systems at the right time to allow operations and security and infrastructure teams to work appropriately. Devs shouldn’t care unless something breaks.”
Nigel Simpson, ex-director of enterprise technology strategy at the Walt Disney Company, wants to see companies “recognize this problem and to work to get developers out of the business of worrying about how the machinery works—and back to building software, which is what they’re best at.”
It’s important to remember that devops is a continuum and its implementation will vary from organization to organization. Just because developers can do some ops now doesn’t mean they always should.
“Developer control over infrastructure isn’t an all-or-nothing proposition,” Gartner analyst Lydia Leong wrote. “Responsibility can be divided across the application lifecycle, so that you can get benefits from ‘you build it, you run it’ without necessarily parachuting your developers into an untamed and unknown wilderness and wishing them luck in surviving because it’s ‘not an infrastructure and operations team problem’ anymore.”
In other words, “It’s perfectly okay to allow your developers full self-service access to development and testing environments, and the ability to build infrastructure as code templates for production, without making them fully responsible for production,” Leong wrote.
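One way to picture the split Leong describes is as an environment-scoped permission check. The following is a minimal Python sketch only: the role names, environment names, and action names are hypothetical illustrations and are not drawn from Gartner or from any particular platform.

```python
# Hypothetical permission matrix. Every role, environment, and action name
# here is an illustrative assumption, not part of any real tool.
SELF_SERVICE = {"provision", "deploy", "teardown"}

PERMISSIONS = {
    "developer": {
        "development": SELF_SERVICE,
        "testing": SELF_SERVICE,
        # Developers may author infrastructure-as-code templates for
        # production, but do not run production themselves.
        "production": {"propose_template"},
    },
    "operations": {
        "development": SELF_SERVICE,
        "testing": SELF_SERVICE,
        "production": {
            "propose_template",
            "approve_template",
            "deploy",
            "respond_to_incident",
        },
    },
}


def is_allowed(role: str, environment: str, action: str) -> bool:
    """Return True if the role may perform the action in the environment."""
    return action in PERMISSIONS.get(role, {}).get(environment, set())


if __name__ == "__main__":
    for env in ("development", "testing", "production"):
        print(env, "developer can deploy:", is_allowed("developer", env, "deploy"))
```

In this sketch, developers keep full self-service in development and testing and can still contribute templates for production, while deployment and incident response in production stay with operations.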
As Brown at Ondat sees it, container orchestration with Kubernetes is emerging as the layer between these two teams, separating concerns so that developers can focus on their code, and operations can ensure that the underlying infrastructure and pipelines are optimized to run it. “Let’s not rewind to those teams not speaking to one another,” Brown said.
In fact, according to VMware’s “State of Kubernetes in 2022” report, 54% of the 776 respondents said that better developer efficiency was a key reason for adopting Kubernetes, and more than a third (37%) said they want to improve operator efficiency.
“Don’t fall for the fallacy of trying to make everybody an expert,” Kaspar von Grunberg, founder of Humanitec, wrote in his email newsletter. “In high-performing teams, there are few high-profile experts on Kubernetes, and there is a high level of abstraction to keep the cognitive load low for everyone else.”
Devops is dead
If the era of devops is indeed coming to an end, or even if the gloss is just starting to come off, what comes next?
Site reliability engineering (SRE), which emerged out of Google when it suffered its own devops-related growing pains, has proved a popular solution.
“Fundamentally, it’s what happens when you ask a software engineer to design an operations function,” Ben Treynor, vice president of engineering at Google and the godfather of SRE, is often quoted as saying.
Take two large financial institutions, Vanguard and Morgan Stanley, both of which have found it difficult to balance dev and ops responsibilities as they transition towards more cloud-native practices.
Inserting an SRE safety blanket at both the central operations level and within individual developer teams has helped both companies build confidence that they are striking the right balance between developer velocity and operational stability.
However, the SRE function has also drawn some criticism. Establishing SRE principles is “sometimes misunderstood as a rebranding of the ops team,” as Trevor Brosnan, head of devops and enterprise technology architecture at Morgan Stanley, observed.
“It’s a nuanced problem to solve,” Christina Yakomin, a site reliability engineer at Vanguard, said. “Introducing SRE does make people feel like we are siloing ops again into that role.” Instead, Yakomin wants to encourage Vanguard developers and operations specialists to share responsibility for security and ensure that teams with shared platforms take full operational responsibility for them.
Long live platform engineering
The idea of the internal developer platform, or the discipline of platform engineering, has also emerged as a way for organizations to give developers the tools they need to do their best work, complete with the appropriate organizational guardrails.
An internal developer platform is typically made up of the APIs, tools, services, knowledge, and support that developers need to get their code into production, combined into a company-standard platform that is maintained by a dedicated team of specialists, or product owners.
“Devops is dead, long live platform engineering,” tweeted software engineer and devops commentator Sid Palas. “Developers don’t like dealing with infra, companies need control of their infra as they grow. Platform engineering enables these two facts to coexist.”
Brandon Byars, head of technology at the software consultancy Thoughtworks, says he often “sees that division working well in platform engineering teams, which look to remove friction for developers, while giving them dials to turn.” However, he adds, “Where it doesn’t work well is by asking developers to do all of that work without centralized expertise and tooling support.”
The balancing act between software development and operations teams will be familiar to any organization that has worked to implement devops principles across its engineering teams. It’s also a balancing act that is becoming increasingly high-wire in the age of cloud-native complexity.
Copyright © 2022 IDG Communications, Inc.