What observability means for cloud operations

Observability is one of those concepts being tossed about these days in the tech press and at cloud computing conferences. Everyone seems to have their own definition of what it is and how it’s used, and no two are quite the same.

Observability is most often defined as the ability to derive key insights from large volumes of data. In cloud operations (cloudops), that data is typically extracted from running systems. We use it not only to determine whether something is going wrong, but also to figure out why and how to fix it.

What is observability worth as a concept, and how does it benefit cloudops? Let’s break it down into components that show enterprises how observability returns value to the business:

Trends: What patterns occur over time, and what do they mean for future behavior? For example, a steady downward trend in performance may indicate I/O problems arising from organic database growth. Trend analysis draws on both historical and current data, which can also serve as training data for AI-driven operations tools (AIops).

Analyses: What does the data mean, and what insights can we draw from it? Observability provides the ability to analyze what the data means, a core capability that sets it apart from simply monitoring the data.

Insights: What can we understand from the data, and what do we still need to understand? This involves finding meaning in the data that is not readily apparent. For example, is there a correlation between a rise in sales revenue and a drop in overall system performance?

Tracking: Can we monitor system activity data in real time or near real time and use it to find, diagnose, and fix issues as they occur? Traditional tracking monitors the activity of multiple systems in the cloud and in the data center. With observability, the system can also draw dynamic insights from real-time data and view them in the context of related operations data.

Learning: Learning systems examine massive amounts of data to find trends and insights, and then use them to learn about emerging patterns and what those patterns mean. A system that embraces the concept of observability uses AI to train knowledge engines on patterns in the data.

Alerting: What issues need to be dealt with in a timely manner? For example, a low-priority alert for a network performance issue might eventually lead to replacing a network hub. By contrast, a high-priority alert requires immediate attention, such as automatically expanding capacity because an application’s processing load is nearing the limits of a virtual server cluster in the cloud.

Actions: What happens as a result of an alert? It could be a manual action, such as rebooting a cloud-based server, or an automated action, such as kicking off sophisticated processing to recover from a ransomware attack before it affects core business systems. A complex response may involve dozens of human actions and thousands of automated ones to carry out immediate self-healing operations.

Observability allows you to manage and monitor modern systems and applications built to run at higher velocity with more agile features. It is no longer good enough to deploy applications and then bolt on monitoring and management tools. The new tools must do much more than simply monitor operations data. That’s where observability comes in, and it’s a concept anyone charged with cloudops should understand. Perhaps that’s you.

Copyright © 2022 IDG Communications, Inc.

