In recent years, software development and production environments have become increasingly complex and distributed. In response to this complexity, an approach called “observability” has emerged, which seeks to provide deep visibility into running systems and applications. It relies on a variety of techniques and tools, but one of its cornerstones is metrics. In this article we will look at what metrics are, how they are used and how they are implemented in a microservices environment with tools such as Micrometer, Prometheus, Elasticsearch and Metricbeat.
What is observability?
Observability can be defined as a methodology for understanding and analysing complex systems, enabling development and operations teams to identify and resolve problems more quickly. Unlike the traditional “monitoring” approach, which focuses on tracking the state of systems, observability is based on the ability to obtain detailed information about the internal behaviour of systems in real time. It gives operations and development teams deep visibility into what is happening within their systems, allowing them to identify problems, optimise performance and improve the user experience.
Following the outline below, let’s take a look at the different elements that make up a typical metrics-focused observability flow:
The role of metrics in observability
Metrics are key elements in observability. They are quantitative data that provide information about the health and performance of systems and applications. By collecting and analysing relevant metrics, teams can gain an objective and measurable view of how their systems are performing.
The following screenshot shows an example of metrics exposed on an endpoint thanks to the Micrometer library, in Prometheus format:
Metrics are used to monitor and measure critical aspects of the system, such as latency, network performance, resource utilisation and other key performance indicators (KPIs). By focusing on the right metrics, teams can identify patterns, trends and anomalies that can affect system performance and availability.
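To give a rough idea of what such an endpoint returns, here is a minimal Python sketch that parses a few lines of the Prometheus text format into (name, labels, value) tuples. The sample lines are made up for illustration (they only resemble the kind of names a Java service might expose), the helper parse_exposition is hypothetical, and the parser assumes simple lines without timestamps. It is not a full implementation of the format.

```python
import re

# Illustrative sample only; real endpoints expose many more series.
SAMPLE = """\
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 2.5165824E7
# HELP http_server_requests_seconds_count Total number of requests
# TYPE http_server_requests_seconds_count counter
http_server_requests_seconds_count{method="GET",uri="/orders",status="200"} 42
process_uptime_seconds 318.4
"""

# metric_name, optional {labels}, then the sample value (no timestamp assumed)
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# label="value" pairs; the value may contain spaces, commas or escaped characters
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse simple text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this sketch does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels, value)
```

Each series is identified by a metric name plus a set of labels, which is what later makes it possible to filter and aggregate by dimensions such as URI or memory area.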
Implementing metrics in a microservices environment
In a microservices environment, metrics instrumentation becomes essential for effective observability. One of the popular tools used for this purpose is Micrometer, a library that enables the collection and export of metrics from Java applications to a variety of monitoring systems, including Prometheus. With Micrometer, teams can record relevant metrics directly from their applications and make them available to a centralised system.
Prometheus is a monitoring and alerting system that has become a popular choice in microservices environments. It collects and stores metrics in real time, making them easy to visualise and analyse later. With Prometheus, teams can define alerting rules based on metric thresholds and receive notifications in case of anomalies.
To further leverage the metrics collected, teams can use tools such as Elasticsearch and Metricbeat. Elasticsearch is a search and data analysis engine that allows large volumes of metrics to be stored and queried. Metricbeat, on the other hand, is a lightweight agent that collects and sends metrics to Elasticsearch. By combining these tools with machine learning techniques, teams can detect patterns and anomalies in metrics data and take preventative action before problems become system outages.
The Micrometer integration
In a microservices environment, Micrometer’s integration with deployed services and applications is key to collecting relevant metrics. Micrometer provides an easy-to-use API that allows developers to record metrics in their applications. These metrics can include request latency, CPU and memory utilisation and the number of requests processed, among other critical aspects of performance. By recording these metrics directly in the application code, you gain granular visibility into the internal behaviour of each component.
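The recording pattern described above can be sketched in a language-neutral way. The following Python snippet is a toy stand-in, not Micrometer's API (Micrometer is a Java library): the ToyRegistry class, its methods and the metric names are all hypothetical. It only illustrates the two most common instrument types, a counter for "how many" and a timer for "how long".

```python
import time
from collections import defaultdict


class ToyRegistry:
    """Hypothetical in-process registry; a stand-in, not Micrometer's API."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.timer_count = defaultdict(int)
        self.timer_sum = defaultdict(float)

    def increment(self, name, amount=1.0):
        # Counters only ever go up (e.g. number of requests processed).
        self.counters[name] += amount

    def record(self, name, seconds):
        # Timers keep a count and a total, from which an average can be derived.
        self.timer_count[name] += 1
        self.timer_sum[name] += seconds

    def timed(self, name, func, *args, **kwargs):
        # Measure how long func takes, even if it raises.
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            self.record(name, time.perf_counter() - start)

    def render(self):
        # Plain "name value" lines, loosely modelled on the Prometheus text format.
        lines = []
        for name, value in sorted(self.counters.items()):
            lines.append(f"{name}_total {value}")
        for name in sorted(self.timer_count):
            lines.append(f"{name}_seconds_count {self.timer_count[name]}")
            lines.append(f"{name}_seconds_sum {self.timer_sum[name]}")
        return "\n".join(lines)


def handle_order(registry, order_id):
    """Hypothetical business operation instrumented with a counter."""
    registry.increment("orders_processed")
    return f"order {order_id} accepted"


registry = ToyRegistry()
result = registry.timed("order_handling", handle_order, registry, 7)
print(result)
print(registry.render())
```

In a real Java service the library takes care of the registry, the label handling and the export format, so application code is limited to the equivalent of the increment and timed calls.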
Prometheus and real-time metrics collection
Once metrics are recorded in applications via Micrometer, a centralised monitoring system is required to collect and store them. This is where Prometheus comes in. It is specifically designed for real-time metrics collection and storage. It provides a flexible query interface that allows teams to analyse and visualise the collected metrics.
In the screenshot below we can see a graph showing the memory usage of the pods that make up a particular microservice:
As we have seen, teams can create customised dashboards, define threshold-based alerts and perform in-depth analysis of metrics to identify patterns and trends in the data collected.
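The logic behind a threshold-based alert can be sketched in a few lines of Python. This is a simplified analogue, not Prometheus itself: a real alerting rule is written as a query expression with a hold duration, whereas the hypothetical alert_states function below works on a plain list of samples and counts consecutive breaches instead of elapsed time. The memory values are invented for the example.

```python
def alert_states(values, threshold, for_samples):
    """Return, for each sample, whether the alert would be firing.

    The alert fires only once the value has stayed above the threshold for
    `for_samples` consecutive samples, which avoids alerting on short spikes.
    """
    states = []
    streak = 0
    for value in values:
        streak = streak + 1 if value > threshold else 0
        states.append(streak >= for_samples)
    return states


# Invented memory readings for one pod, in MiB.
memory_mib = [410, 430, 520, 540, 560, 480, 530]
states = alert_states(memory_mib, threshold=512, for_samples=3)
print(states)  # [False, False, False, False, True, False, False]
```

Requiring several consecutive breaches is a common design choice: it trades a slightly later notification for far fewer false alarms.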
Elasticsearch and Metricbeat: Expanding analytical capabilities
While Prometheus is an excellent real-time monitoring system, sometimes more advanced analysis is needed. This is where Elasticsearch and Metricbeat play an important role. Elasticsearch is a highly scalable search and data analysis engine that allows you to store and query large volumes of metrics. Metricbeat, on the other hand, is a lightweight agent that efficiently collects and sends them to Elasticsearch.
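As a rough picture of how metric documents can reach Elasticsearch, the sketch below builds a request body in the newline-delimited JSON layout used by the Elasticsearch bulk API: one action line followed by one document line per sample. The index name, field names and values are hypothetical, and the documents Metricbeat actually produces follow its own, richer schema. Sending the body over HTTP is left out on purpose.

```python
import json


def to_bulk_body(samples, index_name):
    """Build a bulk-style NDJSON body: an action line, then the document."""
    lines = []
    for sample in samples:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(sample))
    # The bulk format expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Invented samples; field names are illustrative, not Metricbeat's schema.
samples = [
    {"@timestamp": "2024-01-01T12:00:00Z", "service": "orders", "memory_used_bytes": 25165824},
    {"@timestamp": "2024-01-01T12:00:10Z", "service": "orders", "memory_used_bytes": 26214400},
]

body = to_bulk_body(samples, "metrics-demo")
print(body)
```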
The following screenshot shows a graph in which we have used Machine Learning techniques to detect possible anomalies in a given metric of a service:
With Elasticsearch and Metricbeat, teams can perform more complex analysis, such as anomaly detection and long-term trend analysis. By applying machine learning techniques, such as anomaly detection algorithms, teams can identify patterns and anomalous behaviour in metrics. This allows them to take preventative and proactive measures to avoid problems before they affect system availability and performance.
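To make the idea of anomaly detection more concrete, here is a deliberately simple statistical stand-in written in Python. It is not the algorithm used by Elasticsearch's machine learning features; it is a rolling z-score that flags a sample when it lies more than a chosen number of standard deviations away from the mean of the preceding window. The function name, window size, threshold and latency values are all assumptions made for the example.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return the indices of samples that deviate strongly from recent history."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: no spread to compare against in this sketch
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Invented request latencies in milliseconds, with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 99]
anomalies = zscore_anomalies(latency_ms)
print(anomalies)  # [7]
```

Production-grade detectors also have to cope with seasonality, trends and the way an outlier distorts the window that follows it, which is why a dedicated tool is usually preferable to a hand-rolled check like this one.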
In a microservices environment, observability is critical to ensure the performance and availability of systems and applications. Metrics play a crucial role in this approach, providing an objective and measurable view of system behaviour. By implementing tools such as Micrometer, Prometheus, Elasticsearch and Metricbeat, teams can efficiently collect, store, visualise and analyse metrics, enabling them to identify issues, optimise performance and ensure a quality user experience.
Observability and metrics analysis have become indispensable components in the management of distributed and complex systems. Their adoption is essential to maintain robust and reliable systems. With these tools, teams can have a deep understanding of their systems and be prepared to face challenges that may arise.
Álvaro Sola Martínez
Marcos Izquierdo Caro