Today we are talking about Beats: components that, when used correctly, save us time and complexity when extracting data from our business processes.
What are Beats?
Beats are the native data extraction agents of the Elastic Stack. Once installed on the desired infrastructure, they extract data and send it on so that we can use all the functionality Elasticsearch offers.
Where are they located in the Elastic Stack?
Within the Elastic Stack, Beats sit in the lowest layer: they are responsible for collecting data from the different technologies from which we need valuable information for technical or functional process monitoring.
- Beats are not always the starting point for reading data, since Logstash also integrates with a large number of inputs.
- Before extracting data from a technology, we need to decide which strategy its ETL process will follow.
What types of Beats are there?
As of today, the official Elasticsearch Beats are the following:
- Auditbeat: Reads data from the Linux audit framework (auditd) and checks file integrity.
- Metricbeat: Extracts system and application metrics. Integrates natively with many technologies thanks to pre-existing modules.
- Filebeat: Collects system and application logs. Integrates natively with many technologies thanks to pre-existing modules.
- Winlogbeat: Similar in purpose to Filebeat, but reads Windows event logs.
- Packetbeat: Monitors the network traffic of your infrastructure and provides metrics such as latency, errors, response times, and patterns and trends in user behavior.
- Heartbeat: Actively measures the uptime of a system or application using monitoring probes.
All Beats are built on libbeat, a library written entirely in Go. It provides the API that all Beats use to send data to Elasticsearch, and it standardizes code-level functionality, input options and configuration across Beats.
APM Server, although not a Beat, is also built on libbeat.
Apart from the official beats, there is a long list of “Community Beats” that are developed by the community to meet more specific needs.
How do they work? Example problem
Let’s imagine a simple environment composed of an Apache server proxied by an Nginx server.
Because of the heavy traffic the proxy is receiving, we need to make sure that the current number of idle client connections does not exceed a certain threshold.
Focusing on Nginx, what interesting data can we extract from the metrics?
1. First, we review the supported metricsets in the Metricbeat module documentation.
- We see that the Nginx module currently supports extracting metrics only through the ngx_http_stub_status_module module (the stubstatus metricset).
- On the Nginx side, we verify that stub_status is enabled and reachable through its endpoint.
- Following the metricset documentation, we see an example of the data that a reading would give us:
- The example includes the “waiting” metric, which is the one that lets us monitor idle connections waiting for a request.
- With Metricbeat and Elasticsearch correctly configured to extract this metric, we can cover this need with a panel similar to the following:
In addition, we can configure an alert based on Machine Learning, which spares us from setting a fixed threshold and detects unusually high volumes dynamically and efficiently.
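As a rough illustration of the check described above, here is a minimal Python sketch that parses the plain-text page served by stub_status and compares the Waiting counter against a threshold. The sample text follows the format documented for ngx_http_stub_status_module; the dictionary keys, the `waiting_exceeds` helper and the threshold values are hypothetical names chosen for this example. In practice Metricbeat does the collection for us, so this only shows what kind of data sits behind the metric.

```python
import re

# Sample text in the format documented for ngx_http_stub_status_module.
# The numbers are illustrative, not measurements from a real server.
SAMPLE_STATUS = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Parse a stub_status page into a dict of integer counters."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    totals = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    # Key names are chosen for this sketch only.
    return {
        "active": int(active.group(1)),
        "accepts": int(totals.group(1)),
        "handled": int(totals.group(2)),
        "requests": int(totals.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }


def waiting_exceeds(stats: dict, threshold: int) -> bool:
    """Return True when idle (waiting) connections are above the threshold."""
    return stats["waiting"] > threshold


stats = parse_stub_status(SAMPLE_STATUS)
# The threshold of 100 is an arbitrary value for the example.
print(stats["waiting"], waiting_exceeds(stats, 100))
```

A fixed threshold like this is the simplest possible check; the Machine Learning alert mentioned above exists precisely to avoid hard-coding such a value.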
The Nginx logs (access.log and error.log) also contain very valuable information, and with proper extraction through Filebeat we can build a dashboard like the following (and more):
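To give an idea of the fields such a dashboard could be built on, the following Python sketch splits access.log lines into named fields and counts responses by status class. It assumes Nginx's predefined "combined" log format; a custom log_format would need a different pattern. The sample lines, the `parse_line` and `status_classes` helpers and the field names are made up for this example, and Filebeat's Nginx module takes care of the real parsing.

```python
import re
from collections import Counter

# Pattern for nginx's predefined "combined" access log format.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

# Invented sample lines (documentation IP range), not real traffic.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0.1"',
    '203.0.113.9 - - [10/Oct/2023:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "POST /api/items HTTP/1.1" 502 0 "-" "curl/8.0.1"',
]


def parse_line(line: str):
    """Return a dict of fields for one access.log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["body_bytes_sent"] = int(event["body_bytes_sent"])
    return event


def status_classes(lines) -> Counter:
    """Count lines per HTTP status class (2xx, 4xx, ...); unmatched lines go to 'unparsed'."""
    counts = Counter()
    for line in lines:
        event = parse_line(line)
        if event is None:
            counts["unparsed"] += 1
        else:
            counts[f"{event['status'] // 100}xx"] += 1
    return counts


print(status_classes(SAMPLE_LINES))
```

Counts per status class, requests per client address or bytes sent per URL are typical examples of what can then be aggregated and visualized.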