In this article we discuss end-to-end monitoring, a concept that, supported by the fundamentals of observability, gives the primary and secondary activities of an organization a source of data with considerable added value. With IOMETRICS® from DataDope, this type of monitoring can be adapted to most business processes, as long as the organization has exploitable data sources (with a good definition and study they always exist).
End to end vision
As the title suggests, we start with the meaning of the words: “end to end” implies a holistic view. End-to-end monitoring therefore means checking that all the processes and transactions in a workflow, from beginning to end, across all its components and in every phase, complete without exceptions, in order to guarantee the maximum availability, integrity and quality of its operation. Applied to a business process, it makes it easier to detect strengths and weaknesses in performance, and it gives the monitored organization a source of data that, analyzed and used correctly, will have a very positive impact on the activity.
In observability
Observability, a notion increasingly adopted by large, digitally mature companies, tells us which types of data an organization that follows a data-driven philosophy should observe to achieve maximum visibility of everything that happens across its computer systems. A large amount of data can be extracted from those systems; once it is stored in monitoring platforms and processed by well-defined ETL (Extract, Transform and Load) processes, it gives us highly relevant information about the state of the business.
It is necessary to choose appropriate tools to store and structure the data so that it can later be searched and represented in a logical and orderly way.
In this context, observability is composed of 3 fundamental pillars:
- Metrics: These are data (usually numerical) that allow us to measure the performance or state of an asset at a certain time.
e.g. application X was consuming 20% CPU at 23:59 on 01/10/2021.
- Logs and events: These are events that happen within a program and are recorded in plain-text files or in buffers that can be read later. They generally follow a standard structure to make their information easier to interpret.
e.g. 0.185.248.71 – [01/Oct/2021:20:12:07 +0000] 808890 «GET /inventoryService/inventory/purchaseItem?userId=20253471&itemId=23434300 HTTP/1.1» 500 17 «-» «Apache-HttpClient/4.2.6 (java 1.5) »
In general, metrics and events give us a good foundation for understanding the performance and behavior of an individual system, but they do not let us see the lifecycle of a request that moves within one component or across several.
- Tracing: Tracing allows us to follow the journey of each trace (an action or a transaction) through all the nodes of a system. Along its path, it should let us collect durations, states, the high-level and low-level actions that are carried out, etc. This is very useful for detecting performance issues, finding the root cause of a problem, locating a bottleneck, verifying the integrity of all operations, etc. There are currently models for structuring this data in a logical and orderly way, such as OpenTelemetry, OpenTracing or Elastic APM.
End-to-end monitoring belongs to this last category, although by correctly correlating and combining all three types of data we can achieve very satisfactory results.
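As a rough illustration of how the three types of data can be combined, the following sketch represents each one as a simple record and looks up the metrics and log events that fall inside the time window of a trace segment. The record types, field names, the `correlate` function and the sample values are assumptions made for this example; they are not part of any specific product or standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Metric:
    asset: str
    name: str
    value: float
    timestamp: datetime


@dataclass
class LogEvent:
    asset: str
    message: str
    status: int
    timestamp: datetime


@dataclass
class Span:
    """One segment of a trace inside a single asset (hypothetical model)."""
    trace_id: str
    asset: str
    start: datetime
    end: datetime
    ok: bool


def correlate(
    span: Span,
    metrics: List[Metric],
    logs: List[LogEvent],
    margin: timedelta = timedelta(seconds=5),
) -> Tuple[List[Metric], List[LogEvent]]:
    """Return the metrics and logs of the span's asset inside its time window."""
    lo, hi = span.start - margin, span.end + margin
    related_metrics = [
        m for m in metrics if m.asset == span.asset and lo <= m.timestamp <= hi
    ]
    related_logs = [
        e for e in logs if e.asset == span.asset and lo <= e.timestamp <= hi
    ]
    return related_metrics, related_logs


# Illustrative values only.
span = Span(
    "sale-1", "inventoryService",
    datetime(2021, 10, 1, 20, 12, 6), datetime(2021, 10, 1, 20, 12, 8), ok=False,
)
metrics = [
    Metric("inventoryService", "cpu_percent", 85.0, datetime(2021, 10, 1, 20, 12, 5)),
    Metric("inventoryService", "cpu_percent", 20.0, datetime(2021, 10, 1, 23, 59)),
]
logs = [
    LogEvent(
        "inventoryService", "GET /inventoryService/inventory/purchaseItem",
        500, datetime(2021, 10, 1, 20, 12, 7),
    ),
]
related_metrics, related_logs = correlate(span, metrics, logs)
```

In this example the failed segment is matched with the CPU sample taken just before it and with the HTTP 500 log event, while the 23:59 sample falls outside the window and is ignored.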
What can we achieve with end-to-end?
Taking the example diagram, we can imagine a business process for online sales made up of different components. After analyzing each system, we find a way to retrieve its information. The trace begins when the user makes an online purchase and ends when the product arrives at their home. The trace can be tracked and correlated in each component because the sales ID is shared among all of them. Looking at the diagram, we see that the business process is divided into 4 components. For each component and each transaction we can get the state (OK or KO), the duration (the time the transaction has taken within the component itself), the volumetry (how many transactions it has processed), and the mode of the component (initial, intermediary or final).
- The first component simulates a web server hosting an eCommerce site that makes sales. It is the entry point, so each purchase starts a trace that does not end until it reaches the last component.
- The second component is an intermediary, so its function at the end-to-end level is to add its metrics to the existing trace. It is a database, so among other things we check that the transaction has been written correctly.
- The third component is also an intermediary, so it keeps adding to the trace. It simulates a warehouse that can be queried through an API, so it can be integrated into this monitoring without problems.
- The fourth component ends the trace, so it performs calculations between the data from the first component and its own to define the most exact values for the newly completed trace.
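The way a trace accumulates data as it passes through the four components can be sketched as follows. This is a minimal model under stated assumptions: the class names, field names and component names (in particular "delivery" for the fourth component, which the text does not name) are hypothetical, and times are expressed as plain seconds for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComponentRecord:
    component: str
    mode: str      # "initial", "intermediary" or "final"
    state: str     # "OK" or "KO"
    start: float   # seconds
    end: float     # seconds

    @property
    def duration(self) -> float:
        # Time the transaction spent inside this component.
        return self.end - self.start


@dataclass
class Trace:
    sale_id: str   # shared ID used to correlate all components
    records: List[ComponentRecord] = field(default_factory=list)

    def add(self, record: ComponentRecord) -> None:
        self.records.append(record)

    @property
    def finished(self) -> bool:
        # The trace ends once the final component has reported.
        return any(r.mode == "final" for r in self.records)

    @property
    def state(self) -> str:
        return "KO" if any(r.state == "KO" for r in self.records) else "OK"

    @property
    def total_duration(self) -> Optional[float]:
        # Computed between the first component's data and the last one's.
        first = next((r for r in self.records if r.mode == "initial"), None)
        last = next((r for r in self.records if r.mode == "final"), None)
        if first is None or last is None:
            return None
        return last.end - first.start


trace = Trace("sale-1001")
trace.add(ComponentRecord("web", "initial", "OK", 0.0, 2.0))
trace.add(ComponentRecord("database", "intermediary", "OK", 3.0, 4.0))
trace.add(ComponentRecord("warehouse", "intermediary", "OK", 6.0, 10.0))
trace.add(ComponentRecord("delivery", "final", "OK", 12.0, 20.0))
```

Until the final record is added, `trace.finished` is false and `trace.total_duration` is `None`, which corresponds to a sale that is still ongoing.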
What data can we obtain by carrying out this monitoring?
- How many sales have been made in total?
- How many sales are ongoing vs. how many have ended?
- How many sales have completed successfully vs. failed in any of the systems?
- How many sales needed a fix so that they could continue through the flow?
- Total duration from when the user places the order until it arrives at their home
- How long each sale spent in each component and why
- Latency between components
- Error collection: cause, responsible party, description, etc.
But also:
- Visualization in a timeline of all the events
- An exploitable and properly structured data source
- Enhancement and forecasting of sales and errors with Machine Learning
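Several of the questions listed above (total sales, ongoing vs. ended, successful vs. failed, total duration and latency between components) can be answered with a simple aggregation over the per-component records. The sketch below assumes each record is a dictionary with the hypothetical keys `sale_id`, `component`, `mode`, `state`, `start` and `end`; the function name and the sample data are illustrative only.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate per-component records into end-to-end figures."""
    by_sale = defaultdict(list)
    for r in records:
        by_sale[r["sale_id"]].append(r)

    summary = {"total": len(by_sale), "ended": 0, "ongoing": 0, "ok": 0, "failed": 0}
    total_durations = {}
    latencies = defaultdict(list)

    for sale_id, recs in by_sale.items():
        recs.sort(key=lambda r: r["start"])
        ended = any(r["mode"] == "final" for r in recs)
        summary["ended" if ended else "ongoing"] += 1
        if any(r["state"] == "KO" for r in recs):
            summary["failed"] += 1
        elif ended:
            summary["ok"] += 1
        if ended:
            # From the start of the first component to the end of the last.
            total_durations[sale_id] = recs[-1]["end"] - recs[0]["start"]
        # Latency: gap between leaving one component and entering the next.
        for prev, nxt in zip(recs, recs[1:]):
            key = (prev["component"], nxt["component"])
            latencies[key].append(nxt["start"] - prev["end"])

    return summary, total_durations, dict(latencies)


def rec(sale_id, component, mode, state, start, end):
    return {"sale_id": sale_id, "component": component, "mode": mode,
            "state": state, "start": start, "end": end}


records = [
    rec("A", "web", "initial", "OK", 0, 2),
    rec("A", "database", "intermediary", "OK", 3, 4),
    rec("A", "warehouse", "intermediary", "OK", 6, 10),
    rec("A", "delivery", "final", "OK", 12, 20),
    rec("B", "web", "initial", "OK", 1, 2),
    rec("B", "database", "intermediary", "KO", 4, 5),
]
summary, total_durations, latencies = summarize(records)
```

Here sale A has ended successfully and sale B is still ongoing with a failure in the database component; the resulting figures could feed a dashboard or timeline view.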