The anatomy of observability.
The first layer is telemetry; we cannot have observability without raw telemetry data. Depending on the infrastructure and available resources, there are many tooling options for gathering that raw telemetry. The primary focus in this first layer should always be gaining access to high-quality telemetry.
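As a concrete sketch of that first layer, the snippet below shows what minimal structured telemetry emission could look like in plain Python. The event name and attributes are illustrative, and a real setup would hand each record to a collector or agent rather than printing it.

```python
import json
import time

def emit(event: str, **attributes):
    """Emit one structured telemetry event as a JSON line.

    In a real deployment this record would go to a collector or agent
    (for example an OpenTelemetry pipeline), not just to stdout.
    """
    record = {"ts": time.time(), "event": event, **attributes}
    print(json.dumps(record))
    return record

# Hypothetical application event with rich context attached up front;
# high-quality telemetry means capturing attributes like these at the source.
emit("order_placed", customer_id="c-42", region="eu-central", latency_ms=87)
```

The key point is structure: attaching attributes at emission time is what later makes the data queryable at all.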
The second layer is storage: not just how we store the telemetry, but for how long. When considering data stores for observability, finding the right balance can be tricky. Fundamentally, if we want to handle high-throughput data efficiently (for example, accounting for 100% of all messages passed in a scaled-out app, or taking high-fidelity infrastructure measurements like CPU load or memory use per container), we must record statistics in a time-series database; otherwise, we waste too much on the transfer and storage of individual events. And while some might suggest sampling the events, low-frequency data hidden within the high-frequency firehose can then be missed altogether. This situation calls for a dedicated Time Series DB (TSDB): a data store designed specifically for the storage, indexing, and querying of time-series statistics like these.
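To make the cost argument concrete, here is a minimal sketch (pure Python, illustrative only) of the statistics-per-bucket approach a TSDB takes: thousands of raw events collapse into a fixed number of per-interval aggregates, so storage grows with time resolution, not event volume.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # aggregation resolution; a real TSDB makes this configurable

def bucket(timestamp: float) -> int:
    """Map a raw event timestamp to the start of its time bucket."""
    return int(timestamp) // BUCKET_SECONDS * BUCKET_SECONDS

def aggregate(events):
    """Collapse individual (timestamp, value) events into per-bucket statistics.

    Storing (count, sum, max) per bucket costs O(buckets), not O(events).
    """
    stats = defaultdict(lambda: {"count": 0, "sum": 0.0, "max": float("-inf")})
    for ts, value in events:
        s = stats[bucket(ts)]
        s["count"] += 1
        s["sum"] += value
        s["max"] = max(s["max"], value)
    return dict(stats)

# 10,000 one-per-second CPU-load samples collapse into ~167 one-minute buckets
events = [(1_700_000_000 + i, 0.5) for i in range(10_000)]
series = aggregate(events)
```

Note that the full count is preserved (100% of samples are accounted for) even though individual events are discarded after aggregation.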
And yet! If we want to handle high-cardinality data (for example per-customer tags, unique IDs for ephemeral infrastructure, or URL fragments), a TSDB is an unmitigated disaster. With the explosion of tag cardinality comes an explosion of unique time series, and with it an explosion of cost. And so there must be a Transaction DB as well; traditionally this was a logging database, although it’s wiser to build around a distributed-tracing-native Transaction DB (more on this later) that can kill two birds (logs and traces) with one stone.
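The cardinality arithmetic is easy to demonstrate. The tag counts below are made up for illustration, but the multiplicative blow-up is exactly what a TSDB experiences, because each unique combination of tag values becomes its own time series:

```python
# Illustrative tag cardinalities, not from any real deployment.
endpoints = 50          # low-cardinality tag: fine for a TSDB
status_codes = 10       # low-cardinality tag: fine
customers = 100_000     # high-cardinality tag: per-customer ids
containers = 5_000      # high-cardinality tag: ephemeral infrastructure

# Every unique tag-value combination is a distinct time series.
safe_series = endpoints * status_codes
exploded_series = endpoints * status_codes * customers * containers

print(f"without high-cardinality tags: {safe_series:,} series")
print(f"with them: {exploded_series:,} series")
```

Five hundred series becomes 250 billion once the high-cardinality tags are multiplied in, which is why that data belongs in a Transaction DB instead.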
Still, finding state-of-the-art Transaction and Time Series Databases is necessary but not sufficient. To make the actual “Observability” piece seamless, the data layer needs to be integrated and cross-referenced as well, preferably through a deep integration.
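What such a cross-reference might look like, sketched in Python: a metric point carries an exemplar trace ID, so a query on the time-series side can pivot directly into the transaction store. The `MetricPoint` and `pivot` names are hypothetical, but the pattern mirrors how exemplars link metrics to traces in practice.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricPoint:
    """A time-series point carrying an exemplar: the id of one
    representative trace recorded during its aggregation window."""
    timestamp: int
    value: float
    exemplar_trace_id: Optional[str] = None

def pivot(point: MetricPoint, trace_store: dict) -> Optional[dict]:
    """Jump from a suspicious metric point straight to its full trace."""
    if point.exemplar_trace_id is None:
        return None
    return trace_store.get(point.exemplar_trace_id)

# A latency spike on the metrics side...
spike = MetricPoint(timestamp=1_700_000_000, value=4200.0, exemplar_trace_id="abc123")
# ...resolves to the transaction that caused it.
traces = {"abc123": {"spans": ["checkout", "payment"], "duration_ms": 4200}}
```

Without this kind of linkage, the operator is left manually correlating timestamps across two disconnected systems.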
The challenges above can make observability difficult and, at times, elusive. And this brings us to the third layer, the actual benefits; in the product management realm, these would simply be called business outcomes, and they are an essential part of the value-proposition canvas when selling observability and monitoring to our customers.
At the end of the day, telemetry, whether in motion or at rest, is not intrinsically valuable. Only the workflows and applications built on top of it can be valuable. Yet in the conventional presentation of “Observability as Metrics, Logs, and Traces,” we don’t even know what problem we’re solving, much less how we’re solving it.
When it comes to modern, distributed software applications, there are two overarching problems worth solving with Observability:
- Understanding Health: Connecting the well-being of a subsystem back to the goals of the overarching application and business via thoughtful monitoring.
- Understanding Change: Accelerating planned changes while mitigating the effects of unplanned changes.
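One common way to connect subsystem health back to business goals is a service-level objective with an error budget. The sketch below uses illustrative numbers and is not a prescribed implementation, but it shows the core arithmetic a monitoring layer performs:

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent.

    slo_target: e.g. 0.999 for a 99.9% availability objective.
    A 100% target allows zero failures, so its budget is always spent.
    """
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1 - failed_requests / allowed_failures

# 1,000,000 requests against a 99.9% SLO allow 1,000 failures;
# 250 observed failures leave 75% of the budget unspent.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
```

The remaining budget is what ties the two outcomes together: a healthy budget means planned change can be accelerated, while a burning budget signals that unplanned change needs mitigating first.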
As a final word: monitoring and observability go hand in hand. One does not replace the other; together they enable and enhance defined business outcomes.
For more information, visit Swisscom Public Cloud Services. You can also reach out to our experts to help get your cloud solutions off the ground.