## Usage examples

### Tutorial use-case: colorizing the nodes
Let’s take a look at the following very simple graph:
```
[frontend] --> [backend]
```
Both services expose a metric `http_request_latency_seconds:mean5m` with the labels `app=frontend` and `app=backend`.
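For illustration, the two resulting series might look like this (the latency values are made up):

```
http_request_latency_seconds:mean5m{app="frontend"} 0.12
http_request_latency_seconds:mean5m{app="backend"} 0.74
```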
And let's consider the following alert rule as a real-life™ example:
```yaml
groups:
  - name: Backend
    rules:
      - alert: HighRequestLatency
        expr: http_request_latency_seconds:mean5m{app="backend"} > 0.5
        for: 10m
        labels:
          severity: critical  # <--- we only want to display alerts with severity=critical
          service_id: backend # <--- this is a node in our graph
        annotations:
          summary: High request latency
```
The following mapping does this:

- only take alerts with `severity=critical` into consideration, and
- the value of the label `service_id` points to a node in our graph
```yaml
endpoints: {} # ...
metrics: {}   # ...
mapping:
  alerts:
    label_selector:
      - severity: "critical"
    service_labels:
      - "service_id"
```
When this alert is firing, the `backend` node will be red. Pretty straightforward. When the alert is NOT firing, the `backend` node will be green.

The `frontend` node, though, will not be colorized at all because there is no mapping for it.
How can we get this service green then? No, you don't have to define an alert for each service explicitly (though you can do that, of course!). What you need is a common label that has all the available services as values. If you configured Prometheus properly™, you already have those labels. In this tutorial we have `http_request_latency_seconds:mean5m` with the labels `app=frontend` and `app=backend` (see above). Use the `mapping.metrics.service_labels[]` config to tell statusgraph to look up all values of the label `app`:
```yaml
endpoints: {} # ...
metrics: {}   # ...
mapping:
  metrics:
    service_labels:
      - app
```
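Both mappings can live in the same config. Merging the two snippets from this tutorial would look something like this:

```yaml
# Combined sketch of the two mappings shown above: firing alerts turn their
# service_id nodes red, while the metric label lookup supplies the full list
# of services so the remaining nodes render green.
endpoints: {} # ...
metrics: {}   # ...
mapping:
  alerts:
    label_selector:
      - severity: "critical"
    service_labels:
      - "service_id"
  metrics:
    service_labels:
      - app
```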
If you don't have these labels yet, configure metric relabeling (see here and here). As a last resort you can consider using `label_replace` with recording rules.
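A minimal sketch of both options, assuming the metric and label names from this tutorial (the job name and target are placeholders):

```yaml
# Option 1: metric relabeling at scrape time, copying "app" into "service_id".
scrape_configs:
  - job_name: backend # placeholder job
    static_configs:
      - targets: ["backend:8080"] # placeholder target
    metric_relabel_configs:
      - source_labels: [app]
        target_label: service_id
```

```yaml
# Option 2 (last resort): a recording rule that adds "service_id" via label_replace.
groups:
  - name: service-labels
    rules:
      - record: http_request_latency_seconds:mean5m:labeled
        expr: label_replace(http_request_latency_seconds:mean5m, "service_id", "$1", "app", "(.*)")
```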
### Matching multiple nodes in a graph

Complex systems fail in complex ways. For example, an HTTP request may fail during DNS resolution, during the TCP handshake, or when the wrong HTTP status code is sent by the server. Thus, a simple alert may affect multiple nodes in the graph, depending on your level of abstraction.

This use-case is supported by using comma-separated values in label values (yes, it's hacky, but that's how the Prometheus spec is):
```yaml
groups:
  - name: Backend
    rules:
      - alert: StupidHooman
        expr: all_cables_unplugged > 0
        for: 5m
        labels:
          severity: critical
          service_id: frontend,backend # use this to colorize 2 graph nodes at the same time
        annotations:
          summary: High request latency
```
### Generic Alerts

You can define generic alerts which re-use the labels of the metric that triggered them:
```yaml
groups:
  - name: Backend
    rules:
      - alert: HighRequestLatency
        expr: http_request_latency_seconds:mean5m{app="backend"} > 0.5
        for: 10m
        labels:
          severity: critical                     # <--- we only want to display alerts with severity=critical
          service_id: "{{ $labels.service_id }}" # <--- this is a node in our graph
        annotations:
          summary: High request latency
```
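Note that `{{ $labels.service_id }}` only resolves if the series returned by `expr` actually carries a `service_id` label. A hedged sketch of a fully generic rule, using the `app` label from this tutorial instead (metric and threshold as above, the matcher dropped so the rule fires once per service):

```yaml
groups:
  - name: Generic
    rules:
      - alert: HighRequestLatency
        # no {app="..."} matcher: the rule evaluates every series and each
        # resulting alert inherits that series' labels
        expr: http_request_latency_seconds:mean5m > 0.5
        for: 10m
        labels:
          severity: critical
          service_id: "{{ $labels.app }}" # one alert per service, mapped to its node
        annotations:
          summary: High request latency on {{ $labels.app }}
```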