Show HN: A Prometheus exporter that gathers comprehensive container metrics

The agent gathers metrics related to a node and the containers running on it, and it exposes them in the Prometheus format.

It uses eBPF to trace container-related events such as TCP connects, so the minimum supported Linux kernel version is 4.16.
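As a quick pre-flight check, the kernel requirement can be tested before deploying the agent. The snippet below is a minimal sketch and not part of the agent: the function names are made up for illustration, and it only compares the major and minor numbers of the release string, so it knows nothing about features that a distribution may have backported to an older kernel.

```python
import platform
import re

MIN_KERNEL = (4, 16)  # minimum version stated in the text above


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supported(release: str, minimum: tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True if the release is at least the minimum (major, minor)."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    status = "supported" if kernel_supported(release) else "too old"
    print(f"{release}: {status}")
```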


TCP connection tracing

To provide visibility into the relationships between services, the agent traces containers' TCP events, such as connect() and listen().

Exported metrics are useful for:

  • Obtaining an actual map of inter-service communications. This does not require integrating distributed tracing frameworks into your code.
  • Detecting connection errors from one service to another.
  • Measuring network latency between containers, nodes and availability zones.

Log patterns extraction

Log management is usually quite expensive. In most cases, you do not need to analyze each event individually.
It is enough to extract recurring patterns and the number of related events.

This approach drastically reduces the amount of data required for express log analysis.
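To make the idea concrete, here is a minimal sketch of pattern extraction. It is not the agent's clustering algorithm: it simply masks variable tokens (IP addresses, hex values, integers) with placeholders and counts how many lines collapse into each resulting pattern. The masking rules and function names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical masking rules. Order matters: more specific tokens go first,
# so an IP address is not split into four separate numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_pattern(line: str) -> str:
    """Replace variable-looking tokens with placeholders."""
    for regex, placeholder in _MASKS:
        line = regex.sub(placeholder, line)
    return line


def count_patterns(lines):
    """Return a Counter mapping each pattern to the number of matching lines."""
    return Counter(to_pattern(line.strip()) for line in lines if line.strip())


if __name__ == "__main__":
    sample = [
        "connection from 10.0.0.12 port 51234 failed",
        "connection from 10.0.0.7 port 4000 failed",
        "worker 3 started",
    ]
    for pattern, count in count_patterns(sample).most_common():
        print(count, pattern)
```

Only the patterns and their counts need to be stored or exported, which is where the reduction in data volume comes from.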

The agent discovers container logs and parses them right on the node.

At the moment the following sources are supported:

  • Direct logging to files in /var/log/
  • Journald
  • Dockerd (JSON file driver)
  • Containerd (CRI logs)

To learn more about automatic log clustering, check out the blog post “Mining metrics from unstructured logs”.

Delay accounting

Delay accounting allows engineers to accurately identify situations where a container is experiencing a lack of CPU time or waiting for I/O.

The agent gathers per-process counters through Netlink and aggregates them into per-container metrics:

  • container_resources_cpu_delay_seconds_total
  • container_resources_disk_delay_seconds_total
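The aggregation step can be sketched as follows. This is an illustration rather than the agent's code: it assumes each per-process sample has already been read and attributed to a container, and that the raw delay counters are in nanoseconds (as in the Linux taskstats interface). The input tuple layout and the function name are hypothetical.

```python
from collections import defaultdict

NANOSECONDS_PER_SECOND = 1_000_000_000


def aggregate_delays(samples):
    """Sum per-process delay counters into per-container totals in seconds.

    samples: iterable of (container_id, cpu_delay_ns, disk_delay_ns),
    one tuple per process (hypothetical layout).
    """
    totals = defaultdict(lambda: {"cpu": 0, "disk": 0})
    for container_id, cpu_ns, disk_ns in samples:
        totals[container_id]["cpu"] += cpu_ns
        totals[container_id]["disk"] += disk_ns

    return {
        container_id: {
            "container_resources_cpu_delay_seconds_total": t["cpu"] / NANOSECONDS_PER_SECOND,
            "container_resources_disk_delay_seconds_total": t["disk"] / NANOSECONDS_PER_SECOND,
        }
        for container_id, t in totals.items()
    }


if __name__ == "__main__":
    samples = [
        ("web", 1_500_000_000, 0),
        ("web", 500_000_000, 250_000_000),
        ("db", 0, 2_000_000_000),
    ]
    for container_id, metrics in aggregate_delays(samples).items():
        print(container_id, metrics)
```

A real collector also has to keep the contribution of processes that have already exited, otherwise the per-container totals would not behave as monotonic counters. The sketch leaves that out.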

Out-of-memory events tracing

The container_oom_kills_total metric shows that a container has been terminated by the OOM killer.
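Since this is a counter, a new OOM kill shows up as an increase between two scrapes. The sketch below compares two snapshots of the exposition text and reports which series grew. The container_id label in the sample data is an assumption made for illustration, the parser is deliberately simplistic (it does not handle "}" inside label values), and counter resets are ignored.

```python
import re

# Matches lines such as: container_oom_kills_total{container_id="/docker/web"} 3
# An optional trailing timestamp is tolerated.
_SAMPLE = re.compile(
    r"^container_oom_kills_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+\d+)?$"
)


def parse_oom_kills(text: str) -> dict:
    """Map each label set (kept as a raw string) to its counter value."""
    counters = {}
    for line in text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match:
            counters[match.group("labels")] = float(match.group("value"))
    return counters


def new_oom_kills(previous: str, current: str) -> dict:
    """Return the increase per label set for series that grew between scrapes."""
    before = parse_oom_kills(previous)
    after = parse_oom_kills(current)
    return {
        labels: value - before.get(labels, 0.0)
        for labels, value in after.items()
        if value > before.get(labels, 0.0)
    }


if __name__ == "__main__":
    previous = 'container_oom_kills_total{container_id="/docker/web"} 1\n'
    current = 'container_oom_kills_total{container_id="/docker/web"} 3\n'
    print(new_oom_kills(previous, current))
```

In practice the same question is normally answered on the Prometheus side with a query over the counter. The code only shows what such a comparison does.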

Instance metadata

If a node is a cloud instance, the agent identifies the cloud provider and collects additional information using the related metadata services.

Supported cloud providers: AWS, GCP, Azure

Collected info:

  • AccountID
  • InstanceID
  • Instance/machine type
  • Region
  • AvailabilityZone + AvailabilityZoneId (AWS only)
  • LifeCycle: on-demand/spot (AWS and GCP only)
  • Private & Public IP addresses



The agent requires some privileges to access container data, such as logs, performance counters and TCP sockets:

  • privileged mode (securityContext.privileged: true)
  • the host process ID namespace (hostPID: true)
  • /sys/fs/cgroup and /sys/kernel/debug should be mounted to the agent’s container


If you use Prometheus Operator, you will also need to create a PodMonitor:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: coroot-node-agent
  namespace: coroot
spec:
  selector:
    matchLabels:
      app: coroot-node-agent
  podMetricsEndpoints:
    - port: http

Make sure that the PodMonitor matches the podMonitorSelector defined in your Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
spec:
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}

The special value {} allows Prometheus to watch all the PodMonitors from all namespaces.


docker run --detach --name coroot-node-agent \
    --privileged --pid host \
    -v /sys/kernel/debug:/sys/kernel/debug:rw \
    -v /sys/fs/cgroup:/host/sys/fs/cgroup:ro \
    ghcr.io/coroot/coroot-node-agent --cgroupfs-root=/host/sys/fs/cgroup

Flags 80" Listen add


Charlie Layers
