Generating Prometheus Metrics from Application Logs

December 16, 2020 - 4 min read
kubernetes prometheus mtail

While many applications and services already expose internal metrics in Prometheus format - especially those commonly deployed on Kubernetes - many others keep this information only in log files and require extra tooling to parse and expose the data.

Recently I worked on a distributed setup with Kubernetes clusters running in multiple clouds, each with a Prometheus instance collecting local metrics that then get aggregated centrally. One of the use cases involves evaluating the performance of batch-like workloads, which tend to keep performance data in their logs for end users to evaluate after the jobs finish.

Looking for a good option to expose these metrics live in the Prometheus setup above (including the central aggregation), I found mtail, which made my day. It's a simple tool that does the job - I rely on other solutions for other projects, but they tend to be generic enough to require multiple pieces and complex configurations.

An mtail Example

mtail is a tool for extracting metrics from application logs to be exported into a timeseries database or timeseries calculator for alerting and dashboarding

It supports collectd, StatsD and Graphite as well, but in this case I care about Prometheus. You can define counters and gauges and attach labels that end up as additional dimensions on the metrics.
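
To illustrate the labels, here's a minimal sketch of an mtail program (names made up, not part of the workload below) where a capture group becomes a label, giving one timeseries per log level:

# hypothetical counter, keyed by the captured log level
counter log_lines_total by level

/(?P<level>INFO|WARN|ERROR)/ {
    log_lines_total[$level]++
}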

Here's an example of the output from one of the workloads being submitted.

$ cat sample.log
running with 1 200 100 2000
platform: gpu
1.8687622547149658
(200, 100, 1, 2000) gpu:0
running with 1 200 100 2000
platform: gpu
1.9642915725708008
(200, 100, 1, 2000) gpu:0
...

We can create a gauge with the workload duration (the line with the floating-point value) using a very simple regex.

$ cat log.mtail
gauge duration

/(?P<dur>[0-9]+[.][0-9]+)/ {
    duration = float($dur)
}

And we validate locally before deploying.

$ mtail -one_shot -logs sample.log -progs log.mtail
Metrics store:{
  "duration": [
    {
      "Name": "duration",
      "Program": "log.mtail",
      "Kind": 2,
      "Type": 1,
      "LabelValues": [
        {
          "Value": {
            "Value": 1.9642915725708008,
            "Time": 1608122868426470655
          }
        }
      ]
    }
  ]
}

In this mode mtail processes the logs once and dumps the resulting metric store (the gauge holds the last value parsed), but by default it starts an HTTP listener on port 3903 that can be scraped periodically.
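
You can also do a quick manual scrape by running mtail without -one_shot and hitting the endpoint - a sketch assuming the same files as above; the exact HELP/TYPE lines and label set (mtail typically adds a prog label) depend on the version:

$ mtail -progs log.mtail -logs sample.log &
$ curl -s localhost:3903/metrics | grep duration
duration{prog="log.mtail"} 1.9642915725708008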

Deployment as a Sidecar

The next step is to deploy this alongside the workload in our Kubernetes clusters, with the goal of not having to change the original application. To generate the metrics mtail needs access to the log file as it gets populated, and this is where the sidecar pattern fits perfectly - not sure who first came up with the Pod concept, but kudos.

We need a ConfigMap to hold the mtail config.

apiVersion: v1
kind: ConfigMap
metadata:
  name: myworkload
data:
  log.mtail: |-
    gauge duration
    
    /(?P<dur>[0-9]+[.][0-9]+)/ {
        duration = float($dur)
    }

And in this case we're deploying the workload using a CronJob.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: myworkload
spec:
  schedule: ...
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            workload: myworkload
            cluster: clustername
            cloud: gcp
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "3903"
        spec:
          containers:
          - name: myworkload
            image: rochaporto/myworkload
            command: ["bash", "-c", "python3 /lowlevel.py 2>&1 | tee -a /log/output.log"]
            volumeMounts:
            - name: log
              mountPath: /log
          - name: mtail
            image: rochaporto/mtail
            command: ["bash", "-c", "while [[ ! -f /log/output.log ]]; do sleep 1; done && /bin/mtail -logtostderr -progs /etc/mtail/log.mtail -logs /log/output.log"]
            ports:
            - containerPort: 3903
            volumeMounts:
            - name: log
              mountPath: /log
            - name: mtail
              mountPath: /etc/mtail
          volumes:
          - name: log
            emptyDir: {}
          - name: mtail
            configMap:
              name: myworkload 

A few things in this definition are worth highlighting.

  • The log volume is mounted in both containers so that mtail can see the log file being populated, and the funny while loop before launching mtail waits for the file to be created. tee keeps the logs on the main container's stdout as well, so checking them with kubectl logs -c myworkload on the job's Pod still works
  • The mtail container exposes port 3903, the default port for mtail's HTTP listener. Our Prometheus instance has rules to pick up as scrape targets all Pods annotated with prometheus.io/scrape=true - you might be doing discovery differently, but a minimal scrape config for this pattern is sketched below
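
For reference, this is roughly what that discovery rule can look like on the Prometheus side - a minimal sketch of an annotation-based kubernetes_sd_configs job, not the exact config from the setup described above:

scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # only keep Pods annotated with prometheus.io/scrape=true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # rewrite the target address to use the port from prometheus.io/port (3903 for the mtail sidecar)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__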