Ditching Fluentd in favor of rsyslog & logrotate for Kubernetes and Docker logging

So long, fluentd

While venturing into the Fluentd ecosystem in an effort to wrangle Kubernetes logging, it soon became apparent that the logging tool was lacking a certain level of maturity and robustness. We started out using v0.12 with what seemed like too many plugins (none of which seem to be held to any sort of conformity) just to get logs pushed to our log aggregation tool and also archived to S3. After fighting plugin issues, tags, buffer file nightmares, Fluentd hanging, upgrade incompatibilities (to v0.14) and more headaches than I can count, it was time to ditch Fluentd – we can’t tolerate log downtime. On top of the Fluentd issues, there did not seem to be an all-encompassing way to rotate, archive and clean up both Kubernetes and container logs.

Naturally, I fell back to what I know has worked in the past for extremely large volumes of logs – rsyslog. Bringing rsyslog into the picture meant that logrotate now had a seat at the table as well, both for log rotation and archival to S3 (using s3cmd in postrotate). In addition, grabbing all the logging facilities currently present on our nodes was dead easy with rsyslog – container logs are great for application debugging but we are flying blind if there are no system logs getting pushed out.

I should add that I’m still pretty new to Kubernetes (not to Ops), but I still do not understand why there is so little (if any) mention of doing logging this way with Docker and Kubernetes. It seems common sense to me to use a time-tested logging daemon and rotation utility for critical logs. I’m really interested to hear if this is completely off-base or if I missed something with Fluentd.

Anyways, let’s get to it.

Container Logs

First, and probably the most invasive part of this whole ordeal, is configuring docker to use the syslog logging driver. If you are using a version of docker (pre v17.00, I believe) that does not support logging drivers, you can work around it by using the imfile rsyslog input module (although the wildcard path necessary to read container logs requires rsyslog v8.25+). I configured the driver in /etc/docker/daemon.json, but there are other ways to do it as well. Refer to the docker logging docs for more information.
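
A quick way to sanity-check both prerequisites on a node is to ask docker and rsyslog directly (the exact version strings will obviously vary):

#Check the docker server version (logging drivers + daemon.json support)
docker version --format '{{.Server.Version}}'
#Check the rsyslog version (8.25+ needed for wildcard imfile paths)
rsyslogd -v | head -n1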

After reading the docs myself, I ended up with something like this in my daemon.json – adding this meant I had to restart docker on all my kubernetes nodes. I should also note that if you have pods running, they will need to be kicked as well (either by scaling or deleting them). In the rsyslog config, local3 is going to /var/log/containers.log – one logfile to rule them all.

One thing I did notice is that configuring the syslog driver for docker means that kubectl logs will no longer work – but, if you are reading this you are probably shipping logs somewhere anyways.

/etc/docker/daemon.json
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-facility": "local3",
    "tag": "{{.Name}}"
  }
}
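
For reference, this is roughly what restarting docker and kicking the pods looks like in practice; the deployment label and the k8s_ grep below are placeholders rather than anything specific to this setup:

#Apply the new daemon.json and confirm the driver is now "syslog"
sudo systemctl restart docker
docker info --format '{{.LoggingDriver}}'
#Recreate running pods so their containers pick up the new log driver
kubectl delete pod -l app=my-app
#Container output now lands in /var/log/containers.log on the node, tagged with the container name
tail -f /var/log/containers.log | grep k8s_my-app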

A way around using the syslog log driver, so that you keep access to kubectl logs, is to do something like this (as mentioned above) with imfile. The template string injects Kubernetes metadata into the JSON log line, which is extracted on the log aggregator end of things (pod, namespace, container name, container id). Not as clean, but it gets the job done.

#Aggregate  docker container logs to /var/log/containers.log
input(type="imfile"
       File="/var/log/containers/*.log"
       Tag="docker_container"
       addMetadata="on"
       Severity="debug"
       Facility="local3")

template(name="containers-json" type="string"
 string="%timestamp:::date-rfc3339% %HOSTNAME%.dev %app-name% {\"kube\":\"%$!metadata!filename:R,ERE,1,FIELD:.*containers/(.*).log--end%\",\"container_log\":\"%msg:9:$%\"}\n")

#Log all container logs to /var/log/containers.log with containers-json template to inject pod/namespace info
local3.*  -/var/log/containers.log;containers-json

#Another imfile to tail /var/log/containers.log
input(type="imfile"
       File="/var/log/containers.log"
       Tag="docker_container"
       Ruleset="containers-json-logs"
       Severity="debug"
       Facility="local3")

Rsyslog config

Since docker is configured to use the local3 facility, we can direct it to /var/log/containers.log in the rsyslog config (if you’re not using imfile to do it). Adding the .none to the last line will prevent those messages from doing double duty in your /var/log/syslog. Also, having all the containers log to a single file (rather than leaning on wildcards in the logrotate path) makes log rotation dead simple.

#Log docker container logs to containers.log
local3.*  /var/log/containers.log 
#Log system daemons to daemons.log
daemon.*  /var/log/daemons.log
#Log everything else to /var/log/syslog, except stuff we've already separated
*.*;local3,local7,auth,authpriv,daemon.none    /var/log/syslog
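
Before leaning on any of this, it’s worth linting the config and firing a test message at the local3 facility with logger to confirm it lands in containers.log and stays out of syslog. These are stock commands, nothing specific to this setup:

#Syntax-check the rsyslog config without restarting
rsyslogd -N1
#Reload rsyslog and send a test message to the local3 facility
sudo systemctl restart rsyslog
logger -p local3.info "container log routing test"
#Should show up in containers.log and not in syslog
grep "container log routing test" /var/log/containers.log
grep "container log routing test" /var/log/syslog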

Kubernetes logs

Please enable kube-apiserver audit logging if you can; it’s a huge step up from pre-1.8 logging. As of right now, I’ve mainly configured logs to ship by using imfile to tail the flat component log files, and utilized the logging options for each component to rotate the files.

Using a mix of imfile and syslog facilities with rsyslog

Kubelet and kube-proxy on the nodes seem to log to the daemon syslog facility, so that was easy to pull out with rsyslog. However, I still have not figured out the control plane components (kube-scheduler, kube-controller-manager and kube-apiserver), so I am just using imfile to tail the logs (we use --log-dir=/var/log/kubernetes). Also, we are not yet running Kubernetes 1.8, but it would be really nice to get the JSON audit log format made available in 1.8 for the apiserver when we upgrade.

I’ve also configured our apiserver to use the available log rotation flags, plus a script to gzip, archive to S3 and clean up (a rough sketch of that script follows the flag list):

  --audit-log-maxage int      The maximum number of days to retain old audit log files based on the timestamp encoded in their filename.
  --audit-log-maxbackup int   The maximum number of old audit log files to retain.
  --audit-log-maxsize int     The maximum size in megabytes of the audit log file before it gets rotated.
  --audit-log-path string     If set, all requests coming to the apiserver will be logged to this file.  '-' means standard out.
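
The archive script itself is nothing fancy. A minimal sketch along these lines, run from cron or similar, covers the gzip/S3/cleanup part; the bucket name, file pattern and retention below are assumptions, so adjust them to match your own audit-log-path:

#!/bin/bash
#Sketch: compress rotated apiserver audit logs, ship them to S3, prune old local copies
#Assumes --audit-log-path=/var/log/kubernetes/audit.log; rotated files carry a timestamp in their name
AUDIT_DIR=/var/log/kubernetes
BUCKET="s3://dev-k8s-audit-logs/$(hostname -s)/"

find "$AUDIT_DIR" -name 'audit-*.log' -mmin +5 -exec gzip {} \;
/usr/local/bin/s3cmd sync --config=/root/logrotate.s3cfg "$AUDIT_DIR"/audit-*.log.gz "$BUCKET"
find "$AUDIT_DIR" -name 'audit-*.log.gz' -mtime +7 -delete

The rsyslog side of the control plane logs is just more imfile, templated with Ansible so it only lands on the masters: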
{% if 'masters' in group_names %}
#Kube master Logs
input(type="imfile"
      File="/var/log/kubernetes/kube-apiserver.INFO"
      Tag="kube-system.apiserver-audit"
      Ruleset="standard-syslog"
      Severity="info"
      Facility="local7")

input(type="imfile"
      File="/var/log/kubernetes/kube-apiserver.log"
      Tag="kube-system.apiserver"
      Ruleset="standard-syslog"
      Severity="info"
      Facility="local7")

input(type="imfile"
      File="/var/log/kubernetes/kube-controller-manager.INFO"
      Tag="kube-system.controller-manager"
      Ruleset="standard-syslog"
      Severity="info"
      Facility="local7")

input(type="imfile"
      File="/var/log/kubernetes/kube-scheduler.INFO"
      Tag="kube-system.scheduler"
      Ruleset="standard-syslog"
      Severity="info"
      Facility="local7")
{% endif %}
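
The inputs above hand everything to a "standard-syslog" ruleset (and the container tail earlier to "containers-json-logs") that I haven’t shown here; that is where forwarding to the log aggregator happens. Purely as an illustration, a bare-bones ruleset that just relays over TCP could be dropped in like this (the target hostname is made up):

#Illustrative only: a minimal forwarding ruleset for the imfile inputs above
cat <<'EOF' | sudo tee /etc/rsyslog.d/20-forward.conf
ruleset(name="standard-syslog") {
  action(type="omfwd" target="logs.example.internal" port="514" protocol="tcp")
}
EOF
sudo systemctl restart rsyslog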

Bringing it together

This Ansible play will do what’s required to get logrotate and s3cmd configured, given you provide it with the templates. The s3cfg is basic, nothing special about it – just make sure that when you use s3cmd in your logrotate config you reference the exact config path, because with no config specified it looks in /root/.s3cfg.

Ansible some logrotate
---

- name: Install s3cmd
  apt:
    name: s3cmd
    state: present
  tags:
    - logrotate

- name: Configure s3cmd
  template:
    src: logrotate.s3cfg.j2
    dest: /root/logrotate.s3cfg
    mode: 0600
  tags:
    - s3cmd
    - logrotate

- name: Drop in rsyslog logrotate file
  template:
    src: logrotate.syslog.j2
    dest: /etc/logrotate.d/rsyslog
  tags:
    - logrotate

- name: Drop in containers logrotate file
  template:
    src: logrotate.containers.j2
    dest: /etc/logrotate.d/containers
  when: "'nodes' in group_names"
  tags:
    - logrotate
logrotate.s3cfg.j2
[default]
encoding = UTF-8
access_key = {{ s3_logging_access_key }}
secret_key = {{ s3_logging_secret_key }}
use_https = True
force = False
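
Once the s3cfg template is rendered, it’s worth a quick check that the config path and credentials actually work before logrotate depends on them. The bucket below mirrors the one used in the containers logrotate template, with env rendered as dev purely as an example:

#Confirm the rendered config and credentials can reach the bucket
s3cmd --config=/root/logrotate.s3cfg ls s3://dev-containers-logs/
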
logrotate.containers.j2
/var/log/containers.log
{
  rotate 7
  daily
  missingok
  notifempty
  compress
  sharedscripts
  dateext
  dateformat -%Y-%m-%d
  postrotate
    /etc/init.d/rsyslog force-reload>/dev/null 2>&1 || true
    /usr/local/bin/s3cmd sync --config=/root/logrotate.s3cfg /var/log/containers*.gz "s3://{{ env }}-containers-logs/{{ inventory_hostname_short }}.{{ env }}/containers/" > /var/log/s3cmd_out.log 2>&1
  endscript
}
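
With everything in place, logrotate can be dry-run against the new config, then forced once to exercise the postrotate/S3 handoff end to end before cron takes over:

#Dry run: show what logrotate would do without touching anything
logrotate -d /etc/logrotate.d/containers
#Force one rotation and check the s3cmd output log
sudo logrotate -f /etc/logrotate.d/containers
tail /var/log/s3cmd_out.log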

Conclusion

After arriving at a relatively painless and more complete solution using native Linux utilities, it became apparent that Fluentd is a weak logging option for a Kubernetes cluster, despite its popularity. If it were more stable, I might have a different opinion – but until then, rsyslog it is.
