
I have written a short Bash (POSIX-compliant) script to clean up the Traefik access log file and limit it to a maximum of 50 MB. I would like to know what you think of it. Any suggestions are welcome.

The script uses sed -i ... to drop older, no-longer-needed lines from the file. The log file is plain text, not JSON, so it should still be valid after cleanup. Docker then sends the USR1 signal to the container so that Traefik can "reload" its logs.

#!/bin/sh
set -e
fn="/var/log/traefik/access.log"
limit=$((50*1000*1000))
size=$(stat -c%s "$fn")
if [ $((limit >= 0 && limit <= size)) = 0 ]
then
  echo "Nothing to do."
  return 1
fi
lines=$(wc -l "$fn" | awk '{ print $1 }')
lines_to_del=$((((size-limit)*lines)/size))
if [ $((lines_to_del > 0 && lines_to_del <= lines)) = 0 ]
then
  echo "Nothing to do."
  return 1
fi
printf "%s %s %s %s %s\n" "$fn" "$limit" "$size" "$lines" "$lines_to_del"
sed -i "1,${lines_to_del}d" "$fn"
docker kill --signal="USR1" traefik
return 0

3 Answers


Overall, looks good to me. Just some minor nitpicks:

size=$(stat -c%s "$fn")

Even with set -e, I'd test this explicitly - error handling around command substitutions is a common source of surprises (for example, a local or export assignment masks the substitution's exit status):

size=$(stat -c%s "$fn") || exit

  echo "Nothing to do."
  return 1

I would consider "no action needed" to be a success condition, so return zero here. And as we're not in a function, perhaps exit 0 instead for clarity.


if [ $((limit >= 0 && limit <= size)) = 0 ]

I'd probably use two tests rather than arithmetic expansion there:

if [ "$limit" -lt 0 ] || [ "$limit" -gt "$size" ]

Similarly for the lines_to_del test.
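Applied to the second guard, the same pattern would read like this (a sketch; lines and lines_to_del are given example values here so it runs standalone):

```shell
#!/bin/sh
lines=1000        # example: total lines in the log
lines_to_del=250  # example: lines the script wants to trim
if [ "$lines_to_del" -le 0 ] || [ "$lines_to_del" -gt "$lines" ]
then
  echo "Nothing to do."
  exit 0
fi
echo "trimming $lines_to_del lines"
```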


lines=$(wc -l "$fn" | awk '{ print $1 }')

Instead of parsing output with AWK, we could have wc read from standard input so that it doesn't print a filename.

lines=$(wc -l <"$fn")

printf "%s %s %s %s %s\n" "$fn" "$limit" "$size" "$lines" "$lines_to_del"

I'm not convinced that this row of numbers will mean anything to users without more explanation.


sed -i is not (POSIX) standard. Consider using ed instead.
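For instance, the same deletion could be scripted through ed (a sketch; the file name and line count are stand-ins for the script's real values, and -s merely suppresses ed's byte-count output):

```shell
#!/bin/sh
fn="access.log"                     # stand-in for /var/log/traefik/access.log
lines_to_del=3                      # stand-in for the computed value
printf 'a\nb\nc\nd\ne\n' > "$fn"    # sample data so the sketch runs standalone
# Delete lines 1..lines_to_del, write the file back, quit
printf '1,%dd\nw\nq\n' "$lines_to_del" | ed -s "$fn"
```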

  • It might be worth pointing out that stat -c isn't portable. AFAIK, that's only for GNU stat, but in any case, it isn't portable and portable scripts should not assume that the local stat implementation supports the non-standard -c flag. For example: ss64.com/mac/stat.html - Commented yesterday

Log rotate

At first it was unclear to me why you chose this approach rather than take advantage of the various log rotate options that are available in Traefik (docs).

But it turns out that Traefik does not help with access logs; you have to manage them yourself.

Usually, I prefer to rely on Linux logrotate for this kind of job. The fact that we are running in Docker does not preclude us from using that kind of mechanism.
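A hypothetical logrotate entry for this case might look like the following (the path, retention count, and container name are assumptions about the setup):

```
/var/log/traefik/access.log {
    size 50M
    rotate 3
    compress
    missingok
    postrotate
        docker kill --signal=USR1 traefik
    endscript
}
```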

Truncating logs

First of all, does the script run as expected? I get this error:

return: can only 'return' from a function or sourced script

Which makes sense. return should be used to stop the execution of a Bash function. Is the script meant to be called directly?

You probably mean exit 0, but that is the default exit code anyway. Only when you want to return a non-zero value should you use exit explicitly.

The log file format is in plain text format and not in JSON format, so the log file should still be valid after cleanup

Let's hope so, but I would not be so certain. Some applications may log multi-line text separated by \n, optionally enclosed within quotes - for example, an exception stack trace.

Counting lines

Instead of:

lines=$(wc -l "$fn" | awk '{ print $1 }')

You can do it without awk or cut - look at the subtle difference:

lines=$(wc -l < "$fn")

But I find the approach wasteful. Counting lines requires some disk reads and CPU cycles, and this could take seconds on a large file. Instead, we should just rotate the log file once it has reached a certain size in bytes.

I am not fond of truncating the file in-place. Just rotate the whole of it, and start with a new file. Zip the last few log files for reference.
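A rough sketch of whole-file rotation (file names, sample data, and the retention count of three are illustrative, and the Docker signal is left commented out):

```shell
#!/bin/sh
fn="access.log"                  # stand-in for the real log path
printf 'old data\n' > "$fn"      # sample data so the sketch runs standalone
# Shift older compressed copies, keeping the last three
[ -f "$fn.2.gz" ] && mv "$fn.2.gz" "$fn.3.gz"
[ -f "$fn.1.gz" ] && mv "$fn.1.gz" "$fn.2.gz"
mv "$fn" "$fn.1"                 # rotate the live file as a whole
: > "$fn"                        # start a fresh, empty log
gzip "$fn.1"                     # zip the rotated copy for reference
# docker kill --signal=USR1 traefik  # then let the daemon reopen its log
```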

Keeping a backup

A small tip anyway, useful for testing at least:

sed -i.bak "1,${lines_to_del}d" "$fn"

This will keep a copy of the original file, appending a .bak extension so you can compare changes, and recover from errors.

Error handling

At this time, there is no real error handling, other than the set -e command which tells the shell to exit immediately if a command exits with a non-zero status.

I recommend using the trap function in Bash to handle errors or run cleanup routines. This script does one thing that is dangerous: altering a file. On top of that, the file is being used by a critical piece of software.

So it's useful to retain a backup, so that we can recover from any error occurring during this sensitive operation.

Your script should also check that it is touching a file, and not a directory for example. What would happen then?
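A minimal sketch of both ideas - trap-based recovery plus a sanity check that the target is a regular file (the path and sample data are invented for illustration, and sed -i is a GNUism kept here for brevity):

```shell
#!/bin/sh
set -e
fn="access.log"                              # stand-in for the real log path
printf 'one\ntwo\nthree\n' > "$fn"           # sample data so the sketch runs standalone

# Refuse to touch anything that is not a regular file (e.g. a directory)
[ -f "$fn" ] || { echo "$fn is not a regular file" >&2; exit 1; }

cp -p "$fn" "$fn.bak"                        # keep a backup of the log
restore() { if [ -f "$fn.bak" ]; then mv "$fn.bak" "$fn"; fi; }
trap restore EXIT                            # on any failure, put the backup back

sed -i '1d' "$fn"                            # the risky in-place edit

rm -f "$fn.bak"                              # success: nothing left to restore
```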

  • "return": true, I meant exit 0, was a typo. "count lines": It is essential to count the lines, as the remaining lines are still needed and should remain valid. "error handling": Since no errors can occur in the calculations and all arguments/parameters are checked, an additional trap is not necessary. - Commented Nov 26 at 17:37
  • "exit 0 ... is the default exit code anyway" - only because of set -e. In general, a script's exit code is the same as that of the last executed command (not counting any EXIT trap commands). - Commented 2 days ago

log rolling races

I can't imagine why you chose to re-invent the wheel here (and the original posting is not tagged as such). Log rolling is a mature and well-studied need, with good solutions such as logrotate(8). With such a solution we would still have lots of historic data to review if we need to compute stats or sleuth the details of some buggy behavior. Long term stats are easier to compute if files hold non-overlapping time ranges of log records.

The fundamental requirement is that a data source, a daemon, shall not consume storage without bound. We model it as producing bytes of log at some rate, and we wish to impose an approximate limit on consumption, subject to inevitable burstiness. This motivates doing log rolling "often". We either keep "lots" of old logs, say 23 stale hourly logs for daily retention, or we often do a cheap stat() like the OP and notice that st_size is still small enough that we silently do nothing and exit with success status. Retaining lots of little files yields much more constant disk usage than keeping, say, just two or three.

We want to "often" do an efficient log roll, so frequent runs will mitigate the size excursions that occasional log activity bursts can produce. The OP's wc -l < ${fn} operation has CPU and I/O cost proportional to file size, and tends to evict other files from the FS cache, reducing the cache's effectiveness. I'm glad to see that we just stat() the file in the "nothing to do" case.

normal behavior leads to error

Doing a return 1 (or exit 1) in that case is very odd, since we successfully determined that there's nothing to do. Also, I don't know how you run this, but the noisy echo for that case is very unusual, since sometimes we may wish to run a log roller from cron, and crond by default will email stdout + stderr to the account we ran under.

race

The rolling operations are

  1. sed does an atomic rename() from .log to .log.bak
  2. sed reads input, while daemon can still do buffered logging to the .bak file
  3. sed re-writes a shorter .log file
  4. kill sends a USR1 signal
  5. daemon receives the signal, flushes / closes the .bak, re-opens the .log

Any time between sed completing its read() and the daemon completing its close() is the race interval. The daemon can log one or more important messages then, and they will appear only in the (ignored?) .bak file. Roll the file once more and those messages will be gone forever, even if ${lines_to_del} ought to have retained them.

Utilities like logrotate avoid such a race by acting at file granularity, and by patiently waiting to see the daemon complete step (5.) before doing further processing, such as trimming or compressing an old log file.

If you find it convenient to have a "last few hours" file which always has at least a few hundred lines, consider re-writing it as final step of a roller script, with cat ${fn}.1 ${fn}.2 ${fn}.3 > ${fn}.recent. Or use the corresponding date-based filenames.

conversion factor

lines_to_del=$((((size-limit)*lines)/size))
lines_to_del=$((( size-limit)*(lines/size)))

There is of course nothing wrong with the order-of-operations in the OP code. But as a maintainer, I would rather see the 2nd expression, where we're multiplying by unity. It's just easier to reason about when presented as a conversion factor.

Reading the file twice, with wc and sed, is no big deal. But consider caching the \$\frac{lines}{byte}\$ conversion factor in a one-line file, rather than recomputing a figure which largely remains constant.
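A sketch of such a cache (the file names are invented, stat -c assumes GNU stat, and the factor is scaled to lines per megabyte so integer arithmetic keeps precision):

```shell
#!/bin/sh
fn="access.log"                  # stand-in for the real log path
cache="access.log.lpb"           # invented name: cached lines-per-megabyte
printf 'aa\nbb\ncc\n' > "$fn"    # sample data so the sketch runs standalone
if [ -f "$cache" ]
then
  read -r lines_per_mb < "$cache"    # reuse the cached conversion factor
else
  size=$(stat -c%s "$fn")            # GNU stat assumed; -c is not portable
  lines=$(wc -l < "$fn")
  # Scale up before dividing so the integer result stays meaningful
  lines_per_mb=$((lines * 1000000 / size))
  echo "$lines_per_mb" > "$cache"
fi
echo "$lines_per_mb"
```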

  • As already mentioned, I decided to replicate part of logrotate's functionality for two reasons: a) Docker/Traefik does not offer this functionality for "external" log files, and b) logrotate itself does not split the files "in the middle", but part of the log file is still needed later by another application that analyzes the log file. In my opinion, this is not directly a reinventing-the-wheel. - Commented 2 days ago
  • Given the racy nature of the log rolling dance, I am skeptical that "splitting in the middle" is a good idea. If "another application" (not mentioned in the original Review Context) needs "a last-few-hours file which always has at least a few hundred lines" to run its analysis, then you might consider using the cat post-processing approach I proposed. The cron job entry might run logrotate && cat x y z > recent.log. I come from an RDBMS background, where the mantra is "don't lose a committed record; don't corrupt state". Specs and races don't really get along with each other. - Commented 2 days ago
  • I use GoAccess and cannot find any information in the man pages that multiple log files also work with the real time HTML output... So this was a trade-off between too many lines and no lines at all (or an empty log file). - Commented 2 days ago
  • @Tobias Grothe GoAccess normally accepts multiple log files from the command line, and you could still pipe them - See github.com/allinurl/goaccess/issues/2776. So, for real time parsing, doing a tail could work perhaps? But assuming the log files are going to be renamed after rotation, throwing inotify in a monitoring loop could be an option. - Commented 2 days ago
  • @J_H Btw, "conversion factor": I decided to write down the calculation in this order, i.e., to optimize it, because otherwise rounding errors can occur in integer division. It may not be more readable, but it is more accurate. - Commented 2 days ago
