Bash script to log and restart docker container based on cpu usage

Link: https://dev.to/tomasggarcia/bash-script-to-log-and-restart-docker-container-cpu-usage-1j2o

I wrote this post to share a recent experience at work: a problem I ran into and my temporary solution to it.
I'm new to the tech world, so I would be grateful for any suggestions for improvement, and very pleased if this post is useful to someone else.
A few months ago at my company we discovered that one of our Docker containers was having problems with CPU usage: out of nowhere, its CPU usage would increase abruptly. While the dev team was searching for the bug in the code, I implemented a temporary solution. First, I wrote a script that logs the container's CPU usage every 5 seconds:

#!/bin/bash
logs=/var/log/process_name.log
container_name=container_name

while :
do
  # Get the CPU usage of the container, e.g. "12.34%" (empty if the container cannot be found)
  var=$(docker stats --no-stream --format "{{.CPUPerc}}" "$container_name")

  if [[ -z "$var" ]]; then
    echo "Container ${container_name} does not exist"
    echo "$(date +'%d-%m-%Y %H:%M') | Container ${container_name} does not exist" >> "$logs"
  else
    # Keep only the integer part of the value: "12.34%" becomes "12"
    percent="${var%.*}"

    echo "Current CPU usage: ${percent}"

    # Append the current CPU usage to the log file
    echo "$(date +'%d-%m-%Y %H:%M') | ${percent}" >> "$logs"
  fi
  sleep 5
done

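The restart script further down reads the value back from column 20 of each log line, so both scripts depend on the same line layout. Here is a minimal sketch of that round trip, using a hard-coded sample value instead of a real `docker stats` call. The helper names `cpu_to_int` and `format_log_line` are hypothetical and only used for illustration:

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the original scripts.

# Turn a docker stats value such as "12.34%" into its integer part ("12").
cpu_to_int() {
  local raw=${1%\%}   # drop the trailing percent sign
  echo "${raw%.*}"    # drop the decimals, if any
}

# Build a log line in the same "dd-mm-YYYY HH:MM | value" layout.
format_log_line() {
  echo "$(date +'%d-%m-%Y %H:%M') | $1"
}

sample="12.34%"   # assumed sample output of docker stats
line=$(format_log_line "$(cpu_to_int "$sample")")

# The timestamp is 16 characters and " | " adds 3, so the value starts at column 20.
echo "$line" | cut -c 20-
```

If the date format or the separator ever changes, the column offset used by the reader has to change with it.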
After that I created a supervisor config to run this process:

[program:process]
command=/opt/scripts/script.sh
autostart=true
autorestart=true
stderr_logfile=/var/log/process.err.log
stdout_logfile=/var/log/process.out.log

Then I wrote a script to restart the problematic container, based on the logs of the previous script:

#!/bin/bash
container_name=container_name
logs_evaluated_lines=5
logs=/var/log/process_name.log
max_cpu=90

while :
do
    counter=0

    # Count how many of the last 'logs_evaluated_lines' log lines report a CPU usage of at least 'max_cpu'
    while read -r value
    do
        # The value starts at column 20, right after "dd-mm-YYYY HH:MM | "
        percent=$(echo "$value" | cut -c 20-)
        # Skip non-numeric entries such as "Container Restarted"
        if [[ "$percent" =~ ^[0-9]+$ ]] && (( percent >= max_cpu )); then
            counter=$((counter+1))
        fi
    done < <(tail -n "$logs_evaluated_lines" "$logs")

    echo "$(date +'%d-%m-%Y %H:%M') | Log lines at or above ${max_cpu}%: ${counter}"
    echo "$(date +'%d-%m-%Y %H:%M') | Log lines analyzed: ${logs_evaluated_lines}"
    if (( counter == logs_evaluated_lines )); then
        echo "$(date +'%d-%m-%Y %H:%M') | CPU Full usage"
        echo "$(date +'%d-%m-%Y %H:%M') | Restarting Container"
        docker restart "$container_name"
        echo "$(date +'%d-%m-%Y %H:%M') | Container Restarted"
        echo "$(date +'%d-%m-%Y %H:%M') | Container Restarted" >> "$logs"
    else
        echo "$(date +'%d-%m-%Y %H:%M') | CPU Usage OK"
    fi
    echo "$(date +'%d-%m-%Y %H:%M') |"
    sleep 5
done

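This script evaluates the last 'logs_evaluated_lines' lines of the log and restarts the container if every one of them reports a CPU usage greater than or equal to 'max_cpu'. With the values above, that means five consecutive readings at or above 90%.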
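The restart condition can be tried out without Docker by running it against a throwaway log file. The sketch below is one way to do that. The function name `should_restart` and the sample log contents are made up for illustration, and it assumes the log layout described above, with the value starting at column 20:

```shell
#!/bin/bash
# Hypothetical helper for illustration: succeeds (exit status 0) only if each of the
# last $2 lines of log file $1 holds an integer CPU value of at least $3.
should_restart() {
  local logs=$1 lines=$2 max_cpu=$3
  local counter=0 value percent

  while read -r value; do
    percent=$(echo "$value" | cut -c 20-)
    # Non-numeric entries such as "Container Restarted" never count
    if [[ "$percent" =~ ^[0-9]+$ ]] && (( percent >= max_cpu )); then
      counter=$((counter+1))
    fi
  done < <(tail -n "$lines" "$logs")

  (( counter == lines ))
}

# Sample data (made up): five readings above the threshold
tmp_log=$(mktemp)
for cpu in 95 97 99 100 98; do
  echo "01-01-2024 10:00 | ${cpu}" >> "$tmp_log"
done

if should_restart "$tmp_log" 5 90; then echo "restart"; else echo "ok"; fi

# A restart marker inside the window keeps the condition from firing again right away
echo "01-01-2024 10:00 | Container Restarted" >> "$tmp_log"
if should_restart "$tmp_log" 5 90; then echo "restart"; else echo "ok"; fi

rm -f "$tmp_log"
```

The first check prints `restart` and the second prints `ok`, because the marker line is not numeric and so only four of the last five lines count.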