
I've been running my own Mail-in-a-Box server on a Hetzner Cloud instance for the past 8 months, and until now I had no issues whatsoever.

Introduction

The server I used to set up my mail server has 2 vCPUs, 2 GB of RAM and 40 GB of SSD disk space and runs Ubuntu 22.04, which sounds like it should be plenty for around ten email accounts with a regular volume of messages.

Hetzner cloud panel of my mail server

When things stopped working

I wasn't able to send or receive emails on any of my devices. I immediately logged in to the server, restarted the services and started reading through the logs. I also had a look at CPU and RAM usage, but found nothing out of the ordinary at first.

Things were stable again after I restarted the services. I saw that a newer version of Mail-in-a-Box was available, so I decided to upgrade.

Mail-in-a-Box system status check errors

Root cause of the problem

After the upgrade, it only took a few days for the server to become unstable again. I couldn't find anything in the logs of Dovecot or any of the other services that Mail-in-a-Box uses. I then thought that the server might be running out of memory and that the Linux OOM (out-of-memory) killer might be killing my precious processes.

💡
Linux can terminate applications when it runs out of memory. When the system exhausts its physical RAM and any available swap space, the kernel may trigger the Out of Memory (OOM) killer, which frees memory by terminating processes in order to prevent a system crash. It picks its victims using a per-process score that is based mainly on memory usage, so it typically targets the processes that consume the most memory. This is a crucial mechanism for keeping the system alive when memory is critically low.
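To see which processes the kernel would currently pick first, you can read the score it exposes for every process under /proc. Here's a minimal sketch, assuming bash; the function name oom_candidates and its two arguments are made up for illustration, only the oom_score and comm files are real kernel interfaces.

```shell
#!/usr/bin/env bash
# List the processes the OOM killer currently rates as the likeliest victims.
# $1: proc filesystem root (defaults to /proc; a parameter only so the
#     function can be pointed at a test directory) -- hypothetical argument
# $2: how many entries to print (defaults to 5)    -- hypothetical argument
oom_candidates() {
    local proc_root="${1:-/proc}" limit="${2:-5}"
    local dir pid score name
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        pid="${dir##*/}"
        # A process can exit while we iterate, so tolerate failed reads.
        score="$(cat "$dir/oom_score" 2>/dev/null)" || continue
        name="$(cat "$dir/comm" 2>/dev/null)" || name="?"
        printf '%s\t%s\t%s\n' "$score" "$pid" "$name"
    done | sort -t $'\t' -k1,1nr -k2,2n | head -n "$limit"
}
```

Calling oom_candidates /proc 5 prints the five highest scores together with the PID and command name. A higher score means the process is more likely to be killed first.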

What was causing high memory usage

SpamAssassin was using a lot of memory while running sa-learn, which analyses messages and learns what counts as spam and what doesn't. Since the usage only spiked during those runs, I wasn't immediately able to put my finger on why the system might be running out of memory.

Running one command, grep "Killed process" /var/log/syslog, immediately revealed that this was the cause of all the issues I was having!

The system was running out of memory, so the Linux kernel was killing processes to keep itself alive.
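If the grep returns a lot of lines, it helps to see which processes get killed most often. Here's a rough sketch, assuming the usual kernel message shape of "Killed process PID (name)"; the function name oom_kill_summary is my own invention, and the exact fields after the process name can differ between kernel versions.

```shell
#!/usr/bin/env bash
# Count OOM kills per process name from kernel log lines read on stdin.
# Assumes lines containing "Killed process <pid> (<name>)"; everything
# else (mail logs, other kernel messages) is ignored.
oom_kill_summary() {
    sed -n -E 's/.*Killed process [0-9]+ \(([^)]*)\).*/\1/p' \
        | awk '{ count[$0]++ }
               END { for (name in count) printf "%d\t%s\n", count[name], name }' \
        | sort -t $'\t' -k1,1nr -k2,2
}
```

You can feed it the same file as the grep above, for example oom_kill_summary < /var/log/syslog. It prints one line per process name, with the number of kills first and the most frequent victim on top.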

Solution

I could have simply upgraded the server to a more expensive one with 4 GB of RAM, but since this is only a low-load personal mail server and I had over 20 GB of free disk space, I simply added a 4 GB swap file on the disk and voilà! Issue solved!

fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
cp /etc/fstab /etc/fstab.bak
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
free -m
               total        used        free      shared  buff/cache   available
Mem:            1915         453        1018           4         443        1295
Swap:           4095         393        3702

And that's it! Adding 4 GB of swap allows SpamAssassin to spill over into swap during its sa-learn runs. During normal operation the server only uses about 30-40% of its RAM.
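To keep an eye on those numbers without reading the free output by hand, the same figures can be computed from /proc/meminfo. This is just a sketch, assuming bash and awk; the function name mem_usage_percent and its output labels are made up for illustration, and "used" here means total minus MemAvailable, which may differ slightly from what other tools report.

```shell
#!/usr/bin/env bash
# Print RAM and swap usage as whole percentages (rounded down).
# $1: meminfo-style file (defaults to /proc/meminfo; a parameter only so
#     the function can be tested against a sample file) -- hypothetical
mem_usage_percent() {
    awk '
        /^MemTotal:/     { mem_total  = $2 }
        /^MemAvailable:/ { mem_avail  = $2 }
        /^SwapTotal:/    { swap_total = $2 }
        /^SwapFree:/     { swap_free  = $2 }
        END {
            if (mem_total > 0) {
                printf "ram_used_pct=%d\n", (mem_total - mem_avail) * 100 / mem_total
            }
            if (swap_total > 0) {
                printf "swap_used_pct=%d\n", (swap_total - swap_free) * 100 / swap_total
            } else {
                # No swap configured at all.
                print "swap_used_pct=none"
            }
        }
    ' "${1:-/proc/meminfo}"
}
```

Running mem_usage_percent with no arguments reads the live values. If swap_used_pct stays close to 100 for long periods, swap alone is probably not enough and more RAM would be the safer fix.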