MariaDB Server Unresponsive

imok · September 22, 2020, 3:30pm

I have a server that suddenly freezes and stays frozen until I reboot it. It’s a VM running on Proxmox.

Changed from CentOS 7 to 8.
Changed from MariaDB 10.5 to 10.4
Reinstalled Proxmox.

It keeps happening.

The Proxmox host is itself running on a Ryzen VPS.

Any ideas where to look?

Miguel · September 22, 2020, 3:36pm

Whoa, downgrading from MariaDB 10.5 to MariaDB 10.4? They introduced a new auth system. So you’re a hero; I wouldn’t do that myself.

Check the logs?

imok · September 22, 2020, 3:49pm

Any idea where to find them?

MariaDB [(none)]> show global variables like 'log_error';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_error     |       |
+---------------+-------+

This is the content of /var/log/

Neoon · September 22, 2020, 4:02pm

Well, maybe it’s the host?
Or you have queries that lock up your database server.

Miguel · September 22, 2020, 4:27pm

You haven’t specified an error log location in my.cnf; you need to.

You can also check live queries using mtop or show processlist;
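A minimal sketch of that my.cnf change, assuming a CentOS-style layout where the server also reads drop-in files from /etc/my.cnf.d/. The file name error-log.cnf, the log path and the service name mariadb are assumptions, not details from this thread. While log_error is empty, as in the query output above, the server most likely writes its errors to stderr, which systemd captures, so journalctl -u mariadb may already hold them.

```shell
#!/bin/sh
# Sketch: give MariaDB an explicit error log via an option-file drop-in.
# Assumptions: the server reads /etc/my.cnf.d/*.cnf (CentOS-style layout);
# the file name "error-log.cnf" and the log path are hypothetical examples.
write_error_log_conf() {
    conf_dir=$1   # directory of drop-in option files, e.g. /etc/my.cnf.d
    log_file=$2   # where the server should write its error log
    mkdir -p "$conf_dir"
    # [mysqld] is a section the MariaDB server reads its options from.
    cat > "$conf_dir/error-log.cnf" <<EOF
[mysqld]
log_error=$log_file
EOF
}

# Usage, as root (service name "mariadb" is an assumption):
#   mkdir -p /var/log/mariadb && chown mysql:mysql /var/log/mariadb
#   write_error_log_conf /etc/my.cnf.d /var/log/mariadb/mariadb.log
#   systemctl restart mariadb
#   mysql -e "SHOW GLOBAL VARIABLES LIKE 'log_error'"   # should now show the path
#   mysql -e "SHOW FULL PROCESSLIST"                     # live queries
```

The directory has to be writable by the mysql user, otherwise the server may refuse to start after the restart.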

Falzo · September 22, 2020, 4:37pm

more information needed…

what do you mean by frozen: the service or the whole VM?
is it still accessible via ssh or vnc in that state?
can you match the timestamp to any syslog entries, and is there anything after the freeze?
what else is running on it other than mariadb?
which hardware specs/options did you use, esp. for disk (thin-lvm, raid, virtio, scsi etc.), network (virtio, local vs public ip etc.), memory (swap available?)

imok · September 22, 2020, 4:52pm

Thanks, will do.

Do you mean mytop? Installing… but the problem happens mostly when I’m not at the computer.

The whole VM.

No SSH connection. Ping stops. VNC shows the login screen and nothing can be done there.

I can’t find /var/log/messages on CentOS 8; I’m going to look up where it lives in this version.

The New Relic monitoring agent and the QEMU guest agent. But it also happened on CentOS 7 with MariaDB 10.5, without those agents.

Host (it’s a VM too):

Guest:

Plenty of RAM and swap usage at zero on both. No high load detected.

I will update when I set up the logs correctly.

nem · September 22, 2020, 4:56pm

Install atop with 30-second granularity (/etc/sysconfig/atop) to see what’s going on. Sounds like you’ve got a spike in load that you’re misattributing to MariaDB.

Next time it goes down, flip through the log with atop -r /var/log/atop/atop_YYYYMMDD to see what’s happening prior to the lockup.
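A sketch of that replay step, assuming atop’s default raw-log directory /var/log/atop/ and GNU date; the helper names are hypothetical. The -b/-e options limit the replay to a time window, but the accepted time format has changed between atop versions, so check man atop on your system.

```shell
#!/bin/sh
# Sketch: replay atop's raw log for a window before a lockup.
# Assumptions: default raw-log directory /var/log/atop/ and GNU date;
# the helper names are hypothetical.
atop_log_for_date() {
    # e.g. 2020-09-30 -> /var/log/atop/atop_20200930
    printf '/var/log/atop/atop_%s\n' "$(date -d "$1" +%Y%m%d)"
}

atop_replay() {
    # $1 = date, $2 = begin time, $3 = end time (hh:mm on many atop versions)
    atop -r "$(atop_log_for_date "$1")" -b "$2" -e "$3"
}

# Usage: the 20 minutes before an 11:09 lockup on September 30, 2020
#   atop_replay 2020-09-30 10:50 11:10
```

Inside the replay, t steps forward one sample and T steps back.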

imok · September 22, 2020, 5:22pm

Should I run systemctl enable/start atop after installing?

nem · September 22, 2020, 5:36pm

Yup, after dropping LOGINTERVAL from 600 to 30. systemctl enable --now atop does both.
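The two steps can be sketched like this, assuming /etc/sysconfig/atop contains a LOGINTERVAL=600 line as described above; set_atop_interval is a hypothetical helper. Because enable --now does not restart a service that is already running, a restart is listed for that case.

```shell
#!/bin/sh
# Sketch: lower atop's sampling interval (e.g. from 600 s to 30 s).
# Assumes a "LOGINTERVAL=..." line in the sysconfig file (CentOS/EPEL layout).
set_atop_interval() {
    conf=$1      # e.g. /etc/sysconfig/atop
    seconds=$2   # e.g. 30
    # Rewrite only the LOGINTERVAL line; every other setting is left untouched.
    sed -i "s/^LOGINTERVAL=.*/LOGINTERVAL=$seconds/" "$conf"
}

# Usage, as root:
#   set_atop_interval /etc/sysconfig/atop 30
#   systemctl enable --now atop
#   systemctl restart atop   # only needed if atop was already running
```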

Falzo · September 22, 2020, 5:48pm

so you are using nested virt? (I think I now remember something in the other thread about the IPs…)
do you have other VMs running in parallel in your proxmox?

my bet would be on something like hitting IO limits, i.e. the system not being able to read or write properly anymore… as @nem already wrote, mariadb doesn’t have to be the direct cause; it could simply add to the problem.

do you know what the underlying storage system is on the real hostnode? how did you set up your storage? zfs? thin-lvm? or just plain ext storage?

imok · September 22, 2020, 5:57pm

Nooice.

Did I use “nooice” correctly?

Yes to all, I have a web server in another guest.

Not sure. Maybe @seriesn can comment on that.

I installed Debian with Proxmox on top, on one big partition.

No idea about the LVM stuff; that’s still on my list to learn.

SERIESN · September 22, 2020, 6:47pm

Looks like LVM. Does anything happen when you boot into rescue mode?

imok · September 22, 2020, 7:03pm

It’s already a production server, so I can’t restart it. I will need to migrate the database first.

I will try to configure the logs properly in a few hours; maybe they’ll catch something useful if it happens again. Fortunately I have another small server to use temporarily.

Miguel · September 23, 2020, 8:51am

No, mtop, to monitor the MySQL queries.

imok · September 23, 2020, 11:05pm

I reinstalled again last night; it’s now running CentOS 7 and MariaDB 10.5.

All the logs recommended here are configured, and it’s running the Nixstats agent too.

Let’s see how it goes.

Thank you, everyone!

Daniel · September 25, 2020, 11:55pm

Make sure you have a syslog daemon like rsyslog installed and running.

imok · September 30, 2020, 4:41pm

It happened again.

This is what I got from the console:

Output of /var/log/messages: https://pastebin.com/aewmSSjv (server went down at 11:09 AM)
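To look only at the minutes before the lockup, a small awk filter over the classic syslog format (Sep 30 11:09:01 host …) is enough. This is a sketch: the function name log_window is hypothetical, and it compares times as plain strings, so the window must stay within a single day.

```shell
#!/bin/sh
# Sketch: print syslog lines from one day that fall inside a time window.
# Assumes the classic "Mon DD HH:MM:SS host ..." format of /var/log/messages.
# Times are compared as strings, so the window must not cross midnight.
log_window() {
    # $1 = log file, $2 = month (Sep), $3 = day (30), $4 = start, $5 = end
    awk -v m="$2" -v d="$3" -v s="$4" -v e="$5" \
        '$1 == m && $2 == d && $3 >= s && $3 <= e' "$1"
}

# Usage: everything logged between 10:50 and the 11:09 crash
#   log_window /var/log/messages Sep 30 10:50:00 11:09:59
```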

nem · September 30, 2020, 5:10pm

First screenshot is a kernel panic. Is your microcode up to date? Firmware? Anything tasty in /var/log/boot.log? I’ve seen bad memory for example result in sporadic panics under load. As an example, this line was enough to deduce the memory was bad.

[    0.000000]  gran_size: 64K     chunk_size: 16M     num_reg: 10      lose cover RAM: 238M
[    0.000000] *BAD*gran_size: 64K     chunk_size: 32M     num_reg: 10      lose cover RAM: -18M
[    0.000000] *BAD*gran_size: 64K     chunk_size: 64M     num_reg: 10      lose cover RAM: -18M
[    0.000000] *BAD*gran_size: 64K     chunk_size: 128M     num_reg: 10      lose cover RAM: -16M
[    0.000000] *BAD*gran_size: 64K     chunk_size: 256M     num_reg: 10      lose cover RAM: -16M
[    0.000000] *BAD*gran_size: 64K     chunk_size: 512M     num_reg: 10      lose cover RAM: -16M
[    0.000000] *BAD*gran_size: 64K     chunk_size: 1G     num_reg: 10      lose cover RAM: -512M
[    0.000000] *BAD*gran_size: 64K     chunk_size: 2G     num_reg: 10      lose cover RAM: -1536M

Client had memory replaced and his server has been humming ever since.
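To repeat that kind of check on this setup, a grep over the kernel messages is one option. A sketch; the pattern list is an assumption: the *BAD* lines quoted above, kernel panics, and Machine check / EDAC messages, which is where Linux usually reports hardware memory errors. A guest VM may not see the host’s hardware errors at all, so the host’s logs are worth scanning too.

```shell
#!/bin/sh
# Sketch: grep kernel/boot messages for signs of trouble.
# The pattern list is an assumption: "*BAD*" lines, kernel panics,
# and Machine check / EDAC messages (hardware memory error reports).
scan_kernel_log() {
    # $1 = file to scan, or "-" for standard input; prints matches with line numbers
    grep -n -E '\*BAD\*|Kernel panic|Machine check|EDAC' "$1"
}

# Usage:
#   scan_kernel_log /var/log/boot.log
#   dmesg | scan_kernel_log -
#   journalctl -k -b -1 | scan_kernel_log -   # previous boot; needs a persistent journal
```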

imok · September 30, 2020, 5:51pm

Not sure about that. At least there are no package updates available on either the host or the guest.

Everything says OK: https://pastebin.com/QiuBUW4y

Is it possible that there are memory problems? The failing server is a VM, and another VM on the same host is running completely fine. The host is a VM too.