# MCA: CPU 0 COR GCACHE LG RD error



## kalleboy (Jan 30, 2022)

Hi folks. 

I got the following lines when checking my server with "dmesg -a". Any idea what it refers to? _MCA: CPU 0 COR GCACHE LG RD error_

OS: FreeBSD 13-RELEASE


```
Starting mysql.
Starting background file system checks in 60 seconds.
...
Sat Jan 29 14:59:15 +03 2022
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
```

Thanks in advance.


----------



## VladiBG (Jan 30, 2022)

Machine Check Architecture - Wikipedia (en.wikipedia.org)

It can be caused by bad memory, a bad power supply, overheating, overclocking, a bad CPU, or the BIOS.
Start by testing the memory and checking for a new BIOS version.

Also clean the dust out of the empty memory slots.


----------



## SirDice (Jan 30, 2022)

Note the bank and address of this one. Then move your memory modules around, for example by shifting everything over one slot. This re-seats everything (in case it's a bad connection). If you get a similar error again but it now has a different bank and address, then you know one of the memory modules is broken (the error moved with the module). If the bank and address stay the same, the issue is with the mainboard or CPU.


----------



## VladiBG (Jan 30, 2022)

The MCA banks are fixed per processor block; they are not to be mistaken for memory banks/modules.
More info can be found here (page 164, section 3.4 "Machine check banks"): https://developer.amd.com/wp-content/resources/56255_3_03.PDF


----------



## kalleboy (Jan 30, 2022)

Thanks for your replies. So it isn't clear yet what's going on here. I really hope it's not a hardware fault. That would lead to a brand new (and bad) adventure *locally*.

Here is my dmidecode output: https://bsd.to/5VpC/raw

And mcelog output:

```
root@mybox:~ # mcelog --no-dmi --ascii --file /var/log/dmesg.today
mcelog: Unknown CPU type vendor 2 family 23 model 1
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
CPU 0 BANK 18
MISC d01b0fff01000000 ADDR 40000051068f800
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
```


I would be very grateful for any ideas or suggestions.


----------



## VladiBG (Jan 30, 2022)

ASRock Rack > B450D4U-V1L (www.asrockrack.com)

Is this your Motherboard?


----------



## kalleboy (Jan 30, 2022)

Well, it seems so;

"Base Board Information
    Manufacturer: ASRockRack
    Product Name: B450D4U-V1L"

P.S.: This is my dedicated server rented at Hetzner.


----------



## VladiBG (Jan 30, 2022)

Then contact Hetzner to find out where the problem is. It may be the wrong BIOS version or some bad memory.


----------



## kalleboy (Feb 7, 2022)

Hetzner responded and, to their credit, offered several solutions quickly and professionally:

"We can offer you the following options for the server:
1. Exchange the server, but keep the drives:
To rule out a majority of the sources of hardware error, it is possible for us to exchange your server but keep all of your drives. The server would need to be shut down for approximately 20-30 minutes."

The second option was to exchange both the server and the drives, and the third was to run a complete hardware check (10 hours of diagnostics).

I'm going to request the first option: exchanging the server and MOVING the drives.

Now I'd like to know: would there be any OS-level trouble (FreeBSD kernel, boot, ZFS structure) in moving the current 2x NVMe disks (ZFS stripe) into a new server of the exact same model?


----------



## VladiBG (Feb 7, 2022)

You need the same BIOS settings on the new motherboard, such as SATA mode (AHCI), UEFI/Legacy (CSM), NVMe RAID on/off, Secure Boot off, and so on. My guess is that you are using UEFI in order to boot from NVMe.
It depends on how your current boot is set up. Check your current boot method with `sysctl machdep.bootmethod`. If it is set to UEFI, verify how the EFI variables are set in the BIOS with `efibootmgr -v`; this shows whether you are booting directly from an EFI file or searching the first disk for an ESP and booting the default bootx64.efi. You may need to create a new boot entry on the new motherboard using the EFI shell, or a graphical interface if the UEFI BIOS has one.
Either way, write down the output of `efibootmgr -v`, which shows the current UEFI entries recorded in the BIOS.
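
To keep a record before the hardware swap, the two checks above could be scripted roughly like this. It is only a sketch in Python: `run_text` and `boot_snapshot` are made-up helper names, and it assumes `sysctl` and `efibootmgr` are in the PATH on a FreeBSD host.

```python
import subprocess


def run_text(cmd, runner=subprocess.run):
    """Run cmd and return its stripped stdout, or None if it is missing or fails."""
    try:
        done = runner(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return done.stdout.strip()


def boot_snapshot(runner=subprocess.run):
    """Collect the boot method and, on UEFI systems only, the EFI boot entries."""
    method = run_text(["sysctl", "-n", "machdep.bootmethod"], runner)
    snapshot = {"bootmethod": method, "efi_entries": None}
    if method == "UEFI":
        # Only worth asking on UEFI-booted systems; skipped for BIOS boots.
        snapshot["efi_entries"] = run_text(["efibootmgr", "-v"], runner)
    return snapshot
```

Calling `boot_snapshot()` on the old server and saving the result somewhere off the machine gives something to compare against once the drives are in the new chassis. The `runner` parameter exists only so the function can be exercised without the real commands.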

Edit:
Have you had other MCA errors since that one, or was it a single event?
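
One way to answer that from a saved log is to count the MCA records per bank and address. The following is a minimal Python sketch, assuming the line format shown in the first post; `parse_mca` and `summarize` are illustrative names, and the sample text is copied from that post.

```python
import re
from collections import Counter

BANK_RE = re.compile(r"^MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")
ADDR_RE = re.compile(r"^MCA: Address (0x[0-9a-fA-F]+)")

SAMPLE = """\
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
"""


def parse_mca(text):
    """Return one dict per MCA record; a record starts at its 'Bank' line."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        match = BANK_RE.match(line)
        if match:
            records.append({
                "bank": int(match.group(1)),
                "status": int(match.group(2), 16),
                "address": None,  # stays None if no Address line follows
            })
            continue
        match = ADDR_RE.match(line)
        if match and records:
            records[-1]["address"] = int(match.group(1), 16)
    return records


def summarize(records):
    """Count how many records were logged for each (bank, address) pair."""
    return Counter((r["bank"], r["address"]) for r in records)


if __name__ == "__main__":
    for (bank, addr), count in summarize(parse_mca(SAMPLE)).items():
        shown = hex(addr) if addr is not None else "n/a"
        print(f"bank {bank} address {shown}: {count} event(s)")
```

Feeding it the full text of `dmesg -a` instead of `SAMPLE` shows at a glance whether the same bank and address keep recurring.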


----------



## Andriy (Feb 7, 2022)

kalleboy, install sysutils/mcelog; it can decode the message into a (slightly) more readable form.


----------



## Andriy (Feb 7, 2022)

> GCACHE LG RD error


That can be either a CPU problem somewhere in the L3 cache or an ECC memory error.
Depending on how the CPU is configured (there are too many options) DRAM ECC errors may not get detected until corrupt data is accessed in the L3 cache.
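
For reference, the "COR ... GCACHE LG RD" text can be reproduced from the raw Status value. The sketch below is in Python and assumes the commonly documented x86 MCA status layout (flag bits 57 to 63, and a cache-hierarchy error code of the form `0000 0001 RRRR TTLL` in the low 16 bits). Treat the bit positions as an assumption to check against AMD's documentation for this CPU family; `decode_status` is just an illustrative name.

```python
# Assumed flag bit positions in the 64-bit MCA status register.
FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60, "MISCV": 59, "ADDRV": 58, "PCC": 57}

# Assumed encodings for the cache-hierarchy error code fields.
TT = ["I", "D", "G", "?"]        # transaction type: instruction, data, generic
LL = ["L0", "L1", "L2", "LG"]    # cache level, LG = generic
RRRR = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]


def decode_status(status):
    """Split a status value into its flags and a cache error description, if any."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    code = status & 0xFFFF
    error = None
    if (code & 0xFF00) == 0x0100:  # cache hierarchy error pattern
        request = (code >> 4) & 0xF
        req_name = RRRR[request] if request < len(RRRR) else "?"
        error = f"{TT[(code >> 2) & 0x3]}CACHE {LL[code & 0x3]} {req_name}"
    return {"flags": flags, "code": code, "error": error}


if __name__ == "__main__":
    for value in (0x9C2040000000011B, 0xDC2040000000011B):
        decoded = decode_status(value)
        kind = "UNCOR" if decoded["flags"]["UC"] else "COR"
        over = " OVER" if decoded["flags"]["OVER"] else ""
        print(f"{value:#018x}: {kind}{over} {decoded['error']}")
```

Under those assumptions, both status values seen in this thread decode to a corrected generic-cache read error; they differ only in the OVER (overflow) flag, which indicates that a further error arrived before the previous one was cleared.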


----------



## kalleboy (Feb 8, 2022)

VladiBG, the boot method was BIOS, according to the output of `sysctl machdep.bootmethod`. I decided to set up a new server with new disks anyway (I already had my backups).
And yes, the "dmesg -a" output started to fill with errors like the ones below:


```
MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01a0ffc01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01a0ffe01000000
```

Andriy, good point as well.

Thank you, guys.


----------

