# Memory failure?



## IPTRACE (Jan 5, 2017)

Hello!

What do the following errors mean?


```
Jan  3 19:13:15 hpv kernel: MCA: Bank 13, Status 0x8c000051000800c0
Jan  3 19:13:15 hpv kernel: MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
Jan  3 19:13:15 hpv kernel: MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 48
Jan  3 19:13:15 hpv kernel: MCA: CPU 30 COR (1) MS channel 0 memory error
Jan  3 19:13:15 hpv kernel: MCA: Address 0x35c9cbf780
Jan  3 19:13:15 hpv kernel: MCA: Misc 0x918c2000200228c
.......
Jan  5 16:06:59 hpv kernel: MCA: Bank 8, Status 0x8c00004000010090
Jan  5 16:06:59 hpv kernel: MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
Jan  5 16:06:59 hpv kernel: MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 48
Jan  5 16:06:59 hpv kernel: MCA: CPU 30 COR (1) RD channel 0 memory error
Jan  5 16:06:59 hpv kernel: MCA: Address 0x35c9cbf740
Jan  5 16:06:59 hpv kernel: MCA: Misc 0x152606086
Jan  5 16:06:59 hpv kernel: MCA: Bank 8, Status 0x8c00004000010090
Jan  5 16:06:59 hpv kernel: MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
Jan  5 16:06:59 hpv kernel: MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 49
Jan  5 16:06:59 hpv kernel: MCA: CPU 31 COR (1) RD channel 0 memory error
Jan  5 16:06:59 hpv kernel: MCA: Address 0x35c9cbf740
Jan  5 16:06:59 hpv kernel: MCA: Misc 0x152606086
```


----------



## IPTRACE (Jan 5, 2017)

I found the dmidecode tool and matched the MCA addresses to the corresponding memory address range.
Is this correct?


```
Jan  5 16:06:59 hpv kernel: MCA: Bank 8, Status 0x8c00004000010090
Jan  5 16:06:59 hpv kernel: MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
Jan  5 16:06:59 hpv kernel: MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 48
Jan  5 16:06:59 hpv kernel: MCA: CPU 30 COR (1) RD channel 0 memory error
Jan  5 16:06:59 hpv kernel: MCA: Address 0x35c9cbf740
```


```
Handle 0x004B, DMI type 19, 31 bytes
Memory Array Mapped Address
        Starting Address: 0x02FFFA00000
        Ending Address: 0x03FFFFFFFFF
        Range Size: 65542 MB
        Physical Array Handle: 0x004A
        Partition Width: 2

Handle 0x004C, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x004A
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMM_P1_G0
        Bank Locator: P1_Node1_Channel2_Dimm0
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MHz
        Manufacturer: SK Hynix
        Serial Number: 80F1D447
        Asset Tag: DIMM_P1_G0_AssetTag
        Part Number: HMA84GL7MMR4N-TF
        Rank: 4
        Configured Clock Speed: 2133 MHz
```

Is this the same memory bank as above?

```
Jan  3 19:13:15 hpv kernel: MCA: Bank 13, Status 0x8c000051000800c0
Jan  3 19:13:15 hpv kernel: MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
Jan  3 19:13:15 hpv kernel: MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 48
Jan  3 19:13:15 hpv kernel: MCA: CPU 30 COR (1) MS channel 0 memory error
Jan  3 19:13:15 hpv kernel: MCA: Address 0x35c9cbf780
```
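The comparison above can be checked mechanically. This is only a minimal sketch in Python, with the values copied by hand from the dmidecode and kernel output in this post; the function names are made up for illustration. It confirms that both logged addresses fall inside the mapped range of the DMI type 19 entry, and that the range matches the reported "Range Size". It says nothing more about which device behind that range is affected.

```python
# Values copied from the dmidecode and kernel log output above.
RANGE_START = 0x02FFFA00000
RANGE_END = 0x03FFFFFFFFF  # inclusive, as printed by dmidecode
MCA_ADDRESSES = [0x35C9CBF740, 0x35C9CBF780]


def in_range(addr, start, end):
    """Return True if addr lies in the inclusive range [start, end]."""
    return start <= addr <= end


def range_size_mb(start, end):
    """Size of the inclusive range in MB (2**20 bytes), to compare with 'Range Size'."""
    return (end - start + 1) // (1024 * 1024)


if __name__ == "__main__":
    print("range size:", range_size_mb(RANGE_START, RANGE_END), "MB")
    for addr in MCA_ADDRESSES:
        print(hex(addr), "in range:", in_range(addr, RANGE_START, RANGE_END))
```

The two addresses differ by 0x40 (64 bytes), so they are neighbouring locations in the same small region.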


----------



## ASX (Jan 5, 2017)

IPTRACE said:


> What do the following errors mean?



There is a utility to decode those messages: sysutils/mcelog

`mcelog --ascii [ paste your log to STDIN ]`

I ran it on your log and got:

```
CPU 30 BANK 13
MISC 918c2000200228c ADDR 35c9cbf780
MCG status:
MemCtrl: Corrected patrol scrub error
STATUS 8c000051000800c0 MCGSTATUS 0
MCGCAP 7000c16 APICID 30 SOCKETID 0
CPUID Vendor Intel Family 6 Model 63
Hardware event. This is not a software error.
CPU 30 BANK 8
MISC 152606086 ADDR 35c9cbf740
MCG status:
STATUS 8c00004000010090 MCGSTATUS 0
MCGCAP 7000c16 APICID 30 SOCKETID 0
CPUID Vendor Intel Family 6 Model 63
```

--> MemCtrl: Corrected patrol scrub error

As I understand it, a memory error was detected and corrected.

mcelog should also be able to report the location of the RAM bank (see the --dmi option), but obviously I can't run that from my machine.
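For anyone curious where "corrected", "MS" and "RD" come from, the Status and ID values can also be decoded by hand. The following is a rough Python sketch, not what mcelog does internally: it assumes the IA32_MCi_STATUS bit layout and the memory controller error code format (`000F 0000 1MMM CCCC`) described in the Intel SDM, and the function names are made up for illustration.

```python
# MMM field of a memory controller error code, per the Intel SDM.
MMM_NAMES = {
    0b000: "GEN (generic undefined request)",
    0b001: "RD (memory read error)",
    0b010: "WR (memory write error)",
    0b011: "AC (address/command error)",
    0b100: "MS (memory scrubbing error)",
}


def decode_status(status):
    """Decode a few architectural fields of an IA32_MCi_STATUS value."""
    code = status & 0xFFFF
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),  # 0 means the error was corrected
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "request": None,
        "channel": None,
    }
    # Memory controller errors: bits 15:13 and 11:8 clear, bit 7 set;
    # bit 12 is ignored here.
    if code & 0xEF80 == 0x0080:
        info["request"] = MMM_NAMES.get((code >> 4) & 0x7, "unknown")
        channel = code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    return info


def decode_cpuid(signature):
    """Split a CPUID signature into (family, model, stepping)."""
    stepping = signature & 0xF
    model = (signature >> 4) & 0xF
    family = (signature >> 8) & 0xF
    if family in (6, 15):
        model |= ((signature >> 16) & 0xF) << 4  # extended model
    if family == 15:
        family += (signature >> 20) & 0xFF  # extended family
    return family, model, stepping


if __name__ == "__main__":
    for status in (0x8C000051000800C0, 0x8C00004000010090):
        print(hex(status), decode_status(status))
    print(decode_cpuid(0x306F2))
```

With the values from the log, both statuses have the "uncorrected" bit clear and a corrected count of 1, which lines up with the `COR (1)` in the kernel messages, and ID 0x306f2 decodes to family 6, model 63 as in the mcelog output.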


----------



## SirDice (Jan 6, 2017)

IPTRACE said:


> I found the dmidecode tool and matched the MCA addresses to the corresponding memory address range.
> Is this correct?


Yep. Those are memory errors. ECC corrected them, so there's no immediate problem, but the module does need to be replaced. And it looks like you have found the right module.
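One way to keep an eye on this until the module is swapped is to tally the logged addresses by region. This is a minimal Python sketch, assuming the kernel messages keep the `MCA: Address 0x...` format shown in this thread; the 4 KiB granularity and the `tally_pages` name are arbitrary choices for illustration, not something the kernel reports.

```python
import re
from collections import Counter

ADDRESS_RE = re.compile(r"MCA: Address (0x[0-9a-fA-F]+)")
PAGE_SIZE = 4096  # illustrative granularity only


def tally_pages(log_text):
    """Count logged MCA addresses per PAGE_SIZE-aligned region."""
    counts = Counter()
    for match in ADDRESS_RE.finditer(log_text):
        addr = int(match.group(1), 16)
        counts[addr - addr % PAGE_SIZE] += 1
    return counts


# Address lines copied from the log earlier in this thread.
SAMPLE = """\
Jan  3 19:13:15 hpv kernel: MCA: Address 0x35c9cbf780
Jan  5 16:06:59 hpv kernel: MCA: Address 0x35c9cbf740
Jan  5 16:06:59 hpv kernel: MCA: Address 0x35c9cbf740
"""

if __name__ == "__main__":
    for page, count in tally_pages(SAMPLE).most_common():
        print(hex(page), count)
```

All three logged addresses land in the same 4 KiB region. Note that the two Jan 5 records share a timestamp and address, so they may describe a single event reported by two logical CPUs.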


----------



## PacketMan (Jan 7, 2017)

SirDice said:


> ...the module does need to be replaced.



For my own knowledge: is there any chance that reseating the memory stick would clear this?


----------



## SirDice (Jan 9, 2017)

PacketMan said:


> For my own knowledge: is there any chance that reseating the memory stick would clear this?


Probably not.


----------

