# Memory optimization for APU board



## gregober (Nov 8, 2022)

I have a fairly large set of data (about 5 million entries) that is loaded into Unbound for RPZ filtering.

On an Intel(R) Atom(TM) CPU C3558 @ 2.20GHz (4 cores) with 8GB of RAM, loading the list takes about 1GB of RAM.

On an APU board with a 4-core AMD GX-412TC SoC and 4GB of RAM, loading the same lists takes about 3.5GB of RAM.

So I am wondering why there is such a difference between the two.
And are there parameters I could tune to get more efficient memory handling on the APU boards?

Thanks!


----------



## gregober (Nov 8, 2022)

The APU has 4GB of DDR3 ECC @ 1333MHz.
The Atom board has 8GB of DDR4 ECC @ 2400MHz.

I guess the design and efficiency, plus the difference between DDR3 and DDR4, account for the ∆.


----------



## zirias@ (Nov 8, 2022)

Is the Atom a 32-bit model, or does it run an i386 kernel?


----------



## covacat (Nov 8, 2022)

Unbound for DOS runs in 640K.


----------



## ralphbsz (Nov 8, 2022)

gregober said:


> I have a fairly large set of data (about 5 million entries) ...


What does "entry" mean? What does the data entail? I assume 5 million is the number of rows, but how many columns?



> On an Intel(R) Atom(TM) CPU C3558 @ 2.20GHz (4 cores) with 8GB of RAM, loading the list takes about 1GB of RAM


First, I don't believe that measurement, and I'll explain why below. But if we assume that it is accurate, that would be about 200 bytes per entry. Is that compatible with the data volume you'd expect?
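A quick sanity check of that per-entry estimate, as a minimal Python sketch. It assumes exactly 5 million entries and decimal gigabytes (10^9 bytes); both inputs are only as precise as the figures reported in the first post.

```python
# Back-of-the-envelope check of the memory figures reported in this thread.
# Assumption: "GB" means 10**9 bytes. With 2**30 bytes the per-entry values
# are about 7% higher, but the ratio between the two boards is unchanged.
ENTRIES = 5_000_000


def bytes_per_entry(total_bytes: float, entries: int = ENTRIES) -> float:
    """Average memory footprint per loaded entry."""
    return total_bytes / entries


atom = bytes_per_entry(1.0e9)   # Atom C3558 board: ~1 GB reported
apu = bytes_per_entry(3.5e9)    # APU GX-412TC board: ~3.5 GB reported

print(f"Atom: {atom:.0f} bytes/entry")   # Atom: 200 bytes/entry
print(f"APU:  {apu:.0f} bytes/entry")    # APU:  700 bytes/entry
print(f"ratio: {apu / atom:.1f}")        # ratio: 3.5
```

The same 3.5x ratio falls out regardless of which gigabyte definition is used, which is why the ratio, not the absolute numbers, is the interesting part.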



> On an APU board with 4 cores AMD GX-412TC SOC (4 cores) and 4GB of RAM, loading the same lists takes about 3.5GB of RAM


I would understand a factor of two (see below), but this is a factor of 3.5. This probably means that you are not measuring actual memory usage, but something like resident size.



gregober said:


> I guess the design and efficiency, plus the difference between DDR3 and DDR4, account for the ∆.


No: at the level of CPU architecture and memory model, the only difference between DDR3 and DDR4 is speed. The memory type does not change how many bytes a program needs to store the same data.



zirias@ said:


> Is the Atom a 32-bit model, or does it run an i386 kernel?


The Atom is already a 64-bit CPU. I don't even know whether it can be booted into 32-bit mode; it claims to support only the amd64 instruction set, not the i386 one, but I'm not 100% sure I believe that.

That word size might explain a factor of 2, if one of the CPUs is a 32-bit machine and the other a 64-bit one, and much of the data consists of integers or similar types stored in a single word. But assuming the data really is about 200 bytes per entry, that would be 25 to 50 integers per row (at 8 or 4 bytes each), which seems a bit excessive; I suspect a significant fraction of the data might be strings.
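The 25-to-50 range is simply the ~200-byte estimate divided by an 8-byte or 4-byte word. A minimal Python sketch of that division, assuming the 200-byte figure holds:

```python
ENTRY_BYTES = 200  # per-entry estimate: ~1 GB spread over ~5 million entries


def words_per_entry(entry_bytes: int, word_bytes: int) -> int:
    """Number of whole machine words that fit into one entry."""
    return entry_bytes // word_bytes


for word_bytes, label in ((8, "64-bit"), (4, "32-bit")):
    print(f"{label} words per entry: {words_per_entry(ENTRY_BYTES, word_bytes)}")
# 64-bit words per entry: 25
# 32-bit words per entry: 50
```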

But it cannot explain a factor of 3.5.


----------



## bob2112 (Nov 8, 2022)

ralphbsz said:


> That word size might explain a factor of 2, if one of the CPUs is a 32-bit machine and the other a 64-bit one, and much of the data are integers or similar types that are stored in a single word.


FWIW, the difference in memory usage between 64-bit and 32-bit is down to pointer size.
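To see the relevant type sizes on a given machine, a short Python sketch using the standard `ctypes` module can print the C sizes of `int`, `long` and a pointer. The output depends on the platform the script runs on, so no expected values are shown; on an LP64 system such as FreeBSD/amd64 one would see 4, 8 and 8.

```python
import ctypes

# Print the C sizes of a few basic types on the machine running this script.
# LP64 platforms (e.g. FreeBSD/amd64): int = 4, long = 8, void * = 8.
# ILP32 platforms (e.g. FreeBSD/i386): all three are 4.
for name, ctype in (("int", ctypes.c_int),
                    ("long", ctypes.c_long),
                    ("void *", ctypes.c_void_p)):
    print(f"sizeof({name}) = {ctypes.sizeof(ctype)}")
```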


----------



## ralphbsz (Nov 8, 2022)

Absolutely: pointers and `long` integers go from 32 to 64 bits depending on word size (on LP64 systems such as FreeBSD/amd64, a plain `int` stays 32 bits). But for big arrays of data, pointers should be a small overhead. Floating-point numbers (in modern code) are nearly always 64 bits anyway (single-precision float is rare these days). Strings don't depend on word size. So at best, the change from 32 to 64 bits could explain a factor of 2, probably less. No word-size change can produce a factor of 3.5, so something here makes no sense.
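The "factor of 2, probably less" argument can be modelled with a hypothetical entry made of a fixed-size string plus some pointer-sized fields. The layout below (a 60-byte name plus 5 pointers, padding ignored) is an illustrative assumption, not Unbound's actual data structure; it only shows that bytes which don't depend on word size pull the ratio below 2.

```python
def entry_size(string_bytes: int, pointers: int, word_bytes: int) -> int:
    """Size of a hypothetical entry: string bytes plus pointer-sized fields.

    String bytes do not depend on word size; only the pointer-sized fields
    scale with it. Alignment padding is deliberately ignored.
    """
    return string_bytes + pointers * word_bytes


def growth_32_to_64(string_bytes: int, pointers: int) -> float:
    """Ratio of the 64-bit size to the 32-bit size of the same entry."""
    return entry_size(string_bytes, pointers, 8) / entry_size(string_bytes, pointers, 4)


# Hypothetical example: a 60-byte name plus 5 pointers.
print(entry_size(60, 5, 4))      # 80
print(entry_size(60, 5, 8))      # 100
print(growth_32_to_64(60, 5))    # 1.25

# Even with no string data at all, the ratio tops out at exactly 2.
print(growth_32_to_64(0, 5))     # 2.0
```

Any string content makes the ratio strictly smaller than 2, so under this model a 3.5x jump cannot come from word size.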


----------



## zirias@ (Nov 9, 2022)

Nope, not only pointers; alignment plays a role as well. But indeed, as long as the software doesn't make some weird alignment decisions, a factor of 2 or more isn't possible.
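Alignment padding can be illustrated with `ctypes.Structure`, which lays fields out using the platform's native C rules. The record below (a 1-byte flag followed by a pointer) is a hypothetical example: the flag needs only one byte, but on common x86 ABIs the pointer must start at an offset aligned to its own size, so padding is inserted between the two fields.

```python
import ctypes


class Record(ctypes.Structure):
    """Hypothetical record: a 1-byte flag followed by a pointer."""
    _fields_ = [("flag", ctypes.c_char), ("next", ctypes.c_void_p)]


ptr_size = ctypes.sizeof(ctypes.c_void_p)
useful = ctypes.sizeof(ctypes.c_char) + ptr_size   # bytes actually holding data
total = ctypes.sizeof(Record)                      # bytes including padding

print(f"pointer size:     {ptr_size}")
print(f"offset of 'next': {Record.next.offset}")
print(f"useful bytes:     {useful}")
print(f"sizeof(Record):   {total} ({total - useful} bytes of padding)")
```

On an x86-64 build this reports a 16-byte record holding 9 bytes of data; on a 32-bit x86 build, an 8-byte record holding 5. Padding adds overhead in both cases, but it is bounded by the alignment of the largest field.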


----------

