# Amd64 faster than i386?



## nikitastepanov (Jan 24, 2020)

Amd64 faster than i386?


----------



## Crivens (Jan 24, 2020)

Yes. Next!


----------



## kpedersen (Jan 24, 2020)

To be fair, Crivens said it pretty succinctly. 

My understanding is that amd64 is, in theory, slower because the address space is larger: pointers have to be 8 bytes instead of 4 so they can hold any address in that larger space, which means more data to pass around.
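To put a rough number on that pointer cost, here is a minimal sketch. It assumes a hypothetical doubly linked list node (`int value` plus two pointers) and natural alignment (each field aligned to its own size), which holds for these types on both i386 and amd64. The helper names are made up for illustration.

```python
import ctypes


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def struct_size(field_sizes):
    """Size of a C struct whose fields are naturally aligned (alignment == size)."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        offset = align_up(offset, size)  # padding before the field, if any
        offset += size
        max_align = max(max_align, size)
    return align_up(offset, max_align)   # tail padding


def list_node_size(pointer_bytes):
    # Hypothetical node: struct node { int value; struct node *prev, *next; };
    return struct_size([4, pointer_bytes, pointer_bytes])


# Pointer width of the Python interpreter running this script.
HOST_POINTER_BYTES = ctypes.sizeof(ctypes.c_void_p)

if __name__ == "__main__":
    print("host pointer size:", HOST_POINTER_BYTES)
    print("node with 4-byte pointers:", list_node_size(4))
    print("node with 8-byte pointers:", list_node_size(8))
```

Under those assumptions the node grows from 12 to 24 bytes, so half as many nodes fit in a cache line. That is only the theoretical cost and says nothing about measured speed.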

However, i386 represents the lowest common denominator of platforms and doesn't use newer processor features (e.g. i486 or i686 binaries would be faster than i386), so it is less efficient. amd64 sets a higher bar: to be 64-bit at all, the processor must be relatively new.

What you don't want is modern Windows where it is 64-bit but you primarily run 32-bit applications through the WoW64 translation layer. Worst of both worlds (+ a translation layer) basically.


----------



## shkhln (Jan 24, 2020)

kpedersen said:


> What you don't want is modern Windows where it is 64-bit but you primarily run 32-bit applications through the WoW64 translation layer. Worst of both worlds (+ a translation layer) basically.



I believe Windows only switches from 32- to 64-bit code right before invoking syscalls. That should not have _any_ measurable performance impact.


----------



## kpedersen (Jan 24, 2020)

shkhln said:


> I believe Windows only switches from 32- to 64-bit code right before invoking syscalls. That should not have _any_ measurable performance impact.



Yes, the main performance loss is with IA64 or ARM64 because it does actual emulation rather than executing instructions natively but there is still an overhead of around 2% for AMD64 when measured.
https://www.viva64.com/en/t/0056/

I used to have a much better source with some cool measurements of common tools but I can't seem to find it. I'll track it down. Ironically I remember the slight performance hit *because* WoW64 is quite an impressive piece of engineering relating to digital preservation. I believe it originally came from IBM's partnership with Microsoft in OS/2 Warp (or did the 32-bit -> 16-bit ntvdm stuff come from there; honestly can't recall!).


----------



## shkhln (Jan 24, 2020)

kpedersen said:


> Yes, the main performance loss is with IA64 or ARM64 because it does actual emulation rather than executing instructions natively but there is still an overhead of around 2% for AMD64 when measured.
> https://www.viva64.com/en/t/0056/



Hmm… Any idea where they pulled that number?


----------



## ralphbsz (Jan 25, 2020)

Before we go into fascinating but irrelevant speculation, maybe we should ask the OP what they really mean by the question? Because in general it is not answerable.

For example, do they mean: "Is the original i386 that was released in 198X at 25 MHz clock speed faster than an early 2000s Athlon with the amd64 instruction set?" That is a very good question, and the answer is blatantly obvious, as the early Athlons and Opterons used to run somewhere around 1 GHz or so.

Or maybe they mean: "If on the same modern hardware, I install a 32-bit version of the OS, compiler, libraries and apps, and then do the same in 64-bit mode, which one will run faster?" That is actually a fascinating question, and I don't know the answer off-hand; many factors go into it. But I think in nearly all cases, 64-bit mode is vastly faster.

Or maybe they mean "I have an amd64 installation, and for one particular application or program, is it sensible to run the i386 version?" That is also a fascinating question, and I think you get the same answer: 64 bits are faster.

This doesn't even address the question of whether their workload fits into a 32-bit address space.
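As a back-of-the-envelope check on that last point, a sketch with hypothetical helper names. It treats the pointer width as a hard upper bound; in practice a process gets less, since the kernel reserves part of the 32-bit space and current amd64 CPUs implement fewer than 64 virtual address bits.

```python
GIB = 2 ** 30


def address_space_bytes(pointer_bits):
    """Upper bound on addressable bytes for a given pointer width."""
    return 2 ** pointer_bits


def fits(working_set_bytes, pointer_bits):
    """True if the working set is below the theoretical address-space limit."""
    return working_set_bytes < address_space_bytes(pointer_bits)


if __name__ == "__main__":
    working_set = 6 * GIB  # hypothetical workload size
    for bits in (32, 64):
        print(bits, "bit:", fits(working_set, bits))
```

A 32-bit pointer tops out at 4 GiB, so the hypothetical 6 GiB working set cannot be mapped all at once on i386, whatever the relative speed of the two builds.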

I think we should wait until they explain their question. (By the way, I'm using the gender-neutral pronoun, because I don't know whether the OP is a boy or a girl.)


----------



## shkhln (Jan 25, 2020)

ralphbsz said:


> Before we go into fascinating but irrelevant speculation, maybe we should ask the OP what they really mean by the question?



Why bother? We like tangents.



ralphbsz said:


> (By the way, I'm using the gender-neutral pronoun, because I don't know whether the OP is a boy or a girl.)



Here is my personal pet peeve: it's a male name. _Strictly_ male name.


----------



## ralphbsz (Jan 25, 2020)

Ever seen the French movie "La femme Nikita"?


----------



## Crivens (Jan 25, 2020)

Since we are on tangents already, ralphbsz may need to google "waxing ladyballs". NSFW, and that rabbit hole goes off in such a tangent that you cross into some perpendicular universe (not parallel at all).

But on topic - it depends. Pointers are 8 bytes then, but you have a lot more registers. i386 has how many free? Three?


----------



## shkhln (Jan 25, 2020)

ralphbsz said:


> Ever seen the French movie "La femme Nikita"?



No. I don't have a particularly high opinion of Luc Besson, so I never felt the urge to.


----------



## kpedersen (Jan 25, 2020)

Crivens said:


> But on topic - it depends. Pointers are 8 bytes then, but you have a lot more registers. i386 has how many free? Three?



Ah, I didn't think of the extra registers!

The article below (and its comments) is quite interesting. It dates from the Solaris 9 / sparc64 era and compares 32-bit and 64-bit.

https://www.osnews.com/story/5768/

It probably in no way represents the current situation (compiler optimisations make much better use of the 64-bit hardware now). They did arrive at the conclusion that 32-bit was "faster" but not by much. So today I am guessing amd64 is going to be the better option.


----------



## unitrunker (Jan 25, 2020)

Crivens said:


> i386 has how many free? Three?


Four general purpose plus two index registers plus EBP, not counting the instruction pointer, stack pointer or flags registers.

Two factors that benefit i386:

1. Lots of registers is nice but it also makes context switching more expensive.

2. i386 instructions are on average smaller so the execution pipeline can churn through them faster.
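To put rough numbers on the register count and on point 1, a small sketch. It only looks at the integer registers; a real context switch also saves FPU/SSE state and more, so treat the byte counts as a lower bound. The helper names are hypothetical.

```python
# General-purpose integer registers in each mode.
I386_GPRS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
AMD64_GPRS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]


def usable(registers, stack_pointer):
    """Registers left for the compiler once the stack pointer is set aside."""
    return [r for r in registers if r != stack_pointer]


def gpr_save_bytes(registers, width_bytes):
    """Bytes needed to save every listed register once."""
    return len(registers) * width_bytes


if __name__ == "__main__":
    print("i386 usable:", len(usable(I386_GPRS, "esp")))
    print("amd64 usable:", len(usable(AMD64_GPRS, "rsp")))
    print("i386 save area:", gpr_save_bytes(I386_GPRS, 4), "bytes")
    print("amd64 save area:", gpr_save_bytes(AMD64_GPRS, 8), "bytes")
```

That gives 7 usable registers against 15, and 32 bytes of integer state to save against 128. The save area is four times larger, but it is still small next to a typical cache.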


----------



## Crivens (Jan 25, 2020)

The question is how high the IPS is compared to memory speed. Shorter instructions help nothing when each one needs additional memory cycles to go to the stack.


----------



## Eric A. Borisch (Jan 25, 2020)

Clearly amd64 is faster: they need to have lots of extra cycles to spare to handle the microcode updates to fix side-channel attacks.

/me winks


----------

