
Re: [split] Compiler off-loading discussion

Posted: Tue Nov 29, 2016 9:42
by Graf Zahl
Why are they weird? I get the same phenomenon with SSE2 vs. x87 on my system. Every test I made came back with x87 marginally faster than SSE2 when doing 32-bit floating point math, despite common wisdom saying the opposite and many people claiming I must be doing something wrong when x87 comes out faster. I believe a lot of 'information' in this field is based on hearsay and unverified assumptions. AVX2 still uses the same underlying hardware, so unless its instruction set is significantly better for the job at hand I'd consider it mostly window dressing, not something that truly yields better results.
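For anyone who wants to repeat this kind of x87 vs. SSE2 comparison, here is a minimal sketch of a 32-bit float kernel with a timer around it. The names `mul_add` and `time_kernel` and the choice of kernel are made up for illustration; this is not the test that was actually run. With GCC targeting 32-bit x86, the same source can be built once with `-mfpmath=387` and once with `-msse2 -mfpmath=sse`. Which build wins depends on the CPU and the compiler, so no timing is claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel: y[i] = a * x[i] + y[i], all in 32-bit float.
void mul_add(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Runs the kernel `reps` times over n elements and returns elapsed
// milliseconds. The first output element is written to *checksum so the
// result is used and can be compared between builds.
double time_kernel(std::size_t n, int reps, float* checksum) {
    assert(n > 0 && reps > 0 && checksum != nullptr);
    std::vector<float> x(n, 1.0f), y(n, 0.0f);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        mul_add(0.5f, x.data(), y.data(), n);
    auto t1 = std::chrono::steady_clock::now();
    *checksum = y[0];
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling something like `time_kernel(1 << 16, 1000, &checksum)` from each build and comparing the returned milliseconds gives a rough number. Checking that both builds report the same checksum confirms they did the same work.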

Re: [split] Compiler off-loading discussion

Posted: Tue Nov 29, 2016 10:07
by dpJudas
Yes, it seems you're right about that. Still, I would have thought that there would be some efficiency to be gained by using a vectorized opcode targeting, say, 8xint32 rather than the 4xint32 present in SSE2. Even if it had just been something small like 10%. Maybe it already managed to keep all the ALUs busy with the SSE2 instruction set, or the bottleneck is the loads and stores.
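To make the load/store point concrete, here is a rough sketch of a 4xint32 add loop using SSE2 intrinsics. The function `add_i32` is a hypothetical helper written for this illustration, not code from the project under discussion, and it falls back to a plain scalar loop when SSE2 is not available. Each step does two loads and one store for a single add. An AVX2 version (`_mm256_add_epi32` on `__m256i` from `<immintrin.h>`) would halve the number of add instructions but still move exactly the same number of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: dst[i] = a[i] + b[i], four int32 lanes per step
// when SSE2 is available, one element per step otherwise.
void add_i32(const std::int32_t* a, const std::int32_t* b,
             std::int32_t* dst, std::size_t n) {
    assert(a != nullptr && b != nullptr && dst != nullptr);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 4 <= n; i += 4) {
        // Two unaligned 128-bit loads ...
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // ... one ALU instruction adding four int32 lanes ...
        __m128i vs = _mm_add_epi32(va, vb);
        // ... and one unaligned 128-bit store.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), vs);
    }
#endif
    // Scalar tail for leftover elements (or the whole array without SSE2).
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

With a ratio of three memory operations per arithmetic instruction, a loop like this would plausibly be limited by memory traffic rather than ALU throughput, in which case wider registers alone would not be expected to help much.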