Optimizing rotations for quad

Bare metal programming in PureBasic, for experienced users
netmaestro
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: Optimizing rotations for quad

Post by netmaestro »

@wilbert: After quite a few tests, my results show that, over the full range of rotations 0-63, your latest beats everything so far by at least 8%. That is a significant improvement 8)
BERESHEIT
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Optimizing rotations for quad

Post by wilbert »

Thanks for letting me know, Netmaestro :)
Probably the biggest difference is the swap of eax and edx: you did it with three instructions, while I simply switched the places eax and edx were loaded from.
What surprised me with my code that uses push / pop ebx is the impact of where they are placed. I don't know much about optimizing yet, but having the push and pop close together, with no memory-accessing instruction in between, seemed faster than placing the push at the beginning and the pop at the end of the function.
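
To illustrate the "swap at load" idea in portable form, here is a minimal C sketch. It is not the actual assembly from this thread; the function names and the two-element `half` array (low dword first) are assumptions made for the example. Both variants give the same result, but the second never needs a separate swap step because it picks the source of each half up front.

```c
#include <stdint.h>

/* Reference: rotate a 64-bit value right by n (0..63). */
static uint64_t rotr64_ref(uint64_t v, unsigned n) {
    n &= 63;
    return n ? (v >> n) | (v << (64 - n)) : v;
}

/* Variant 1: load the halves in natural order, then swap them
   afterwards when n >= 32 (the extra register shuffling). */
static uint64_t rotr64_swap_after(const uint32_t half[2], unsigned n) {
    uint32_t lo = half[0], hi = half[1];
    if (n & 32) { uint32_t t = lo; lo = hi; hi = t; }
    return rotr64_ref(((uint64_t)hi << 32) | lo, n & 31);
}

/* Variant 2: choose where each half is loaded from, so no swap is needed. */
static uint64_t rotr64_swap_at_load(const uint32_t half[2], unsigned n) {
    unsigned i = (n >> 5) & 1;      /* 0 for n < 32, 1 for n >= 32 */
    uint32_t lo = half[i];
    uint32_t hi = half[i ^ 1];
    return rotr64_ref(((uint64_t)hi << 32) | lo, n & 31);
}
```

This works because rotating by 32 is exactly a swap of the two dwords, so only the remaining n mod 32 bits have to be shifted.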

A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: Optimizing rotations for quad

Post by Thorium »

wilbert wrote: A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.

Code:

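!push ecx        ; save the full byte count
!shr ecx,2       ; number of whole dwords
!rep movsd       ; copy ecx dwords from [esi] to [edi]
!pop ecx         ; restore the byte count
!and ecx,3       ; 0-3 leftover bytes
!rep movsb       ; copy the remaining bytes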
The size of the memory block to copy goes into ecx, the source address into esi and the destination address into edi.
Pretty basic code, but on a Core i7 it's the fastest. I guess the CPU recognizes the pattern and switches to a built-in fast memory copy routine. On older CPUs, using the SSE registers with prefetching is much faster; on a Core i7 this simple code beats SSE.
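
For readers who want to see the dword/byte split outside of assembly, here is a rough C model of the same steps. The function name is made up for this example, and the sketch only mirrors the logic: it says nothing about speed, which depends on the CPU's handling of rep movs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `size` bytes from src to dst: whole 32-bit words first, then the
   0..3 leftover bytes. Mirrors the shr ecx,2 / and ecx,3 split. */
static void copy_dwords_then_bytes(void *dst, const void *src, size_t size) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = size >> 2;   /* like: shr ecx,2 */
    size_t tail   = size & 3;    /* like: and ecx,3 */

    for (size_t i = 0; i < dwords; i++) {   /* like: rep movsd */
        uint32_t w;
        memcpy(&w, s, 4);   /* alignment-safe 4-byte load */
        memcpy(d, &w, 4);   /* alignment-safe 4-byte store */
        s += 4;
        d += 4;
    }
    for (size_t i = 0; i < tail; i++) {     /* like: rep movsb */
        *d++ = *s++;
    }
}
```

Like the assembly version, this copies forward and assumes the two blocks do not overlap in a way that would make a forward copy unsafe.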
Helle
Enthusiast
Posts: 178
Joined: Wed Apr 12, 2006 7:59 pm
Location: Germany

Re: Optimizing rotations for quad

Post by Helle »

I ran some tests and this was the fastest (no jumps!):

Code:

Procedure.q Rotr64_(val.q, n)
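  !mov eax,[esp + 4]    ;low dword of val
  !mov edx,[esp + 8]    ;high dword of val
  !mov ecx,[esp + 12]   ;n
  !test ecx,100000b     ;test is my favorite ;-) 
  !cmovnz eax,edx       ;n >= 32: swap the halves
  !cmovnz edx,[esp + 4]
  !push ebx
  !mov ebx, eax         ;keep the old low half
  !shrd eax, edx, cl    ;the CPU uses cl mod 32
  !shrd edx, ebx, cl
  !pop ebx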
  ProcedureReturn
EndProcedure
A test with "xchg eax,edx" was not faster. I use an Intel i7-2600; this code may not be faster on an older CPU. You can test it :) !
Helle
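
For anyone trying to follow how the routine above works, here is a step-by-step C model of it. The helper names `shrd32` and `rotr64_model` are invented for this sketch, and the C `if` stands in for the test / cmovnz pair, so unlike the assembly it is not guaranteed to compile to branch-free code.

```c
#include <stdint.h>

/* Model of x86 SHRD dst, src, cl: shift dst right and fill the vacated
   high bits from the low bits of src. The count is taken modulo 32. */
static uint32_t shrd32(uint32_t dst, uint32_t src, unsigned cl) {
    cl &= 31;
    if (cl == 0) return dst;             /* avoids an undefined shift by 32 */
    return (dst >> cl) | (src << (32 - cl));
}

static uint64_t rotr64_model(uint64_t val, unsigned n) {
    uint32_t eax = (uint32_t)val;          /* low dword  */
    uint32_t edx = (uint32_t)(val >> 32);  /* high dword */
    if (n & 32) {                          /* test ecx,100000b + two cmovnz */
        eax = edx;                         /* a rotation by 32 is a swap */
        edx = (uint32_t)val;
    }
    uint32_t ebx = eax;                    /* keep the old low half */
    eax = shrd32(eax, edx, n);             /* low half takes bits from high */
    edx = shrd32(edx, ebx, n);             /* high half takes bits from old low */
    return ((uint64_t)edx << 32) | eax;
}
```

The key points are that bit 5 of n decides whether the halves are exchanged, and that shrd ignores that bit anyway, so the same cl can be used for both shifts without masking.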
netmaestro
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: Optimizing rotations for quad

Post by netmaestro »

It's very fast, but on my rather weak machine (1.8 GHz Intel E2160) it's losing to wilbert's latest by 5%. It's cool code; I'm still trying to figure out how it works.
BERESHEIT