Bo Marchais wrote:
There was once a stupid trick using local lookup tables for direct conversion, via xlat, or modern equivalent. You can build a 256 entry table mapping 8 bits to # of bits, and then do an xlat. It's only useful in stupid cases...and nothing else.
I was interested in this stupid trick!
Code:
Procedure.i GetBitCount(byte.a)
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
! mov rax, [p.v_byte]
! mov rdx, rbx
! mov rbx, BitCounts
! xlatb
! mov rbx, rdx
CompilerElseIf #PB_Compiler_Processor = #PB_Processor_x86
! mov eax, [p.v_byte]
! mov edx, ebx
! mov ebx, BitCounts
! xlatb
! mov ebx, edx
CompilerEndIf
ProcedureReturn
DataSection
!BitCounts:
Data.a 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
Data.a 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
Data.a 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
Data.a 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8
EndDataSection
EndProcedure
It performs very well, i used John's test from the first page of this thread, but doubled the amount of tests to make the results a bit clearer as they're all so fast:
Code:
t0 = 36 ms (10%) <-- popcnt instruction
t1 = 350 ms (100%)
t2 = 420 ms (120%)
t3 = 1032 ms (294%)
t4 = 1906 ms (544%)
t5 = 1837 ms (524%)
t6 = 1242 ms (354%)
t7 = 57 ms (16%)
t8 = 44 ms (12%) <-- this demo using xlatb