PureBasic Forum
http://forums.purebasic.com/english/

Population count
http://forums.purebasic.com/english/viewtopic.php?f=35&t=62235
Page 2 of 3

Author:  wilbert [ Fri May 15, 2015 10:34 am ]
Post subject:  Re: Population count

davido wrote:
I've chosen the PB version of the method described by idle; it runs at 210%.
I present it, just in case it can be improved upon

Code:
Procedure popcount9(i.l)
  i - (i >> 1) & $55555555
  i = (i & $33333333) + ((i >> 2) & $33333333)
  ProcedureReturn (((i + i >> 4) & $0f0f0f0f) * $01010101) >> 24 & $ff
EndProcedure

Author:  davido [ Fri May 15, 2015 12:21 pm ]
Post subject:  Re: Population count

@wilbert,
Well squeezed! Thank you very much. :D

Author:  luis [ Fri May 15, 2015 12:36 pm ]
Post subject:  Re: Population count

@davido

Did you miss the link I posted above ?

Code:
int swar(uint32_t i) {
  i = i - ((i >> 1) & 0x55555555);
  i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
  return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}


It was already there. Don't tell me "but it's in C !" :)

Author:  wilbert [ Fri May 15, 2015 12:49 pm ]
Post subject:  Re: Population count

luis wrote:
It was already there. Don't tell me "but it's in C !" :)

I know, but it didn't work when I tried it exactly like that in PureBasic x64.
It turned out that the result of the multiplication instruction on x64 is a 64 bit value and that 64 bit value is shifted 24 bits to the right resulting in a wrong output.

A lot of the brackets aren't required due to operator priority
Code:
Procedure popcount9(i.l)
  i - i >> 1 & $55555555
  i = i & $33333333 + i >> 2 & $33333333
  ProcedureReturn ((i + i >> 4) & $0f0f0f0f * $01010101) >> 24 & $ff
EndProcedure
but it might make less sense this way.

Author:  BasicallyPure [ Fri May 15, 2015 6:14 pm ]
Post subject:  Re: Population count

Here is one I came up with.
It yields about 285% against wilbert's assembly code.
Code:
Procedure PopCount_10(k.l) ; BasicallyPure
   Macro AaS : result + k & 1 : k >> 1 : EndMacro
   Protected result
   AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
   AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
   AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
   AaS : AaS
   ProcedureReturn result
EndProcedure


This is all very interesting but I don't have any idea what it can be used for.
It's obviously very useful or there wouldn't be an assembly instruction for it.

edit:
Never mind, a little research on 'PopCnt' or 'PopCount' told me more than I ever wanted to know.
It's probably a safe bet that I'll never use it.

Author:  davido [ Sat May 16, 2015 12:46 am ]
Post subject:  Re: Population count

@luis,
My apologies, I did get the code from the link you posted.
Thank you.

Author:  wilbert [ Sat May 16, 2015 5:01 am ]
Post subject:  Re: Population count

Seems indeed to scale up to 64 bits as well :)
Code:
Procedure PopCount64(x.q)
  !mov rax, [p.v_x]
  !mov rdx, rax
  !shr rdx, 1
  !and rdx, [popcount64_v55]
  !sub rax, rdx
  ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
  !mov rdx, rax       ;x
  !and rax, [popcount64_v33]
  !shr rdx, 2
  !and rdx, [popcount64_v33]
  !add rax, rdx
  ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f0f0f
  !mov rdx, rax
  !shr rdx, 4
  !add rax, rdx
  !and rax, [popcount64_v0f]
  ;x * $0101010101010101 >> 56
  !imul rax, [popcount64_v01]
  !shr rax, 56
  ProcedureReturn
  !popcount64_v01: dq 0x0101010101010101
  !popcount64_v0f: dq 0x0f0f0f0f0f0f0f0f
  !popcount64_v33: dq 0x3333333333333333
  !popcount64_v55: dq 0x5555555555555555
EndProcedure

x = Random($FFFFFFFF)
Debug Bin(x)
Debug Popcount64(x)


Code:
Procedure PopCount64(x.q); SSE2 (both x86 and x64)
  !movd xmm0, [p.v_x]
  !movd xmm1, [p.v_x + 4]
  !movq xmm2, [popcount64_v55]
  !movq xmm3, [popcount64_v33]
  !movq xmm4, [popcount64_v0f]
  !punpckldq xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 1
  !pand xmm1, xmm2
  !psubb xmm0, xmm1
  !movdqa xmm1, xmm0
  !pand xmm0, xmm3
  !psrlq xmm1, 2
  !pand xmm1, xmm3
  !paddb xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 4
  !paddb xmm0, xmm1
  !pand xmm0, xmm4
  !pxor xmm1, xmm1
  !psadbw xmm0, xmm1
  !movd eax, xmm0
  ProcedureReturn
  !popcount64_v0f: dq 0x0f0f0f0f0f0f0f0f
  !popcount64_v33: dq 0x3333333333333333
  !popcount64_v55: dq 0x5555555555555555
EndProcedure


Using twice 32 bits
Code:
Procedure Popcount64(x.q)
  !mov eax, [p.v_x]
  !mov ecx, [p.v_x + 4]
  !mov edx, ecx
  !shr edx, 1
  !jz popcount64_l0
  !and edx, 0x55555555
  !sub ecx, edx
  !mov edx, ecx
  !and ecx, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add ecx, edx
  !mov edx, ecx
  !shr edx, 4
  !add ecx, edx
  !and ecx, 0x0f0f0f0f
  !popcount64_l0:
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax, edx
  !mov edx, eax
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax, edx
  !mov edx, eax
  !shr edx, 4
  !add eax, edx
  !and eax, 0x0f0f0f0f
  !add eax, ecx
  !imul eax, 0x01010101
  !shr eax, 24
  ProcedureReturn
EndProcedure


A basic version should look like this
Code:
Procedure PopCount64b1(i.q)
  i - i >> 1 & $5555555555555555
  i = i & $3333333333333333 + i >> 2 & $3333333333333333
  ProcedureReturn ((i + i >> 4) & $0f0f0f0f0f0f0f0f * $0101010101010101) >> 56
EndProcedure

But this fails on x64. This code however
Code:
Procedure PopCount64b2(i.q)
  Protected m.q = $0101010101010101
  i - i >> 1 & $5555555555555555
  i = i & $3333333333333333 + i >> 2 & $3333333333333333
  i = (i + i >> 4) & $0f0f0f0f0f0f0f0f
  ProcedureReturn (i * m) >> 56
EndProcedure
works fine on x64.
PB bug?

Author:  idle [ Sat May 16, 2015 7:24 am ]
Post subject:  Re: Population count

Wilbert do you know why x64 can't use immediates for the constants?
In my x64 version above I had to load them into a register
Code:
...
!mov r15, 0x5555555555555555
!and rdx, r15
...

Author:  wilbert [ Sat May 16, 2015 7:38 am ]
Post subject:  Re: Population count

:oops: :oops: :oops: I completely overlooked your 64 bit version post Idle :oops: :oops: :oops:

I ran into the same problem with immediate values. After some searching I found the answer ...
64bit mode allows only signed 32bit immediates. The only instruction that can take a 64bit immediate is mov .

Author:  idle [ Sat May 16, 2015 8:54 am ]
Post subject:  Re: Population count

wilbert wrote:
:oops: :oops: :oops: I completely overlooked your 64 bit version post Idle :oops: :oops: :oops:

I ran into the same problem with immediate values. After some searching I found the answer ...
64bit mode allows only signed 32bit immediates. The only instruction that can take a 64bit immediate is mov .


That's good to know, I didn't expect it'd be an issue and couldn't understand why it only worked via mov.

Author:  wilbert [ Sat May 16, 2015 9:03 am ]
Post subject:  Re: Population count

Here's a speed test for some PopCount64 versions (results are copied to the clipboard).

Speed test code
Code:
; PopCount 64 bit test

EnableExplicit

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
 
  Procedure PopCount0(k.q)
    !mov rax, [p.v_k]
    !popcnt rax, rax
    ProcedureReturn
  EndProcedure
 
  Procedure Popcount1(x.q); Idle
                          ;x - (x >> 1) &  $5555555555555555           
    !mov rax, [p.v_x]
    !mov rdx, rax
    !shr rdx, 1
    !mov r15, 0x5555555555555555
    !and rdx, r15
    !sub rax,rdx
    ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
    !mov rdx, rax       ;x
    !mov r15, 0x3333333333333333
    !and rax, r15 
    !shr rdx, 2
    !and rdx, r15
    !add rax,rdx
    ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f
    !mov rdx, rax
    !shr rdx, 4
    !add rax,rdx
    !mov r15, 0x0f0f0f0f0f0f0f0f
    !and rax, r15
    ;x * 0101010101010101 >> 56
    !mov r15, 0x0101010101010101
    !imul rax, r15
    !shr rax, 56
    ProcedureReturn 
  EndProcedure
 
  Procedure PopCount2(x.q)
    !mov rax, [p.v_x]
    !mov rdx, rax
    !shr rdx, 1
    !and rdx, [popcount2_v55]
    !sub rax, rdx
    ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
    !mov rdx, rax       ;x
    !and rax, [popcount2_v33]
    !shr rdx, 2
    !and rdx, [popcount2_v33]
    !add rax, rdx
    ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f0f0f
    !mov rdx, rax
    !shr rdx, 4
    !add rax, rdx
    !and rax, [popcount2_v0f]
    ;x * $0101010101010101 >> 56
    !imul rax, [popcount2_v01]
    !shr rax, 56
    ProcedureReturn
    !popcount2_v01: dq 0x0101010101010101
    !popcount2_v0f: dq 0x0f0f0f0f0f0f0f0f
    !popcount2_v33: dq 0x3333333333333333
    !popcount2_v55: dq 0x5555555555555555
  EndProcedure
 
CompilerEndIf

Procedure PopCount3(x.q); SSE2 (both x86 and x64)
  !movd xmm0, [p.v_x]
  !movd xmm1, [p.v_x + 4]
  !movq xmm2, [popcount3_v55]
  !movq xmm3, [popcount3_v33]
  !movq xmm4, [popcount3_v0f]
  !punpckldq xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 1
  !pand xmm1, xmm2
  !psubb xmm0, xmm1
  !movdqa xmm1, xmm0
  !pand xmm0, xmm3
  !psrlq xmm1, 2
  !pand xmm1, xmm3
  !paddb xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 4
  !paddb xmm0, xmm1
  !pand xmm0, xmm4
  !pxor xmm1, xmm1
  !psadbw xmm0, xmm1
  !movd eax, xmm0
  ProcedureReturn
  !popcount3_v0f: dq 0x0f0f0f0f0f0f0f0f
  !popcount3_v33: dq 0x3333333333333333
  !popcount3_v55: dq 0x5555555555555555
EndProcedure

Procedure Popcount4(x.q)
  !mov eax, [p.v_x]
  !mov ecx, [p.v_x + 4]
  !mov edx, ecx
  !shr edx, 1
  !jz popcount64_l0
  !and edx, 0x55555555
  !sub ecx, edx
  !mov edx, ecx
  !and ecx, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add ecx, edx
  !mov edx, ecx
  !shr edx, 4
  !add ecx, edx
  !and ecx, 0x0f0f0f0f
  !popcount64_l0:
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax, edx
  !mov edx, eax
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax, edx
  !mov edx, eax
  !shr edx, 4
  !add eax, edx
  !and eax, 0x0f0f0f0f
  !add eax, ecx
  !imul eax, 0x01010101
  !shr eax, 24
  ProcedureReturn
EndProcedure

Procedure PopCount5(i.q)
  Protected m.q = $0101010101010101
  i - i >> 1 & $5555555555555555
  i = i & $3333333333333333 + i >> 2 & $3333333333333333
  i = (i + i >> 4) & $0f0f0f0f0f0f0f0f
  ProcedureReturn (i * m) >> 56
EndProcedure


; ---- Initialisation ----
Define.i i, x, x0, x1, x2, x3, x4, x5, y, rep=10000000
Define.i t0, t1, t2, t3, t4, t5
Define.s s

Dim Rnd.q(rep)
For i = 1 To rep
  Rnd(i) = Random(2147483647) << 32 | Random(2147483647)
Next


; ---- Small check whether all procedures Return the same results ----
For i = 1 To 100
  x5 = PopCount5(Rnd(i))
  x4 = PopCount4(Rnd(i))
  x3 = PopCount3(Rnd(i))
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    x2 = PopCount2(Rnd(i))
    x1 = PopCount1(Rnd(i))
    x0 = PopCount0(Rnd(i))
  CompilerElse
    x2 = x3
    x1 = x2
    x0 = x1
  CompilerEndIf
  If x0 <> x1 Or
     x0 <> x2 Or
     x0 <> x3 Or
     x0 <> x4 Or
     x0 <> x5
    MessageRequester("Error",
                     "Different results for PopCount(" + Rnd(i) + ")")
    End
  EndIf
Next   


; ---- Speed test ----
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
  t0 = ElapsedMilliseconds()
  For i = 1 To rep
    x = PopCount0(Rnd(i))
  Next   
  t0 = ElapsedMilliseconds() - t0
 
  t1 = ElapsedMilliseconds()
  For i = 1 To rep
    x = PopCount1(Rnd(i))
  Next   
  t1 = ElapsedMilliseconds() - t1
 
  t2 = ElapsedMilliseconds()
  For i = 1 To rep
    x = PopCount2(Rnd(i))
  Next   
  t2 = ElapsedMilliseconds() - t2
CompilerEndIf

t3 = ElapsedMilliseconds()
For i = 1 To rep
  x = PopCount3(Rnd(i))
Next   
t3 = ElapsedMilliseconds() - t3

t4 = ElapsedMilliseconds()
For i = 1 To rep
  x = PopCount4(Rnd(i))
Next   
t4 = ElapsedMilliseconds() - t4

t5 = ElapsedMilliseconds()
For i = 1 To rep
  x = PopCount5(Rnd(i))
Next   
t5 = ElapsedMilliseconds() - t5


s = "t0 = " + t0 + " ms   (" + Int(100*t0/t4) + "%)" + #LF$ +
    "t1 = " + t1 + " ms   (" + Int(100*t1/t4) + "%)" + #LF$ +
    "t2 = " + t2 + " ms   (" + Int(100*t2/t4) + "%)" + #LF$ +
    "t3 = " + t3 + " ms   (" + Int(100*t3/t4) + "%)" + #LF$ +
    "t4 = " + t4 + " ms   (" + Int(100*t4/t4) + "%)" + #LF$ +
    "t5 = " + t5 + " ms   (" + Int(100*t5/t4) + "%)"
SetClipboardText(s)
MessageRequester("PopCount speed test", s)



Results OSX , PB x86 wrote:
t0 = 0 ms (0%)
t1 = 0 ms (0%)
t2 = 0 ms (0%)
t3 = 56 ms (80%)
t4 = 70 ms (100%)
t5 = 415 ms (592%)


Results OSX , PBx64 wrote:
t0 = 33 ms (46%)
t1 = 54 ms (76%)
t2 = 50 ms (70%)
t3 = 52 ms (73%)
t4 = 71 ms (100%)
t5 = 71 ms (100%)

Author:  Little John [ Sat May 16, 2015 12:49 pm ]
Post subject:  Re: Population count

@BasicallyPure:
I'm using the Population Count for generating the Thue-Morse sequence.

Author:  BasicallyPure [ Sat May 16, 2015 7:15 pm ]
Post subject:  Re: Population count

Ahh I see, it's the old Thue-Morse sequence, I should have guessed. :lol:

Nicely documented by the way.
Thanks.

Author:  idle [ Sun Jun 14, 2015 6:53 am ]
Post subject:  Re: Population count

seems I forgot to post this but it the x86 specific popcountmem functions need checking

Code:

Procedure _Popcount32(x.l)
  ;x - (x >> 1) &  $55555555           
  !mov eax, [p.v_x]
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  ;x = (x & $33333333) + ((x >> 2) & $33333333)
  !mov edx, eax       ;x
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  ;x = (x + (x >> 4)) & $0f0f0f0f0f
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  ;x * 0x01010101 >> 24
  !imul eax, 0x01010101
  !shr eax, 24
  ProcedureReturn 
EndProcedure

CompilerIf SizeOf(Integer) = 4

Procedure _popcountmem(*mem.long,len.l)
  Protected result,a,b
       
  !mov eax, [p.v_len]
  !and eax, 3
  !mov edx, [p.v_len]
  !sub edx,eax
  !mov [p.v_a],edx
  !mov [p.v_b],eax
  !xor ecx,ecx
  !lwhile:
  !cmp ecx, [p.v_a]
  !jge lend
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
  !add ecx,4
  !jmp lwhile
  !lend:
 
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
   
  !mov edx, [p.v_b]
  !jmp dword [JT_remain + edx * 4]
  !JT_remain dd le,l1,l2,l3,l4
   
  !l1:
  !and eax, 0xff
  !jmp le
  !l2:
  !and eax, 0xffff
  !jmp le
  !l3:
  !and eax, 0xffffff
  !jmp le
  !l4:
  !le:
   
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
   
  ProcedureReturn result   
EndProcedure   

Procedure _popcountmemSSE(*mem,len)
  Protected result,a,b
       
  !mov eax, [p.v_len]
  !and eax, 3
  !mov edx, [p.v_len]
  !sub edx,eax
  !mov [p.v_a],edx
  !mov [p.v_b],eax
  !xor ecx,ecx
  !lwhile1:
  !cmp ecx, [p.v_a]
  !jge lend1
 
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
  !popcnt eax,eax   
  !add [p.v_result],eax   
  !add ecx,4
   
  !jmp lwhile1
  !lend1:
 
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
   
  !mov edx, [p.v_b]
  !jmp dword [JT_remain1 + edx * 4]
  !JT_remain1 dd lle,ll1,ll2,ll3,ll4
   
  !ll1:
  !and eax, 0xff
  !jmp lle
  !ll2:
  !and eax, 0xffff
  !jmp lle
  !ll3:
  !and eax, 0xffffff
  !jmp lle
  !ll4:
  !lle:
 
  !mov edx,eax
  !popcnt eax,edx   
  !add [p.v_result],eax   
 
  ProcedureReturn result 
EndProcedure   


CompilerElse

Procedure _Popcount64(x.i)
  ;x - (x >> 1) &  $5555555555555555           
  !mov rax, [p.v_x]
  !mov rdx, rax
  !shr rdx, 1
  !mov r15, 0x5555555555555555
  !and rdx, r15
  !sub rax,rdx
  ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
  !mov rdx, rax       ;x
  !mov r15, 0x3333333333333333
  !and rax, r15 
  !shr rdx, 2
  !and rdx, r15
  !add rax,rdx
  ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f
  !mov rdx, rax
  !shr rdx, 4
  !add rax,rdx
  !mov r15, 0x0f0f0f0f0f0f0f0f
  !and rax, r15
  ;x * 0101010101010101 >> 56
  !mov r15, 0x0101010101010101
  !imul rax, r15
  !shr rax, 56
  ProcedureReturn 
EndProcedure

Procedure _popcountmem(*mem,len)
  Protected result,a,b
       
  !mov rax, [p.v_len]
  !And rax, 3
  !mov rdx, [p.v_len]
  !sub rdx,rax
  !mov [p.v_a],rdx
  !mov [p.v_b],rax
  !XOr rcx,rcx
  !lwhile:
  !cmp rcx, [p.v_a]
  !jge lend
  !mov rax, [p.p_mem]
  !mov eax, [rax + rcx]
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
  !add rcx,4
  !jmp lwhile
  !lend:
 
  !mov rax, [p.p_mem]
  !mov eax, [rax + rcx]
 
  !lea rdx, [JT_remain]
  !mov rcx, [p.v_b]
  !jmp qword [rdx + rcx * 8]
  !JT_remain dq le,l1,l2,l3,l4
   
  !l1:
  !and eax, 0xff
  !jmp le
  !l2:
  !and eax, 0xffff
  !jmp le
  !l3:
  !and eax, 0xffffff
  !jmp le
  !l4:
  !le:
   
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
   
  ProcedureReturn result   
EndProcedure   

Procedure _popcountmemSSE(*mem,len)
  Protected result,a,b
       
  !mov rax, [p.v_len]
  !and rax, 7
  !mov rdx, [p.v_len]
  !sub rdx,rax
  !mov [p.v_a],rdx
  !mov [p.v_b],rax
  !xor rcx,rcx
  !lwhile1:
  !cmp rcx, [p.v_a]
  !jge lend1
 
  !mov rax, [p.p_mem]
  !mov rax, [rax + rcx]
  !popcnt rax,rax   
  !add [p.v_result],rax   
  !add rcx,8
   
  !jmp lwhile1
  !lend1:
 
  !mov rax, [p.p_mem]
  !mov rax, [rax + rcx]
   
  !lea rdx, [JT_remain1]
  !mov rcx, [p.v_b]
  !jmp qword [rdx + rcx * 8]
  !JT_remain1 dq lle,ll1,ll2,ll3,ll4,ll5,ll6,ll7,ll8
   
  !ll1:
  !and rax, 0xff
  !jmp lle
  !ll2:
  !and rax, 0xffff
  !jmp lle
  !ll3:
  !and rax, 0xffffff
  !jmp lle
  !ll4:
  !mov rdx,0xffffffff
  !and rax, rdx
  !jmp lle
  !ll5:
  !mov rdx, 0xffffffffff
  !and rax, rdx
  !jmp lle
  !ll6:
  !mov rdx,0xffffffffffff
  !and rax, rdx
  !jmp lle
  !ll7:
  !mov rdx,0xffffffffffffff
  !and rax, rdx
  !jmp lle
  !ll8:
  !lle:
 
  !mov rdx,rax
  !popcnt rax,rdx   
  !add [p.v_result],rax   
 
  ProcedureReturn result 
EndProcedure   
 
CompilerEndIf

Procedure _PopCountSSE4(x.i)
  CompilerIf SizeOf(Integer) = 8
    !popcnt rax,[p.v_x]   
  CompilerElse
    !popcnt eax,[p.v_x]   
  CompilerEndIf   
  ProcedureReturn
EndProcedure   
 
Prototype PopCount(x.i)
Prototype PopCountMem(*mem,len)
Global PopCount.PopCount
Global PopCountMem.PopCountMem

Procedure InitpopCount(FallBackMode.i=64)
  Protected result
  !mov eax, 1
  !cpuid
  !shr ecx, 23
  !and ecx, 1
  !mov [p.v_result],ecx
  If result
    popcount = @_PopCountSSE4()
    popcountmem = @_popcountmemSSE()
  Else
    CompilerIf SizeOf(Integer) = 8
      If FallBackMode = 64
        popcount = @_popcount64()
        popcountmem = @_popcountmem()
      Else
        popcount = @_popcount32()
        popcountmem = @_popcountmem()
      EndIf   
    CompilerElse
      popcount = @_popcount32()
      popcountmem = @_popcountmem()
    CompilerEndIf
 EndIf   
EndProcedure


CompilerIf #PB_Compiler_IsMainFile
 
  InitpopCount() ; sets popcount to the best supported function
  ;on x64 if you want to set the mode to the 32 function range, call it with InitpopCount(#PB_Long)
 
  x.i = %10101010101010101010101010101010
  Debug PopCount(x)  ;16
 
  mx = Random(1024,128)
  *mem = AllocateMemory(mx)
  *pa.Ascii = *mem 
  For a = 0 To mx-1
   *pa\a = 3
   *pa+1
   r+2
  Next   
  Debug r     
  Debug PopCountMem(*mem,mx)   
  Debug _popcountmem(*mem,mx) 
 
CompilerEndIf

Author:  Keya [ Thu Feb 04, 2016 4:10 pm ]
Post subject:  Re: Population count

great work everyone!:) Ive simply added a CPUID check for SSE4 POPCNT, and then just combined the 32&64bit asm routines posted here along with a Prototype function pointer so it's always pointing at the best function
Code:
Prototype.i protoGetBitCnt(number.i)

Procedure.i HasSSE4_POPCNT()
  EnableASM
  Protected.i reg_bx
  CompilerIf SizeOf(Integer) = 8
    !mov [p.v_reg_bx], rbx
  CompilerElse
    !mov [p.v_reg_bx], ebx
  CompilerEndIf 
  ! mov eax, 1
  ! cpuid   
  ! xor eax, eax 
  ! bt ecx, 23   ;popcnt
  ! setc al   
  CompilerIf SizeOf(Integer) = 8
    !mov rbx, [p.v_reg_bx]
  CompilerElse
    !mov ebx, [p.v_reg_bx]
  CompilerEndIf
  ProcedureReturn
  DisableASM
EndProcedure


Procedure GetBitCountSSE4(x.i)
  CompilerIf SizeOf(Integer) = 8
    !popcnt rax,[p.v_x]   
  CompilerElse
    !popcnt eax,[p.v_x]   
  CompilerEndIf   
  ProcedureReturn
EndProcedure 


Procedure GetBitCountAsm(x.i)
  CompilerIf SizeOf(Integer) = 8
    !mov rax, [p.v_x]
    !mov rdx, rax
    !shr rdx, 1
    !mov r15, 0x5555555555555555
    !and rdx, r15
    !sub rax,rdx
    !mov rdx, rax       
    !mov r15, 0x3333333333333333
    !and rax, r15
    !shr rdx, 2
    !and rdx, r15
    !add rax,rdx
    !mov rdx, rax
    !shr rdx, 4
    !add rax,rdx
    !mov r15, 0x0f0f0f0f0f0f0f0f
    !and rax, r15
    !mov r15, 0x0101010101010101
    !imul rax, r15
    !shr rax, 56
  CompilerElse
    !mov eax, [p.v_x]
    !mov edx, eax
    !shr edx, 1
    !and edx, 0x55555555
    !sub eax,edx
    !mov edx, eax       
    !and eax, 0x33333333
    !shr edx, 2
    !and edx, 0x33333333
    !add eax,edx
    !mov edx, eax
    !shr edx, 4
    !add eax,edx
    !and eax, 0x0f0f0f0f
    !imul eax, 0x01010101
    !shr eax, 24
  CompilerEndIf
  ProcedureReturn
EndProcedure

 
;Self-initialize
If HasSSE4_POPCNT() = 1
  Global GetBitCnt.protoGetBitCnt = @GetBitCountSSE4()
Else
  Global GetBitCnt.protoGetBitCnt = @GetBitCountAsm()
EndIf

;---
;from now you can simply call GetBitCnt(value), and it will automatically
;be pointing to either the SSE4 or asm procedure depending on your CPU

ps. i really wish the Thread Subject included the words COUNT and BITS as I nearly missed this thread! (and besides, 32 is more of a party than a population, so yes Intel shouldve named the instruction BITCNT if not PARTYCNT) :D

Page 2 of 3 All times are UTC + 1 hour
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group
http://www.phpbb.com/