# PureBasic Forum

 It is currently Sun Jan 17, 2021 12:28 am

 All times are UTC + 1 hour

 Page 2 of 3 [ 36 posts ] Go to page Previous  1, 2, 3  Next
 Print view Previous topic | Next topic
Author Message
 Post subject: Re: Population countPosted: Fri May 15, 2015 10:34 am
 PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
davido wrote:
I've chosen the PB version of the method described by idle; it runs at 210%.
I present it, just in case it can be improved upon

Code:
Procedure popcount9(i.l)
i - (i >> 1) & \$55555555
i = (i & \$33333333) + ((i >> 2) & \$33333333)
ProcedureReturn (((i + i >> 4) & \$0f0f0f0f) * \$01010101) >> 24 & \$ff
EndProcedure

_________________
macOS 10.15 Catalina, Windows 10

Last edited by wilbert on Fri May 15, 2015 3:00 pm, edited 1 time in total.

Top

 Post subject: Re: Population countPosted: Fri May 15, 2015 12:21 pm

Joined: Fri Nov 09, 2012 11:04 pm
Posts: 1804
Location: Uttoxeter, UK
@wilbert,
Well squeezed! Thank you very much.

_________________
DE AA EB

Top

 Post subject: Re: Population countPosted: Fri May 15, 2015 12:36 pm

Joined: Wed Aug 31, 2005 11:09 pm
Posts: 3698
Location: Italy
@davido

Did you miss the link I posted above ?

Code:
int swar(uint32_t i) {
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

It was already there. Don't tell me "but it's in C !"

_________________

Top

 Post subject: Re: Population countPosted: Fri May 15, 2015 12:49 pm
 PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
luis wrote:
It was already there. Don't tell me "but it's in C !"

I know, but it didn't work when I tried it exactly like that in PureBasic x64.
It turned out that the result of the multiplication instruction on x64 is a 64 bit value and that 64 bit value is shifted 24 bits to the right resulting in a wrong output.

A lot of the brackets aren't required due to operator priority
Code:
Procedure popcount9(i.l)
i - i >> 1 & \$55555555
i = i & \$33333333 + i >> 2 & \$33333333
ProcedureReturn ((i + i >> 4) & \$0f0f0f0f * \$01010101) >> 24 & \$ff
EndProcedure
but it might make less sense this way.

_________________
macOS 10.15 Catalina, Windows 10

Top

 Post subject: Re: Population countPosted: Fri May 15, 2015 6:14 pm
 Enthusiast

Joined: Thu Mar 24, 2011 12:40 am
Posts: 536
Location: Iowa, USA
Here is one I came up with.
It yields about 285% against wilbert's assembly code.
Code:
Procedure PopCount_10(k.l) ; BasicallyPure
Macro AaS : result + k & 1 : k >> 1 : EndMacro
Protected result
AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
AaS : AaS
ProcedureReturn result
EndProcedure

This is all very interesting but I don't have any idea what it can be used for.
It's obviously very useful or there wouldn't be an assembly instruction for it.

edit:
Never mind, a little research on 'PopCnt' or 'PopCount' told me more than I ever wanted to know.
It's probably a safe bet that I'll never use it.

_________________
BasicallyPure
Until you know everything you know nothing, all you have is what you believe.

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 12:46 am

Joined: Fri Nov 09, 2012 11:04 pm
Posts: 1804
Location: Uttoxeter, UK
@luis,
My apologies, I did get the code from the link you posted.
Thank you.

_________________
DE AA EB

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 5:01 am
 PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
Seems indeed to scale up to 64 bits as well
Code:
Procedure PopCount64(x.q)
!mov rax, [p.v_x]
!mov rdx, rax
!shr rdx, 1
!and rdx, [popcount64_v55]
!sub rax, rdx
;x = (x & \$3333333333333333) + ((x >> 2) & \$3333333333333333)
!mov rdx, rax       ;x
!and rax, [popcount64_v33]
!shr rdx, 2
!and rdx, [popcount64_v33]
;x = (x + (x >> 4)) & \$0f0f0f0f0f0f0f0f0f0f
!mov rdx, rax
!shr rdx, 4
!and rax, [popcount64_v0f]
;x * \$0101010101010101 >> 56
!imul rax, [popcount64_v01]
!shr rax, 56
ProcedureReturn
!popcount64_v01: dq 0x0101010101010101
!popcount64_v0f: dq 0x0f0f0f0f0f0f0f0f
!popcount64_v33: dq 0x3333333333333333
!popcount64_v55: dq 0x5555555555555555
EndProcedure

x = Random(\$FFFFFFFF)
Debug Bin(x)
Debug Popcount64(x)

Code:
Procedure PopCount64(x.q); SSE2 (both x86 and x64)
!movd xmm0, [p.v_x]
!movd xmm1, [p.v_x + 4]
!movq xmm2, [popcount64_v55]
!movq xmm3, [popcount64_v33]
!movq xmm4, [popcount64_v0f]
!punpckldq xmm0, xmm1
!movdqa xmm1, xmm0
!psrlq xmm1, 1
!pand xmm1, xmm2
!psubb xmm0, xmm1
!movdqa xmm1, xmm0
!pand xmm0, xmm3
!psrlq xmm1, 2
!pand xmm1, xmm3
!movdqa xmm1, xmm0
!psrlq xmm1, 4
!pand xmm0, xmm4
!pxor xmm1, xmm1
!movd eax, xmm0
ProcedureReturn
!popcount64_v0f: dq 0x0f0f0f0f0f0f0f0f
!popcount64_v33: dq 0x3333333333333333
!popcount64_v55: dq 0x5555555555555555
EndProcedure

Using twice 32 bits
Code:
Procedure Popcount64(x.q)
!mov eax, [p.v_x]
!mov ecx, [p.v_x + 4]
!mov edx, ecx
!shr edx, 1
!jz popcount64_l0
!and edx, 0x55555555
!sub ecx, edx
!mov edx, ecx
!and ecx, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, ecx
!shr edx, 4
!and ecx, 0x0f0f0f0f
!popcount64_l0:
!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax, edx
!mov edx, eax
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
!imul eax, 0x01010101
!shr eax, 24
ProcedureReturn
EndProcedure

A basic version should look like this
Code:
Procedure PopCount64b1(i.q)
i - i >> 1 & \$5555555555555555
i = i & \$3333333333333333 + i >> 2 & \$3333333333333333
ProcedureReturn ((i + i >> 4) & \$0f0f0f0f0f0f0f0f * \$0101010101010101) >> 56
EndProcedure

But this fails on x64. This code however
Code:
Procedure PopCount64b2(i.q)
Protected m.q = \$0101010101010101
i - i >> 1 & \$5555555555555555
i = i & \$3333333333333333 + i >> 2 & \$3333333333333333
i = (i + i >> 4) & \$0f0f0f0f0f0f0f0f
ProcedureReturn (i * m) >> 56
EndProcedure
works fine on x64.
PB bug?

_________________
macOS 10.15 Catalina, Windows 10

Last edited by wilbert on Mon May 18, 2015 7:27 am, edited 6 times in total.

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 7:24 am

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3556
Location: New Zealand
Wilbert do you know why x64 can't use immediates for the constants?
In my x64 version above I had to load them into a register
Code:
...
!mov r15, 0x5555555555555555
!and rdx, r15
...

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 7:38 am
 PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
I completely overlooked your 64 bit version post Idle

I ran into the same problem with immediate values. After some searching I found the answer ...
64bit mode allows only signed 32bit immediates. The only instruction that can take a 64bit immediate is mov .

_________________
macOS 10.15 Catalina, Windows 10

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 8:54 am

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3556
Location: New Zealand
wilbert wrote:
:oops: I completely overlooked your 64 bit version post Idle

I ran into the same problem with immediate values. After some searching I found the answer ...
64bit mode allows only signed 32bit immediates. The only instruction that can take a 64bit immediate is mov .

That's good to know, I didn't expect it'd be an issue and couldn't understand why it only worked via mov.

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 9:03 am
 PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
Here's a speed test for some PopCount64 versions (results are copied to the clipboard).

Speed test code
Code:
; PopCount 64 bit test

EnableExplicit

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64

Procedure PopCount0(k.q)
!mov rax, [p.v_k]
!popcnt rax, rax
ProcedureReturn
EndProcedure

Procedure Popcount1(x.q); Idle
;x - (x >> 1) &  \$5555555555555555
!mov rax, [p.v_x]
!mov rdx, rax
!shr rdx, 1
!mov r15, 0x5555555555555555
!and rdx, r15
!sub rax,rdx
;x = (x & \$3333333333333333) + ((x >> 2) & \$3333333333333333)
!mov rdx, rax       ;x
!mov r15, 0x3333333333333333
!and rax, r15
!shr rdx, 2
!and rdx, r15
;x = (x + (x >> 4)) & \$0f0f0f0f0f0f0f0f
!mov rdx, rax
!shr rdx, 4
!mov r15, 0x0f0f0f0f0f0f0f0f
!and rax, r15
;x * 0101010101010101 >> 56
!mov r15, 0x0101010101010101
!imul rax, r15
!shr rax, 56
ProcedureReturn
EndProcedure

Procedure PopCount2(x.q)
!mov rax, [p.v_x]
!mov rdx, rax
!shr rdx, 1
!and rdx, [popcount2_v55]
!sub rax, rdx
;x = (x & \$3333333333333333) + ((x >> 2) & \$3333333333333333)
!mov rdx, rax       ;x
!and rax, [popcount2_v33]
!shr rdx, 2
!and rdx, [popcount2_v33]
;x = (x + (x >> 4)) & \$0f0f0f0f0f0f0f0f0f0f
!mov rdx, rax
!shr rdx, 4
!and rax, [popcount2_v0f]
;x * \$0101010101010101 >> 56
!imul rax, [popcount2_v01]
!shr rax, 56
ProcedureReturn
!popcount2_v01: dq 0x0101010101010101
!popcount2_v0f: dq 0x0f0f0f0f0f0f0f0f
!popcount2_v33: dq 0x3333333333333333
!popcount2_v55: dq 0x5555555555555555
EndProcedure

CompilerEndIf

Procedure PopCount3(x.q); SSE2 (both x86 and x64)
!movd xmm0, [p.v_x]
!movd xmm1, [p.v_x + 4]
!movq xmm2, [popcount3_v55]
!movq xmm3, [popcount3_v33]
!movq xmm4, [popcount3_v0f]
!punpckldq xmm0, xmm1
!movdqa xmm1, xmm0
!psrlq xmm1, 1
!pand xmm1, xmm2
!psubb xmm0, xmm1
!movdqa xmm1, xmm0
!pand xmm0, xmm3
!psrlq xmm1, 2
!pand xmm1, xmm3
!movdqa xmm1, xmm0
!psrlq xmm1, 4
!pand xmm0, xmm4
!pxor xmm1, xmm1
!movd eax, xmm0
ProcedureReturn
!popcount3_v0f: dq 0x0f0f0f0f0f0f0f0f
!popcount3_v33: dq 0x3333333333333333
!popcount3_v55: dq 0x5555555555555555
EndProcedure

Procedure Popcount4(x.q)
!mov eax, [p.v_x]
!mov ecx, [p.v_x + 4]
!mov edx, ecx
!shr edx, 1
!jz popcount64_l0
!and edx, 0x55555555
!sub ecx, edx
!mov edx, ecx
!and ecx, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, ecx
!shr edx, 4
!and ecx, 0x0f0f0f0f
!popcount64_l0:
!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax, edx
!mov edx, eax
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
!imul eax, 0x01010101
!shr eax, 24
ProcedureReturn
EndProcedure

Procedure PopCount5(i.q)
Protected m.q = \$0101010101010101
i - i >> 1 & \$5555555555555555
i = i & \$3333333333333333 + i >> 2 & \$3333333333333333
i = (i + i >> 4) & \$0f0f0f0f0f0f0f0f
ProcedureReturn (i * m) >> 56
EndProcedure

; ---- Initialisation ----
Define.i i, x, x0, x1, x2, x3, x4, x5, y, rep=10000000
Define.i t0, t1, t2, t3, t4, t5
Define.s s

Dim Rnd.q(rep)
For i = 1 To rep
Rnd(i) = Random(2147483647) << 32 | Random(2147483647)
Next

; ---- Small check whether all procedures Return the same results ----
For i = 1 To 100
x5 = PopCount5(Rnd(i))
x4 = PopCount4(Rnd(i))
x3 = PopCount3(Rnd(i))
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
x2 = PopCount2(Rnd(i))
x1 = PopCount1(Rnd(i))
x0 = PopCount0(Rnd(i))
CompilerElse
x2 = x3
x1 = x2
x0 = x1
CompilerEndIf
If x0 <> x1 Or
x0 <> x2 Or
x0 <> x3 Or
x0 <> x4 Or
x0 <> x5
MessageRequester("Error",
"Different results for PopCount(" + Rnd(i) + ")")
End
EndIf
Next

; ---- Speed test ----
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
t0 = ElapsedMilliseconds()
For i = 1 To rep
x = PopCount0(Rnd(i))
Next
t0 = ElapsedMilliseconds() - t0

t1 = ElapsedMilliseconds()
For i = 1 To rep
x = PopCount1(Rnd(i))
Next
t1 = ElapsedMilliseconds() - t1

t2 = ElapsedMilliseconds()
For i = 1 To rep
x = PopCount2(Rnd(i))
Next
t2 = ElapsedMilliseconds() - t2
CompilerEndIf

t3 = ElapsedMilliseconds()
For i = 1 To rep
x = PopCount3(Rnd(i))
Next
t3 = ElapsedMilliseconds() - t3

t4 = ElapsedMilliseconds()
For i = 1 To rep
x = PopCount4(Rnd(i))
Next
t4 = ElapsedMilliseconds() - t4

t5 = ElapsedMilliseconds()
For i = 1 To rep
x = PopCount5(Rnd(i))
Next
t5 = ElapsedMilliseconds() - t5

s = "t0 = " + t0 + " ms   (" + Int(100*t0/t4) + "%)" + #LF\$ +
"t1 = " + t1 + " ms   (" + Int(100*t1/t4) + "%)" + #LF\$ +
"t2 = " + t2 + " ms   (" + Int(100*t2/t4) + "%)" + #LF\$ +
"t3 = " + t3 + " ms   (" + Int(100*t3/t4) + "%)" + #LF\$ +
"t4 = " + t4 + " ms   (" + Int(100*t4/t4) + "%)" + #LF\$ +
"t5 = " + t5 + " ms   (" + Int(100*t5/t4) + "%)"
SetClipboardText(s)
MessageRequester("PopCount speed test", s)

Results OSX , PB x86 wrote:
t0 = 0 ms (0%)
t1 = 0 ms (0%)
t2 = 0 ms (0%)
t3 = 56 ms (80%)
t4 = 70 ms (100%)
t5 = 415 ms (592%)

Results OSX , PBx64 wrote:
t0 = 33 ms (46%)
t1 = 54 ms (76%)
t2 = 50 ms (70%)
t3 = 52 ms (73%)
t4 = 71 ms (100%)
t5 = 71 ms (100%)

_________________
macOS 10.15 Catalina, Windows 10

Last edited by wilbert on Mon May 18, 2015 7:30 am, edited 4 times in total.

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 12:49 pm

Joined: Thu Jun 07, 2007 3:25 pm
Posts: 3967
Location: Berlin, Germany
@BasicallyPure:
I'm using the Population Count for generating the Thue-Morse sequence.

_________________
Please excuse my flawed English. My native language is PureBasic.
Search
RSBasic's backups

Top

 Post subject: Re: Population countPosted: Sat May 16, 2015 7:15 pm
 Enthusiast

Joined: Thu Mar 24, 2011 12:40 am
Posts: 536
Location: Iowa, USA
Ahh I see, it's the old Thue-Morse sequence, I should have guessed.

Nicely documented by the way.
Thanks.

_________________
BasicallyPure
Until you know everything you know nothing, all you have is what you believe.

Top

 Post subject: Re: Population countPosted: Sun Jun 14, 2015 6:53 am

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3556
Location: New Zealand
seems I forgot to post this but it the x86 specific popcountmem functions need checking

Code:

Procedure _Popcount32(x.l)
;x - (x >> 1) &  \$55555555
!mov eax, [p.v_x]
!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax,edx
;x = (x & \$33333333) + ((x >> 2) & \$33333333)
!mov edx, eax       ;x
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
;x = (x + (x >> 4)) & \$0f0f0f0f0f
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
;x * 0x01010101 >> 24
!imul eax, 0x01010101
!shr eax, 24
ProcedureReturn
EndProcedure

CompilerIf SizeOf(Integer) = 4

Procedure _popcountmem(*mem.long,len.l)
Protected result,a,b

!mov eax, [p.v_len]
!and eax, 3
!mov edx, [p.v_len]
!sub edx,eax
!mov [p.v_a],edx
!mov [p.v_b],eax
!xor ecx,ecx
!lwhile:
!cmp ecx, [p.v_a]
!jge lend
!mov eax, [p.p_mem]
!mov eax, [eax + ecx]
!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax,edx
!mov edx, eax
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
!imul eax, 0x01010101
!shr eax, 24
!jmp lwhile
!lend:

!mov eax, [p.p_mem]
!mov eax, [eax + ecx]

!mov edx, [p.v_b]
!jmp dword [JT_remain + edx * 4]
!JT_remain dd le,l1,l2,l3,l4

!l1:
!and eax, 0xff
!jmp le
!l2:
!and eax, 0xffff
!jmp le
!l3:
!and eax, 0xffffff
!jmp le
!l4:
!le:

!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax,edx
!mov edx, eax
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
!imul eax, 0x01010101
!shr eax, 24

ProcedureReturn result
EndProcedure

Procedure _popcountmemSSE(*mem,len)
Protected result,a,b

!mov eax, [p.v_len]
!and eax, 3
!mov edx, [p.v_len]
!sub edx,eax
!mov [p.v_a],edx
!mov [p.v_b],eax
!xor ecx,ecx
!lwhile1:
!cmp ecx, [p.v_a]
!jge lend1

!mov eax, [p.p_mem]
!mov eax, [eax + ecx]
!popcnt eax,eax

!jmp lwhile1
!lend1:

!mov eax, [p.p_mem]
!mov eax, [eax + ecx]

!mov edx, [p.v_b]
!jmp dword [JT_remain1 + edx * 4]
!JT_remain1 dd lle,ll1,ll2,ll3,ll4

!ll1:
!and eax, 0xff
!jmp lle
!ll2:
!and eax, 0xffff
!jmp lle
!ll3:
!and eax, 0xffffff
!jmp lle
!ll4:
!lle:

!mov edx,eax
!popcnt eax,edx

ProcedureReturn result
EndProcedure

CompilerElse

Procedure _Popcount64(x.i)
;x - (x >> 1) &  \$5555555555555555
!mov rax, [p.v_x]
!mov rdx, rax
!shr rdx, 1
!mov r15, 0x5555555555555555
!and rdx, r15
!sub rax,rdx
;x = (x & \$3333333333333333) + ((x >> 2) & \$3333333333333333)
!mov rdx, rax       ;x
!mov r15, 0x3333333333333333
!and rax, r15
!shr rdx, 2
!and rdx, r15
;x = (x + (x >> 4)) & \$0f0f0f0f0f0f0f0f
!mov rdx, rax
!shr rdx, 4
!mov r15, 0x0f0f0f0f0f0f0f0f
!and rax, r15
;x * 0101010101010101 >> 56
!mov r15, 0x0101010101010101
!imul rax, r15
!shr rax, 56
ProcedureReturn
EndProcedure

Procedure _popcountmem(*mem,len)
Protected result,a,b

!mov rax, [p.v_len]
!And rax, 3
!mov rdx, [p.v_len]
!sub rdx,rax
!mov [p.v_a],rdx
!mov [p.v_b],rax
!XOr rcx,rcx
!lwhile:
!cmp rcx, [p.v_a]
!jge lend
!mov rax, [p.p_mem]
!mov eax, [rax + rcx]
!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax,edx
!mov edx, eax
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
!imul eax, 0x01010101
!shr eax, 24
!jmp lwhile
!lend:

!mov rax, [p.p_mem]
!mov eax, [rax + rcx]

!lea rdx, [JT_remain]
!mov rcx, [p.v_b]
!jmp qword [rdx + rcx * 8]
!JT_remain dq le,l1,l2,l3,l4

!l1:
!and eax, 0xff
!jmp le
!l2:
!and eax, 0xffff
!jmp le
!l3:
!and eax, 0xffffff
!jmp le
!l4:
!le:

!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax,edx
!mov edx, eax
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
!imul eax, 0x01010101
!shr eax, 24

ProcedureReturn result
EndProcedure

Procedure _popcountmemSSE(*mem,len)
Protected result,a,b

!mov rax, [p.v_len]
!and rax, 7
!mov rdx, [p.v_len]
!sub rdx,rax
!mov [p.v_a],rdx
!mov [p.v_b],rax
!xor rcx,rcx
!lwhile1:
!cmp rcx, [p.v_a]
!jge lend1

!mov rax, [p.p_mem]
!mov rax, [rax + rcx]
!popcnt rax,rax

!jmp lwhile1
!lend1:

!mov rax, [p.p_mem]
!mov rax, [rax + rcx]

!lea rdx, [JT_remain1]
!mov rcx, [p.v_b]
!jmp qword [rdx + rcx * 8]
!JT_remain1 dq lle,ll1,ll2,ll3,ll4,ll5,ll6,ll7,ll8

!ll1:
!and rax, 0xff
!jmp lle
!ll2:
!and rax, 0xffff
!jmp lle
!ll3:
!and rax, 0xffffff
!jmp lle
!ll4:
!mov rdx,0xffffffff
!and rax, rdx
!jmp lle
!ll5:
!mov rdx, 0xffffffffff
!and rax, rdx
!jmp lle
!ll6:
!mov rdx,0xffffffffffff
!and rax, rdx
!jmp lle
!ll7:
!mov rdx,0xffffffffffffff
!and rax, rdx
!jmp lle
!ll8:
!lle:

!mov rdx,rax
!popcnt rax,rdx

ProcedureReturn result
EndProcedure

CompilerEndIf

Procedure _PopCountSSE4(x.i)
CompilerIf SizeOf(Integer) = 8
!popcnt rax,[p.v_x]
CompilerElse
!popcnt eax,[p.v_x]
CompilerEndIf
ProcedureReturn
EndProcedure

Prototype PopCount(x.i)
Prototype PopCountMem(*mem,len)
Global PopCount.PopCount
Global PopCountMem.PopCountMem

Procedure InitpopCount(FallBackMode.i=64)
Protected result
!mov eax, 1
!cpuid
!shr ecx, 23
!and ecx, 1
!mov [p.v_result],ecx
If result
popcount = @_PopCountSSE4()
popcountmem = @_popcountmemSSE()
Else
CompilerIf SizeOf(Integer) = 8
If FallBackMode = 64
popcount = @_popcount64()
popcountmem = @_popcountmem()
Else
popcount = @_popcount32()
popcountmem = @_popcountmem()
EndIf
CompilerElse
popcount = @_popcount32()
popcountmem = @_popcountmem()
CompilerEndIf
EndIf
EndProcedure

CompilerIf #PB_Compiler_IsMainFile

InitpopCount() ; sets popcount to the best supported function
;on x64 if you want to set the mode to the 32 function range, call it with InitpopCount(#PB_Long)

x.i = %10101010101010101010101010101010
Debug PopCount(x)  ;16

mx = Random(1024,128)
*mem = AllocateMemory(mx)
*pa.Ascii = *mem
For a = 0 To mx-1
*pa\a = 3
*pa+1
r+2
Next
Debug r
Debug PopCountMem(*mem,mx)
Debug _popcountmem(*mem,mx)

CompilerEndIf

Top

 Post subject: Re: Population countPosted: Thu Feb 04, 2016 4:10 pm

Joined: Thu Jun 04, 2015 7:10 am
Posts: 1672
great work everyone!:) Ive simply added a CPUID check for SSE4 POPCNT, and then just combined the 32&64bit asm routines posted here along with a Prototype function pointer so it's always pointing at the best function
Code:
Prototype.i protoGetBitCnt(number.i)

Procedure.i HasSSE4_POPCNT()
EnableASM
Protected.i reg_bx
CompilerIf SizeOf(Integer) = 8
!mov [p.v_reg_bx], rbx
CompilerElse
!mov [p.v_reg_bx], ebx
CompilerEndIf
! mov eax, 1
! cpuid
! xor eax, eax
! bt ecx, 23   ;popcnt
! setc al
CompilerIf SizeOf(Integer) = 8
!mov rbx, [p.v_reg_bx]
CompilerElse
!mov ebx, [p.v_reg_bx]
CompilerEndIf
ProcedureReturn
DisableASM
EndProcedure

Procedure GetBitCountSSE4(x.i)
CompilerIf SizeOf(Integer) = 8
!popcnt rax,[p.v_x]
CompilerElse
!popcnt eax,[p.v_x]
CompilerEndIf
ProcedureReturn
EndProcedure

Procedure GetBitCountAsm(x.i)
CompilerIf SizeOf(Integer) = 8
!mov rax, [p.v_x]
!mov rdx, rax
!shr rdx, 1
!mov r15, 0x5555555555555555
!and rdx, r15
!sub rax,rdx
!mov rdx, rax
!mov r15, 0x3333333333333333
!and rax, r15
!shr rdx, 2
!and rdx, r15
!mov rdx, rax
!shr rdx, 4
!mov r15, 0x0f0f0f0f0f0f0f0f
!and rax, r15
!mov r15, 0x0101010101010101
!imul rax, r15
!shr rax, 56
CompilerElse
!mov eax, [p.v_x]
!mov edx, eax
!shr edx, 1
!and edx, 0x55555555
!sub eax,edx
!mov edx, eax
!and eax, 0x33333333
!shr edx, 2
!and edx, 0x33333333
!mov edx, eax
!shr edx, 4
!and eax, 0x0f0f0f0f
!imul eax, 0x01010101
!shr eax, 24
CompilerEndIf
ProcedureReturn
EndProcedure

;Self-initialize
If HasSSE4_POPCNT() = 1
Global GetBitCnt.protoGetBitCnt = @GetBitCountSSE4()
Else
Global GetBitCnt.protoGetBitCnt = @GetBitCountAsm()
EndIf

;---
;from now you can simply call GetBitCnt(value), and it will automatically
;be pointing to either the SSE4 or asm procedure depending on your CPU

ps. i really wish the Thread Subject included the words COUNT and BITS as I nearly missed this thread! (and besides, 32 is more of a party than a population, so yes Intel shouldve named the instruction BITCNT if not PARTYCNT)

_________________
Thankyou to all the coders who generously helped & encouraged me in the nearly 2yrs when i was welcome here,
it was a tremendous privilege. I learned a lot. I wish you and your families all the best and success for the future.

Top

 Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending
 Page 2 of 3 [ 36 posts ] Go to page Previous  1, 2, 3  Next

 All times are UTC + 1 hour

#### Who is online

Users browsing this forum: No registered users and 0 guests

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forum

Search for:
 Jump to:  Select a forum ------------------ PureBasic    Coding Questions    Game Programming    3D Programming    Assembly Programming    The PureBasic Editor    The PureBasic Form Designer    General Discussion    Feature Requests and Wishlists    Tricks 'n' Tips Bug Reports    Bugs - Windows    Bugs - Linux    Bugs - Mac OSX    Bugs - IDE    Bugs - Documentation OS Specific    AmigaOS    Linux    Windows    Mac OSX Miscellaneous    Announcement    Off Topic Showcase    Applications - Feedback and Discussion    PureFORM & JaPBe    TailBite