Comparisons in mathematical simulations

Just starting out? Need help? Post your questions and find answers here.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Comparisons in mathematical simulations

Post by wilbert »

fvillanova wrote: Tue May 16, 2023 4:58 am I appreciate any contribution to speeding up the CreateS() procedure.

Code: Select all

Procedure.s CreateS(*input)
  Protected Dim Result.u(100)
  FillMemory(@Result(), 200, '0', #PB_Unicode)
  !mov r9, [p.a_Result]
  !xor rax, rax
  !xor edx, edx
  !mov r8, [p.p_input]
  !.loop:
  !mov ecx, [r8]
  !add r8, 6
  !and ecx, 0xf000f
  !imul ecx, 0xa0001
  !shr ecx, 16
  !neg rcx
  !mov byte [r9 + rcx*2 + 198], '1'
  !cmp [r8 - 2], dx
  !jne .loop
  ProcedureReturn PeekS(@result())
EndProcedure

s1.s="92 00 99 80 65 68 04 01"
s2.s="37 65 04 92 15 98 00 43 02"
Debug CreateS(@s1)
Debug CreateS(@s2)

Grep using a 128 bit structure ...

Code: Select all

Structure I128
  low.q
  high.q
EndStructure

Procedure CreateB_I128(*input, *result.I128)
  !xor eax, eax
  !mov r8, [p.p_input]    ; pointer to input string
  !mov r9, [p.p_result]   ; pointer to i128 result
  !xor r10, r10           ; result bits low  [00-63]
  !xor r11, r11           ; result bits high [64-99]
  !.l0:
  !mov ecx, [r8]          ; get number
  !add r8, 6
  !and ecx, 0xf000f
  !imul ecx, 0xa0001
  !shr ecx, 16
  !test ecx, 64           ; >= 64 ?
  !jnz .l1
  !bts r10, rcx           ; set bit in low
  !jmp .l2
  !.l1:
  !bts r11, rcx           ; set bit in high
  !.l2:
  !cmp [r8 - 2], ax       ; check for end of input string
  !jne .l0                ; loop if not end of string
  !mov [r9], r10          ; set result bits low  [00-63]
  !mov [r9 + 8], r11      ; set result bits high [64-99]
EndProcedure

Procedure.i PopCount_I128(*input.I128)
  !mov r8, [p.p_input]
  !popcnt rax, [r8]       ; popcnt bits [00-63]
  !popcnt rdx, [r8 + 8]   ; popcnt bits [64-99]
  !add rax, rdx           ; add
  ProcedureReturn  
EndProcedure

Procedure.i Grep_I128(*input1.I128, *input2.I128)
  !mov r8, [p.p_input1]
  !mov r9, [p.p_input2]
  !mov rax, [r8]          ; get low bits of input1
  !and rax, [r9]          ; and with low bits of input2
  !popcnt rax, rax        ; popcnt
  !mov rdx, [r8 + 8]      ; get high bits of input1
  !and rdx, [r9 + 8]      ; and with high bits of input2
  !popcnt rdx, rdx        ; popcnt
  !add rax, rdx           ; add
  ProcedureReturn  
EndProcedure

s1.s="07 18 15 61 02 10 08 03 00 99 97"
s2.s="01 17 61 07 04 05 18 06 57 99"
s3.s="01 07"

CreateB_I128(@s1, @x.I128)
CreateB_I128(@s2, @y.I128)
CreateB_I128(@s3, @z.I128)

Debug "PopCount"
Debug PopCount_I128(@x)
Debug PopCount_I128(@y)
Debug PopCount_I128(@z)

Debug "Grep"
Debug Grep_I128(@x, @y)
Debug Grep_I128(@x, @z)
Debug Grep_I128(@y, @z)

Using popcnt is very useful if you convert the "01 99 50" strings once and use bits internally.
If you are always using those "01 99 50" strings as input for grep it probably is faster to do it differently.
Windows (x64)
Raspberry Pi OS (Arm64)
fvillanova
User
User
Posts: 70
Joined: Wed May 02, 2012 2:17 am
Location: Brazil

Re: Comparisons in mathematical simulations

Post by fvillanova »

wilbert wrote: Tue May 16, 2023 8:10 am
fvillanova wrote: Tue May 16, 2023 4:58 am I appreciate any contribution to speeding up the CreateS() procedure.

Code: Select all

Procedure.s CreateS(*input)
  Protected Dim Result.u(100)
  FillMemory(@Result(), 200, '0', #PB_Unicode)
  !mov r9, [p.a_Result]
  !xor rax, rax
  !xor edx, edx
  !mov r8, [p.p_input]
  !.loop:
  !mov ecx, [r8]
  !add r8, 6
  !and ecx, 0xf000f
  !imul ecx, 0xa0001
  !shr ecx, 16
  !neg rcx
  !mov byte [r9 + rcx*2 + 198], '1'
  !cmp [r8 - 2], dx
  !jne .loop
  ProcedureReturn PeekS(@result())
EndProcedure

s1.s="92 00 99 80 65 68 04 01"
s2.s="37 65 04 92 15 98 00 43 02"
Debug CreateS(@s1)
Debug CreateS(@s2)
Amazing, it was very fast to create String 99 to 0
Thank you again!
My program already stores these Strings in database records that are later processed with the PopCount procedure with unbeatable speed in comparisons.
I have routines that need to process trillions of iterations (loops).
CreateS test results:

My initial CreateS = 457 ms
Demivec's CreateS = 58 ms
Wilbert's CreateS = 7 ms

Then I'll see your idea of CreateB_I128 and PopCount_I128 but at the moment creating keys previously with CreateS and processing
with PopCount the speed is unbeatable:

Code: Select all

DisableDebugger
Procedure PopCount(inp.i)
  !POPCNT rax,[p.v_inp]  
  ProcedureReturn
EndProcedure

s11.s="1000000100000000000100000000000100100000000000000000000000000000000000000000000000000000000000010011" ; "92 00 99 80 65 68 04 01"
s22.s="0100000100000000000000000000000000100000000000000000000010000010000000000000000000001000000000010101" ; "37 65 04 92 15 98 00 43 02"

part1.s=Left(s11,50)
part2.s=Right(s11,50)
part3.s=Left(s22,50)
part4.s=Right(s22,50)

x1.i=Val("%"+part1)
x2.i=Val("%"+part2)
y1.i=Val("%"+part3)
y2.i=Val("%"+part4)

t1 = ElapsedMilliseconds()
For i=1 To 40000000
  n=PopCount(x1 & y1)+PopCount(x2 & y2)
Next
Result$+"Count "+Str(n)+" in "+StrF((ElapsedMilliseconds()-t1),0)+" ms ( For 400000000 four hundred million times ) "+Chr(13)

EnableDebugger
Debug Result$
I believe my idea of splitting the string into two parts is 8 times faster than your solution with procedure Grep_I128

n=PopCount(x1 & y1)+PopCount(x2 & y2) = 109 milliseconds
n=Grep_I128(@x,@y) = 890 milliseconds

Many thanks again for the new procedure CountS().
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Comparisons in mathematical simulations

Post by wilbert »

fvillanova wrote: Tue May 16, 2023 7:39 pmCreateS test results:

My initial CreateS = 457 ms
Demivec's CreateS = 58 ms
Wilbert's CreateS = 7 ms
Good to hear that it works for you. :)

PopCount itself is fast.
It's the splitting the string in two and converting to a number with Left / Right and Val that takes up most of the time.
Shouldn't that also be in the loop when you time things or is this conversion a one time only operation ?
Windows (x64)
Raspberry Pi OS (Arm64)
fvillanova
User
User
Posts: 70
Joined: Wed May 02, 2012 2:17 am
Location: Brazil

Re: Comparisons in mathematical simulations

Post by fvillanova »

wilbert wrote: Tue May 16, 2023 9:17 pm
fvillanova wrote: Tue May 16, 2023 7:39 pmCreateS test results:
PopCount itself is fast.
It's the splitting the string in two and converting to a number with Left / Right and Val that takes up most of the time.
Shouldn't that also be in the loop when you time things or is this conversion a one time only operation ?
Yes, dividing by two Strings and converting left and right is already recorded in the database, it is not processed every time, just once.
The heaviest processing during simulations is done with the two conversion numbers that were previously calculated.
Thanks a lot for the help.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Comparisons in mathematical simulations

Post by wilbert »

Just for fun (and speed) ...
2 alternative ways you could call popcnt with your two quad variables.

Code: Select all

DisableDebugger

; PopCount128

Procedure PopCount128_Addr()
  ProcedureReturn ?Addr
  !align 8
  Addr:
  CompilerIf #PB_Compiler_OS = #PB_OS_Windows
    !popcnt rax, rcx      ; x64 Windows
    !popcnt r8 , rdx
  CompilerElse
    !popcnt rax, rdi      ; x64 macOS and Linux
    !popcnt r8 , rsi
  CompilerEndIf
  !add rax, r8
  !ret
EndProcedure

PrototypeC PopCount128_Proto(q1.q, q2.q)

Global PopCount128.PopCount128_Proto = PopCount128_Addr()

; PopCount

Procedure PopCount(inp.i)
  !popcnt rax, [p.v_inp]
  ProcedureReturn
EndProcedure

; Compare

x1 = 5
x2 = 15
y1 = 255
y2 = 127

t1 = ElapsedMilliseconds()
For i=1 To 100000000
  n=PopCount(x1 & y1)+PopCount(x2 & y2)
Next
t2 = ElapsedMilliseconds()
For i=1 To 100000000
  n=PopCount128(x1 & y1, x2 & y2)
Next
t3 = ElapsedMilliseconds()
For i=1 To 100000000
  ; inline asm
  !mov rax, [v_x1]
  !mov rdx, [v_x2]
  !and rax, [v_y1]
  !and rdx, [v_y2]
  !popcnt rax, rax
  !popcnt rdx, rdx
  !add rax, rdx
  !mov [v_n], rax
Next
t4 = ElapsedMilliseconds()

EnableDebugger

Debug "PopCount (2x) : " + Str(t2-t1)
Debug "PopCount128 : " + Str(t3-t2)
Debug "inline popcnt : " + Str(t4-t3)
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Piero
Enthusiast
Enthusiast
Posts: 285
Joined: Sat Apr 29, 2023 6:04 pm
Location: Italy

Re: Comparisons in mathematical simulations

Post by Piero »

I was always more interested in History of Math than in Math itself; so forgive me if I post again about this "problem": Kepler's conjecture (packing spheres!)
When I "studied" it, if I remember well, it was """nearly solved""" by "computations"...
I'm posting this just because I hope you will find it interesting...
In case forgive me
fvillanova
User
User
Posts: 70
Joined: Wed May 02, 2012 2:17 am
Location: Brazil

Re: Comparisons in mathematical simulations

Post by fvillanova »

wilbert wrote: Fri May 19, 2023 6:48 am Just for fun (and speed) ...
2 alternative ways you could call popcnt with your two quad variables.
Hi Wilbert,
Now this option "inline" will be the best choice.
I'll let you know later how it worked on my system.
Olli
Addict
Addict
Posts: 1071
Joined: Wed May 27, 2020 12:26 pm

Re: Comparisons in mathematical simulations

Post by Olli »

Piero wrote: Fri May 19, 2023 10:50 am I was always more interested in History of Math than in Math itself; so forgive me if I post again about this "problem": Kepler's conjecture (packing spheres!)
When I "studied" it, if I remember well, it was """nearly solved""" by "computations"...
I'm posting this just because I hope you will find it interesting...
In case forgive me
Hello Piero,

do not hesitate to create a whole subject, first in the coding section, about the Kepler's conjecture (you put this expression in the title)

There will be any chances to get surprising answers. And after a time, you will be able to write a small project, or more in the announcement section.
fvillanova
User
User
Posts: 70
Joined: Wed May 02, 2012 2:17 am
Location: Brazil

Re: Comparisons in mathematical simulations

Post by fvillanova »

wilbert wrote: Fri May 19, 2023 6:48 am Just for fun (and speed) ...
2 alternative ways you could call popcnt with your two quad variables.
Amazing, the speed got bigger, I thought it would be impossible to increase!
In some places in the main program the input variables are simple, but in other places they are arrays.
Would it be faster if the "inline" routine took the values directly from the array without having to first pass it to a simple variable?

Code: Select all

  
  i4q=matf2(i5)  ; i5 = integer index
  i8q=matf(i6)   ; i6 = integer index
  !mov rax, [v_i4q]
  !and rax, [v_i8q]
  !popcnt rax, rax
  !mov [v_n], rax
fvillanova
User
User
Posts: 70
Joined: Wed May 02, 2012 2:17 am
Location: Brazil

Re: Comparisons in mathematical simulations

Post by fvillanova »

Piero wrote: Fri May 19, 2023 10:50 am I was always more interested in History of Math than in Math itself; so forgive me if I post again about this "problem": Kepler's conjecture (packing spheres!)
When I "studied" it, if I remember well, it was """nearly solved""" by "computations"...
I'm posting this just because I hope you will find it interesting...
In case forgive me
Yes, mathematical problems like this can definitely be solved with quantum computing that is coming in a few years (I hope) that it will be within our reach!
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Comparisons in mathematical simulations

Post by wilbert »

fvillanova wrote: Sat May 20, 2023 1:01 am Amazing, the speed got bigger, I thought it would be impossible to increase!
In some places in the main program the input variables are simple, but in other places they are arrays.
Would it be faster if the "inline" routine took the values directly from the array without having to first pass it to a simple variable?
Yes, that should be faster.
You can see for yourself

Test 1

Code: Select all

DisableDebugger

Dim matf.q(255)
Dim matf2.q(255)

i5 = 31
i6 = 151
matf(i6) = 65535
matf2(i5) = 247
n = 0

; Compare

t1 = ElapsedMilliseconds()
For i=1 To 100000000
  i4q=matf2(i5)  ; i5 = integer index
  i8q=matf(i6)   ; i6 = integer index
  !mov rax, [v_i4q]
  !and rax, [v_i8q]
  !popcnt rax, rax
  !mov [v_n], rax
Next
t2 = ElapsedMilliseconds()
For i=1 To 100000000
  ; inline asm
  !mov r8, [v_i5]
  !mov r9, [v_i6]
  !shl r8, 3
  !shl r9, 3
  !add r8, [a_matf2]
  !add r9, [a_matf]
  !mov rax, [r8]
  !and rax, [r9]
  !popcnt rax, rax
  !mov [v_n], rax
Next
t3 = ElapsedMilliseconds()

EnableDebugger

Debug "With i4q/i8q variables : " + Str(t2-t1)
Debug "Without i4q/i8q variables : " + Str(t3-t2)

Test 2

Code: Select all

DisableDebugger

Dim matf.q(8191)
Dim matf2.q(8191)
RandomData(@matf(0), 8192*8)
RandomData(@matf2(0), 8192*8)

n = 0

; Compare

t1 = ElapsedMilliseconds()
For i5 = 0 To 8191
  For i6 = 0 To 8191
    i4q=matf2(i5)  ; i5 = integer index
    i8q=matf(i6)   ; i6 = integer index
    !mov rax, [v_i4q]
    !and rax, [v_i8q]
    !popcnt rax, rax
    !mov [v_n], rax
  Next
Next
t2 = ElapsedMilliseconds()
For i5 = 0 To 8191
  For i6 = 0 To 8191
    ; inline asm
    !mov r8, [v_i5]
    !mov r9, [v_i6]
    !shl r8, 3
    !shl r9, 3
    !add r8, [a_matf2]
    !add r9, [a_matf]
    !mov rax, [r8]
    !and rax, [r9]
    !popcnt rax, rax
    !mov [v_n], rax
  Next
Next
t3 = ElapsedMilliseconds()

EnableDebugger

Debug "With i4q/i8q variables : " + Str(t2-t1)
Debug "Without i4q/i8q variables : " + Str(t3-t2)
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Comparisons in mathematical simulations

Post by idle »

Code: Select all


DisableDebugger

Procedure popcountmem(*mem,*mem1,len) 
  Protected result,a,b 
       
  !mov rax, [p.v_len] 
  !and rax, 7 
  !mov rdx, [p.v_len] 
  !sub rdx,rax 
  !mov [p.v_a],rdx
  !mov [p.v_b],rax 
  !xor rcx,rcx
  !lwhile1:
  !cmp rcx, [p.v_a] 
  !jge lend1
  
  !mov rax, [p.p_mem] 
  !mov rax, [rax + rcx]
  !mov r9, [p.p_mem1] 
  !mov r9, [r9 + rcx]
  !and rax,r9 
  !popcnt rax,rax   
  !add [p.v_result],rax   
  !add rcx,8
   
  !jmp lwhile1
  !lend1:
  
  !mov rax, [p.p_mem] 
  !mov rax, [rax + rcx]
  !mov r9, [p.p_mem1] 
  !mov r9, [r9 + rcx]
  !and rax,r9 
  !lea rdx, [JT_remain1]
  !mov rcx, [p.v_b]
  !jmp qword [rdx + rcx * 8]
  !JT_remain1 dq lle,ll1,ll2,ll3,ll4,ll5,ll6,ll7,ll8
    
  !ll1: 
  !and rax, 0xff
  !jmp lle
  !ll2:
  !and rax, 0xffff
  !jmp lle 
  !ll3:
  !and rax, 0xffffff
  !jmp lle 
  !ll4:
  !mov rdx,0xffffffff
  !and rax, rdx
  !jmp lle 
  !ll5:
  !mov rdx, 0xffffffffff
  !and rax, rdx
  !jmp lle 
  !ll6:
  !mov rdx,0xffffffffffff
  !and rax, rdx
  !jmp lle
  !ll7:
  !mov rdx,0xffffffffffffff 
  !and rax, rdx
  !jmp lle
  !ll8:
  !lle: 
  
  !mov rdx,rax
  !popcnt rax,rdx   
  !add [p.v_result],rax   
  
  ProcedureReturn result  
EndProcedure   


Dim matf.q(8191)
Dim matf2.q(8191)

RandomData(@matf(0), 8192*8)
RandomData(@matf2(0), 8192*8)

n = 0
n1 =0
n2=0
lp=8000 
; Compare

t1 = ElapsedMilliseconds()
For a = 0 To lp
  n=0
  For i5 = 0 To 8191
    
    i4q=matf2(i5)  ; i5 = integer index
    i8q=matf(i5)   ; i6 = integer index
    !mov rax, [v_i4q]
    !and rax, [v_i8q]
    !popcnt rax, rax
    !add [v_n], rax
    
  Next
Next   
t2 = ElapsedMilliseconds()
For a = 0 To lp 
  n1=0
  For i5 = 0 To 8191
    
    ; inline asm
    !mov r8, [v_i5]
    !mov r9, [v_i5]
    !shl r8, 3
    !shl r9, 3
    !add r8, [a_matf2]
    !add r9, [a_matf]
    !mov rax, [r8]
    !and rax, [r9]
    !popcnt rax, rax
    !add [v_n1], rax
    
  Next
Next 
t3 = ElapsedMilliseconds()

For a = 0 To lp 
  n2=0
  n2 = popcountmem(@matf(0),@matf2(0),8192*8) 
Next 

t4 = ElapsedMilliseconds()

EnableDebugger

Debug "With i4q/i8q variables : " +  Str(n) + " " + Str(t2-t1)
Debug "Without i4q/i8q variables : " + Str(n1) + " " +  Str(t3-t2)
Debug "Raw : " + Str(n2) + " " + Str(t4-t3)
fvillanova
User
User
Posts: 70
Joined: Wed May 02, 2012 2:17 am
Location: Brazil

Re: Comparisons in mathematical simulations

Post by fvillanova »

wilbert wrote: Sat May 20, 2023 6:21 am Yes, that should be faster.
You can see for yourself
Yes, it got even faster! I still can't believe it!
Several processings were done in +- 8 to 9 minutes and now they are being processed in 3.3 minutes!!!
I am not experienced in ASM but I can make small changes and simple routines.
See how the main part is in the processing:

Code: Select all

 i4q=CreateB(@aux2)
 FillMemory(@conte(), (mate(i0,14)+1)*8, 0, #PB_Integer)
 i6=mate(i0,10): kk2=mate(i0,11)-i6+1  
 !mov r10,[a_conte]
 !mov rcx,1
 !MOV   r13, qword [v_kk2] 
 !l_jump12:
 !mov r9, [v_i6]
 !shl r9, 3
 !add r9, [a_matf]
 !mov rax, [v_i4q]
 !and rax, [r9]
 !popcnt rax, rax
 !add [r10 + rax * 8],rcx
 !add [v_i6], 1  
 !dec r13
 !jnz l_jump12 
It's working great! and very... very... very fast!
note: The above code will not work because there are external variables and arrays that are not present here.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Comparisons in mathematical simulations

Post by wilbert »

fvillanova wrote: Sun May 21, 2023 1:13 am It's working great! and very... very... very fast!
note: The above code will not work because there are external variables and arrays that are not present here.
I think there's still some room for improvement on your code.
You don't have to update the i6 variable al the time. You can do it once after the loop has finished.
The code below should accomplish the same as your code.

Code: Select all

 i4q=CreateB(@aux2)
 FillMemory(@conte(), (mate(i0,14)+1)*8, 0, #PB_Integer)
 i6=mate(i0,10): kk2=mate(i0,11)-i6+1
 !mov r9, [v_i6]                  ; r9 = @matf() + i6 * 8
 !shl r9, 3 
 !add r9, [a_matf]
 !mov r10, [a_conte]              ; r10 = @conte()
 !mov rcx, [v_kk2]                ; rcx = kk2
 !l_jump12:
 !mov rax, [v_i4q]
 !and rax, [r9]
 !popcnt rax, rax
 !add qword [r10 + rax * 8], 1
 !add r9, 8 
 !sub rcx, 1
 !jnz l_jump12
 i6+kk2
Windows (x64)
Raspberry Pi OS (Arm64)
fvillanova
User
User
Posts: 70
Joined: Wed May 02, 2012 2:17 am
Location: Brazil

Re: Comparisons in mathematical simulations

Post by fvillanova »

wilbert wrote: Sun May 21, 2023 6:30 am I think there's still some room for improvement on your code.
You don't have to update the i6 variable al the time. You can do it once after the loop has finished.
The code below should accomplish the same as your code.
Unbelievable! you are the man!
Really, i6 was no longer necessary to be incremented, it was needed in the previous routines, and I didn't see it.
The other speedup you did, doubled the speed, wow!
This is the "Holy Grail" for my routine.
There are some details that I didn't show you, because it is used in 8 places in the system, including where it uses the four variables to be able to make PopCount with more than 64 (0 or 1)
Now, I put it in this segment of the program that consumes more processing because the mate(x,y) array have a large variation of values, later I will replace those in the other seven places.
Thanks again for your help.
Post Reply