Page 1 of 2
Cube root
Posted: Sun Oct 02, 2011 11:10 am
by Psychophanta
From
http://www.asmcommunity.net/board/index ... ic=14769.0
Code: Select all
Macro Cbrt(num)
!jmp @f
!.dat:dq 0.33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333
!@@:fld qword[.dat]
!fld qword[v_#num#]
!fabs
!fyl2x ;->log2(Src1)*exponent
!fld st0 ;copy the logarithm
!frndint ;keep only the characteristic
!fsub st1,st0 ;keeps only the mantissa
!fxch ;get the mantissa on top
!f2xm1 ;->2^(mantissa)-1
!fld1
!faddp ;add 1 back
!fscale ;scale it with the characteristic
!fstp st1 ;copy result over and "pop" it
!test byte[v_#num#+7],$80
!jz @f
!fchs ;restore sign
!@@:fstp qword[v_#num#]
EndMacro
f.d=-26.4433
cbrt(f)
Debug f
Something better please?
Re: Cube root
Posted: Mon Feb 27, 2012 3:22 pm
by evilgravedigger
Code: Select all
Global T.f = 19.0
Global A.f
!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0
!movq qword[v_A],xmm0
Debug A
requires SSE.
Re: Cube root
Posted: Mon Feb 27, 2012 10:36 pm
by Psychophanta
evilgravedigger wrote:Code: Select all
Global T.f = 19.0
Global A.f
!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0
!movq qword[v_A],xmm0
Debug A
requires SSE.
sorry, it does not work. Teh result value is wrong.
Besides it does not work for negative values.
Re: Cube root
Posted: Mon Feb 27, 2012 11:49 pm
by xorc1zt
evilgravedigger wrote:Code: Select all
Global T.f = 19.0
Global A.f
!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0
!movq qword[v_A],xmm0
Debug A
requires SSE.
sqrtss return the square root not the cube one.
sse cuberoot :
http://www.musicdsp.org/showone.php?id=206
Re: Cube root
Posted: Tue Feb 28, 2012 9:38 am
by Psychophanta
Another good lesson to realize about Intel x86 line is a fully crap.
Re: Cube root
Posted: Tue Feb 28, 2012 8:39 pm
by evilgravedigger
xorc1zt wrote:evilgravedigger wrote:Code: Select all
Global T.f = 19.0
Global A.f
!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0
!movq qword[v_A],xmm0
Debug A
requires SSE.
sqrtss return the square root not the cube one.
sse cuberoot :
http://www.musicdsp.org/showone.php?id=206
Ow, sorry, my bad.
Re: Cube root
Posted: Thu Dec 12, 2013 12:05 pm
by Psychophanta
Mmmm!!
Still waiting for a faster and bit-wider (64 bit float values) function than the first one posted here (you can use SSE, SSE2, 3, 4, 5, 6 .... )
Re: Cube root
Posted: Sat Dec 14, 2013 12:24 pm
by Helle
Ha, no problem
!
Code: Select all
;Cubic Root, Single Precision
;PB 5.21 LTS (x64)
;I use Heron: Loop {x = x * (((x * x * x) + 2 * a) / ((2 * x * x * x) + a))}
Test_Value.f = -12345.6789 ;Cubic_Root = -23,112042408247961097779983746659
Appr.f ;for all...
Procedure.f Cube_Root()
!mov eax,[v_Test_Value]
!and eax,7FFFFFFFh ;approximation without sign
;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
!mov [v_Appr],eax
!movss xmm0,[v_Appr]
!movss xmm1,xmm0
!lea rdx,[v_Appr]
!mov eax,[rdx]
!xor edx,edx
!div dword[Value3]
!add eax,[BitHack]
!mov [v_Appr],eax
!movss xmm2,[v_Appr]
!addss xmm1,xmm1 ;xmm1=2*a=constant
!mov ecx,2 ;higher=more precision (if possible)
!@@:
!movss xmm3,xmm2
!mulss xmm3,xmm3
!mulss xmm3,xmm2 ;xmm3=x*x*x
!movss xmm4,xmm3
!addss xmm4,xmm4 ;2*x*x*x
!addss xmm4,xmm0 ;xmm0=a
!addss xmm3,xmm1
!divss xmm3,xmm4
!mulss xmm2,xmm3
!dec ecx
!jnz @b
;set sign (if Test_Value negativ)
!test byte[v_Test_Value+3],80h
!jz @f
!mulss xmm2,[Minus1] ;restore sign
!@@:
!movss dword[v_Appr],xmm2
!fld dword[v_Appr]
ProcedureReturn
!Minus1: dd -1.0
!BitHack: dd 709921077 ;for Kahan´s bit hack; http://metamerist.com/cbrt/cbrt.htm
!Value3: dd 3
EndProcedure
TA = ElapsedMilliseconds()
For i = 1 To 10000000
Cbrt.f = Cube_Root()
Next
TE = ElapsedMilliseconds() - TA
MessageRequester("Cubic Root (Single-Precision) with SSE", StrF(Cbrt, 7) + #LFCR$ + Str(TE) + " ms for 10000000 Loops")
Code: Select all
;Cubic Root, Double Precision
;PB 5.21 LTS (x64)
;I use Heron: Loop {x = x * (((x * x * x) + 2 * a) / ((2 * x * x * x) + a))}
Test_Value.d = -12345.6789 ;Cubic_Root = -23,112042408247961097779983746659
Appr.d ;for all...
Procedure.d Cube_Root()
!mov rax,[v_Test_Value]
!mov rdx,7FFFFFFFFFFFFFFFh
!and rax,rdx ;approximation without sign
;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
!mov [v_Appr],rax
!movsd xmm0,[v_Appr]
!movsd xmm1,xmm0
!lea rdx,[v_Appr]
!mov rax,[rdx]
!xor rdx,rdx
!div qword[Value3]
!add rax,[BitHack]
!mov [v_Appr],rax
!movsd xmm2,[v_Appr]
!addsd xmm1,xmm1 ;xmm1=2*a=constant
!mov ecx,2 ;higher=more precision (if possible)
!@@:
!movsd xmm3,xmm2
!mulsd xmm3,xmm3
!mulsd xmm3,xmm2 ;xmm3=x*x*x
!movsd xmm4,xmm3
!addsd xmm4,xmm4 ;2*x*x*x
!addsd xmm4,xmm0 ;xmm0=a
!addsd xmm3,xmm1
!divsd xmm3,xmm4
!mulsd xmm2,xmm3
!dec ecx
!jnz @b
;set sign (if Test_Value negativ)
!test byte[v_Test_Value+7],80h
!jz @f
!mulsd xmm2,[Minus1] ;restore sign
!@@:
!movsd qword[v_Appr],xmm2
!fld qword[v_Appr]
ProcedureReturn
!Minus1: dq -1.0
!BitHack: dq 3071306043645493248 ;715094163<<32, for Kahan´s bit hack; http://metamerist.com/cbrt/cbrt.htm
!Value3: dq 3
EndProcedure
TA = ElapsedMilliseconds()
For i = 1 To 10000000
Cbrt.d = Cube_Root()
Next
TE = ElapsedMilliseconds() - TA
MessageRequester("Cubic Root (Double-Precision) with SSE2", StrD(Cbrt, 15) + #LFCR$ + Str(TE) + " ms for 10000000 Loops")
Make your tests and have fun!
Helle
Re: Cube root
Posted: Sat Dec 14, 2013 1:38 pm
by Psychophanta
OooppS!
The speed of cube root results cricical needed for x86 32 bit systems.
x86 running at 64 bit are quite strange.
Sorry, can not test as I use 32bit.
However, watching at your code, it still seems slower (Only the part of your function before the first label '!@@:' is much slower than the entire function in my firts post, and still more for the AMD processors
).
Have you tested it seriously?
Thanks!
Re: Cube root
Posted: Sat Dec 14, 2013 3:52 pm
by Helle
I made tests with PB 64-Bit (Cubic Root -12345.6789, 10000000 Loops, Procedure):
Psychophanta:
- Float: -23.1120453 / 484 ms
- Double : -23.112042408247959 / 491 ms
Helle:
- Float: -23.1120415 / 169 ms
- Double : -23.112042408247962 / 203 ms
Edit: CPU = Intel i7-4770K@4.0GHz
Re: Cube root
Posted: Sat Dec 14, 2013 5:46 pm
by Psychophanta
Helle wrote:I made tests with PB 64-Bit (Cubic Root -12345.6789, 10000000 Loops, Procedure):
Psychophanta:
- Float: -23.1120453 / 484 ms
- Double : -23.112042408247959 / 491 ms
Helle:
- Float: -23.1120415 / 169 ms
- Double : -23.112042408247962 / 203 ms
Edit: CPU = Intel i7-4770K@4.0GHz
Strange, very strange
Psychophanta:
- Float: -23.1120453 / 343 ms
- Double : -23.112042408247959 / 375 ms
Edit: CPU = AMD Phenom II 550, 3.0GHz (purchased in year 2009, no overcloked, no overcored)
Re: Cube root
Posted: Sun Dec 15, 2013 6:19 pm
by Helle
Test-Code for x86 (e.g. with other CPU´s):
Code: Select all
;For 32-Bit-Windows
;PB 5.21 LTS (x86)
Test_Value_D.d = -12345.6789 ;Cube Root = -23,112042408247961097779983746659
Appr_D.d ;for all...
Test_Value_F.f = -12345.6789
Appr_F.f
Procedure.f Cube_Root_F_H_x86() ;only for x86
!mov eax,[v_Test_Value_F]
!and eax,7FFFFFFFh ;without sign
;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
!mov [v_Appr_F],eax
!movss xmm0,[v_Appr_F]
!movss xmm1,xmm0
!lea edx,[v_Appr_F]
!mov eax,[edx]
!xor edx,edx
!div dword[Value3_F]
!add eax,[BitHack_F]
!mov [v_Appr_F],eax
!movss xmm2,[v_Appr_F]
!addss xmm1,xmm1 ;xmm1=2*a=constant
!mov ecx,2 ;higher=more precision (if possible)
!@@:
!movss xmm3,xmm2
!mulss xmm3,xmm3
!mulss xmm3,xmm2 ;xmm3=x*x*x
!movss xmm4,xmm3
!addss xmm4,xmm4 ;2*x*x*x
!addss xmm4,xmm0 ;xmm0=a
!addss xmm3,xmm1
!divss xmm3,xmm4
!mulss xmm2,xmm3
!dec ecx
!jnz @b
;set sign (if Test_Value negativ)
!test byte[v_Test_Value_F+3],80h
!jz @f
!mulss xmm2,[Minus1_F] ;restore sign
!@@:
!movss dword[v_Appr_F],xmm2
!fld dword[v_Appr_F]
ProcedureReturn
!Minus1_F: dd -1.0
!BitHack_F: dd 709921077 ;for Kahan´s bit hack (Float); http://metamerist.com/cbrt/cbrt.htm
!Value3_F: dd 3
EndProcedure
Procedure.d Cube_Root_D_H_x86() ;only for x86
!mov eax,dword[v_Test_Value_D+4]
!mov edx,dword[v_Test_Value_D]
!mov dword[v_Appr_D],edx
!and eax,7FFFFFFFh ;without sign
;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
!mov dword[v_Appr_D+4],eax
!movsd xmm0,[v_Appr_D]
!movsd xmm1,xmm0
!lea edx,[v_Appr_D]
!mov eax,[edx+4]
!xor edx,edx
!div dword[Value3_D]
!add eax,[BitHack_D+4]
!mov dword[v_Appr_D+4],eax
!movsd xmm2,[v_Appr_D]
!addsd xmm1,xmm1 ;xmm1=2*a=constant
!mov ecx,2 ;higher=more precision (if possible)
!@@:
!movsd xmm3,xmm2
!mulsd xmm3,xmm3
!mulsd xmm3,xmm2 ;xmm3=x*x*x
!movsd xmm4,xmm3
!addsd xmm4,xmm4 ;2*x*x*x
!addsd xmm4,xmm0 ;xmm0=a
!addsd xmm3,xmm1
!divsd xmm3,xmm4
!mulsd xmm2,xmm3
!dec ecx
!jnz @b
;set sign (if Test_Value negativ)
!test byte[v_Test_Value_D+7],80h
!jz @f
!mulsd xmm2,[Minus1_D] ;restore sign
!@@:
!movsd qword[v_Appr_D],xmm2
!fld qword[v_Appr_D]
ProcedureReturn
!Minus1_D: dq -1.0
!BitHack_D: dq 3071306043645493248 ;715094163<<32, for Kahan´s bit hack (Double); http://metamerist.com/cbrt/cbrt.htm
!Value3_D: dd 3
EndProcedure
Procedure.f Cube_Root_F_P() ;for x86 and x64
!fld dword[Dat_F]
!fld dword[v_Test_Value_F]
!fabs
!fyl2x ;->log2(Src1)*exponent
!fld st0 ;copy the logarithm
!frndint ;keep only the characteristic
!fsub st1,st0 ;keeps only the mantissa
!fxch ;get the mantissa on top
!f2xm1 ;->2^(mantissa)-1
!fld1
!faddp ;add 1 back
!fscale ;scale it with the characteristic
!fstp st1 ;copy result over and "pop" it
!test byte[v_Test_Value_F+3],$80
!jz @f
!fchs ;restore sign
!@@:
ProcedureReturn
!Dat_F: dd 0.33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 ;:-)
EndProcedure
Procedure.d Cube_Root_D_P() ;for x86 and x64
!fld qword[Dat_D]
!fld qword[v_Test_Value_D]
!fabs
!fyl2x ;->log2(Src1)*exponent
!fld st0 ;copy the logarithm
!frndint ;keep only the characteristic
!fsub st1,st0 ;keeps only the mantissa
!fxch ;get the mantissa on top
!f2xm1 ;->2^(mantissa)-1
!fld1
!faddp ;add 1 back
!fscale ;scale it with the characteristic
!fstp st1 ;copy result over and "pop" it
!test byte[v_Test_Value_D+7],$80
!jz @f
!fchs ;restore sign
!@@:
ProcedureReturn
!Dat_D: dq 0.33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 ;:-)
EndProcedure
TA_F_H = ElapsedMilliseconds()
For i = 1 To 10000000
Cbrt_F_H_x86.f = Cube_Root_F_H_x86()
Next
TE_F_H = ElapsedMilliseconds() - TA_F_H
Result$ = "Test-Value : -12345.6789" + #LFCR$ + "Float Helle:" + #LFCR$ + StrF(Cbrt_F_H_x86, 7) + " / " + Str(TE_F_H) + " ms " + #LFCR$
TA_D_H = ElapsedMilliseconds()
For i = 1 To 10000000
Cbrt_D_H_x86.d = Cube_Root_D_H_x86()
Next
TE_D_H = ElapsedMilliseconds() - TA_D_H
Result$ + "Double Helle:" + #LFCR$ + StrD(Cbrt_D_H_x86, 15) + " / " + Str(TE_D_H) + " ms " + #LFCR$
TA_F_P = ElapsedMilliseconds()
For i = 1 To 10000000
Cbrt_F_P.f = Cube_Root_F_P()
Next
TE_F_P = ElapsedMilliseconds() - TA_F_P
Result$ + "Float Psychophanta:" + #LFCR$ + StrF(Cbrt_F_P, 7) + " / " + Str(TE_F_P) + " ms " + #LFCR$
TA_D_P = ElapsedMilliseconds()
For i = 1 To 10000000
Cbrt_D_P.d = Cube_Root_D_P()
Next
TE_D_P = ElapsedMilliseconds() - TA_D_P
Result$ + "Double Psychophanta:" + #LFCR$ + StrD(Cbrt_D_P, 15) + " / " + Str(TE_D_P) + " ms " + #LFCR$ + #LFCR$
Result$ + "CPU: Intel i7-4770K@4.0GHz" + #LFCR$
;SetClipboardText(Result$)
MessageRequester("Cube Root x86, 10000000 Loops", Result$)
My results (with VirtualBox under Win7 x64):
Test-Value : -12345.6789
Float Helle:
-23.1120415 / 191 ms
Double Helle:
-23.112042408247962 / 362 ms; I have for Double_x86 too many overhead
Float Psychophanta:
-23.1120453 / 481 ms
Double Psychophanta:
-23.112042408247959 / 488 ms
CPU: Intel i7-4770K@4.0GHz
Re: Cube root
Posted: Mon Dec 16, 2013 12:13 am
by Psychophanta
Ok, the function you posted seems faster for intel i7 and x64 OS,
There would be good if someone could test it in an AMD.
Re: Cube root
Posted: Mon Dec 16, 2013 7:43 am
by wilbert
Psychophanta wrote:Ok, the function you posted seems faster for intel i7 and x64 OS,
There would be good if someone could test it in an AMD.
Sorry, no AMD here.
Here's a small modification of the versions Helle posted.
Does that make a difference on AMD ?
Code: Select all
Procedure.f Cube_Root_F_H_x86() ;only for x86
!mov eax,[v_Test_Value_F]
!mov ecx, eax
!and eax,0x7fffffff ;without sign
!and ecx,0x80000000 ;sign only
;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
!movd xmm0, eax
!movss xmm1, xmm0
!mov edx, 0x55555555
!mul edx
!add edx, 0x2a508935
!movd xmm2, edx
!addss xmm1,xmm1 ;xmm1=2*a=constant
!movss xmm3,xmm2
!mulss xmm3,xmm3
!mulss xmm3,xmm2 ;xmm3=x*x*x
!movss xmm4,xmm3
!addss xmm4,xmm4 ;2*x*x*x
!addss xmm4,xmm0 ;xmm0=a
!addss xmm3,xmm1
!divss xmm3,xmm4
!mulss xmm2,xmm3
!movss xmm3,xmm2
!mulss xmm3,xmm3
!mulss xmm3,xmm2 ;xmm3=x*x*x
!movss xmm4,xmm3
!addss xmm4,xmm4 ;2*x*x*x
!addss xmm4,xmm0 ;xmm0=a
!addss xmm3,xmm1
!divss xmm3,xmm4
!mulss xmm2,xmm3
!movd xmm3, ecx
!por xmm2, xmm3 ;restore sign
!movss dword[v_Appr_F],xmm2
!fld dword[v_Appr_F]
ProcedureReturn
EndProcedure
Code: Select all
Procedure.d Cube_Root_D_HM_x86() ;only for x86
!movsd xmm0, [v_Test_Value_D]
!pcmpeqb xmm4, xmm4
!psrlq xmm4, 1
!movsd xmm1, xmm0
!pand xmm1, xmm4 ; without sign
!pandn xmm4, xmm0 ; sign only
!pshufd xmm0, xmm1, 1
;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
!movq xmm2, [MulConst]
!pmuludq xmm0, xmm2
!paddd xmm0, xmm2
!movsd xmm2,xmm0
!mulsd xmm2,xmm2
!mulsd xmm2,xmm0 ;xmm2=x*x*x
!movsd xmm3,xmm2
!addsd xmm3,xmm3 ;2*x*x*x
!addsd xmm3,xmm1 ;xmm1=a
!addsd xmm2,xmm1
!addsd xmm2,xmm1
!divsd xmm2,xmm3
!mulsd xmm0,xmm2
!movsd xmm2,xmm0
!mulsd xmm2,xmm2
!mulsd xmm2,xmm0 ;xmm2=x*x*x
!movsd xmm3,xmm2
!addsd xmm3,xmm3 ;2*x*x*x
!addsd xmm3,xmm1 ;xmm1=a
!addsd xmm2,xmm1
!addsd xmm2,xmm1
!divsd xmm2,xmm3
!mulsd xmm0,xmm2
!por xmm0, xmm4
!movsd qword[v_Appr_D],xmm0
!fld qword[v_Appr_D]
ProcedureReturn
!MulConst: dd 0x55555555, 0x2a9f7893
EndProcedure
Re: Cube root
Posted: Mon Dec 16, 2013 9:20 am
by Helle
In the firma i have an AMD:
CPU: AMD FX-8120@3.6GHz
32-Bit-Windows:
Test-Value : -12345.6789
Float Helle:
-23.1120415 / 335 ms
Double Helle:
-23.112042408247962 / 943 ms
Float Psychophanta:
-23.1120453 / 1174 ms
Double Psychophanta:
-23.112042408247959 / 1150 ms
64-Bit-Windows:
Test-Value : -12345.6789
Float Helle:
-23.1120415 / 322 ms
Double Helle:
-23.112042408247962 / 378 ms
Float Psychophanta:
-23.1120453 / 1166 ms
Double Psychophanta:
-23.112042408247959 / 1203 ms