PureBasic Forums - English

Posted: **Sun Oct 02, 2011 11:10 am**

From
http://www.asmcommunity.net/board/index ... ic=14769.0

Macro Cbrt(num)
  !jmp @f
  !.dat:dq 0.33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333
  !@@:fld qword[.dat]
  !fld qword[v_#num#]
  !fabs
  !fyl2x                   ;->log2(Src1)*exponent
  !fld st0             ;copy the logarithm
  !frndint                 ;keep only the characteristic
  !fsub st1,st0          ;keeps only the mantissa
  !fxch                    ;get the mantissa on top
  !f2xm1                   ;->2^(mantissa)-1
  !fld1
  !faddp                   ;add 1 back
  !fscale                  ;scale it with the characteristic
  !fstp st1             ;copy result over and "pop" it
  !test byte[v_#num#+7],$80
  !jz @f
  !fchs                    ;restore sign
  !@@:fstp qword[v_#num#]
EndMacro
f.d=-26.4433
cbrt(f)
Debug f

Something better please?

Posted: **Mon Feb 27, 2012 3:22 pm**

Code: Select all

Global T.f = 19.0
Global A.f

!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0 
!movq qword[v_A],xmm0
Debug A

requires SSE.

Posted: **Mon Feb 27, 2012 10:36 pm**

evilgravedigger wrote:
Code: Select all
Global T.f = 19.0
Global A.f

!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0 
!movq qword[v_A],xmm0
Debug A
requires SSE.

sorry, it does not work. Teh result value is wrong.
Besides it does not work for negative values.

Posted: **Mon Feb 27, 2012 11:49 pm**

evilgravedigger wrote:
Code: Select all
Global T.f = 19.0
Global A.f

!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0 
!movq qword[v_A],xmm0
Debug A
requires SSE.

sqrtss return the square root not the cube one.

sse cuberoot : http://www.musicdsp.org/showone.php?id=206

Posted: **Tue Feb 28, 2012 9:38 am**

Another good lesson to realize about Intel x86 line is a fully crap.

Posted: **Tue Feb 28, 2012 8:39 pm**

xorc1zt wrote:
evilgravedigger wrote:
Code: Select all
Global T.f = 19.0
Global A.f

!movq xmm0,qword[v_T]
!sqrtss xmm0,xmm0 
!movq qword[v_A],xmm0
Debug A
requires SSE.
sqrtss return the square root not the cube one.

sse cuberoot : http://www.musicdsp.org/showone.php?id=206

Ow, sorry, my bad.

Posted: **Thu Dec 12, 2013 12:05 pm**

http://www.musicdsp.org/showone.php?id=206

Mmmm!!
Still waiting for a faster and bit-wider (64 bit float values) function than the first one posted here (you can use SSE, SSE2, 3, 4, 5, 6 .... )

Posted: **Sat Dec 14, 2013 12:24 pm**

Ha, no problem

!

Code: Select all

;Cubic Root, Single Precision
;PB 5.21 LTS (x64)
;I use Heron: Loop {x = x * (((x * x * x) + 2 * a) / ((2 * x * x * x) + a))}

Test_Value.f = -12345.6789   ;Cubic_Root = -23,112042408247961097779983746659
Appr.f                       ;for all...

Procedure.f Cube_Root()
  !mov eax,[v_Test_Value]
  !and eax,7FFFFFFFh         ;approximation without sign
  ;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
  !mov [v_Appr],eax
  !movss xmm0,[v_Appr]  
  !movss xmm1,xmm0
  !lea rdx,[v_Appr]
  !mov eax,[rdx]
  !xor edx,edx
  !div dword[Value3]
  !add eax,[BitHack]
  !mov [v_Appr],eax
  !movss xmm2,[v_Appr]

  !addss xmm1,xmm1           ;xmm1=2*a=constant
  !mov ecx,2                 ;higher=more precision (if possible)
 !@@:  
  !movss xmm3,xmm2
  !mulss xmm3,xmm3
  !mulss xmm3,xmm2           ;xmm3=x*x*x
  !movss xmm4,xmm3 
  !addss xmm4,xmm4           ;2*x*x*x
  !addss xmm4,xmm0           ;xmm0=a 
  !addss xmm3,xmm1 
  !divss xmm3,xmm4
  !mulss xmm2,xmm3
  !dec ecx  
 !jnz @b  
  ;set sign (if Test_Value negativ)
  !test byte[v_Test_Value+3],80h
 !jz @f
  !mulss xmm2,[Minus1]       ;restore sign
 !@@:
  !movss dword[v_Appr],xmm2
  !fld dword[v_Appr]
 ProcedureReturn
  !Minus1:  dd -1.0 
  !BitHack: dd 709921077     ;for Kahan´s bit hack; http://metamerist.com/cbrt/cbrt.htm
  !Value3:  dd 3
EndProcedure

TA = ElapsedMilliseconds()
For i = 1 To 10000000
  Cbrt.f = Cube_Root()
Next
TE = ElapsedMilliseconds() - TA
MessageRequester("Cubic Root (Single-Precision) with SSE", StrF(Cbrt, 7) + #LFCR$ + Str(TE) + " ms for 10000000 Loops")

Code: Select all

;Cubic Root, Double Precision
;PB 5.21 LTS (x64)
;I use Heron: Loop {x = x * (((x * x * x) + 2 * a) / ((2 * x * x * x) + a))}

Test_Value.d = -12345.6789   ;Cubic_Root = -23,112042408247961097779983746659
Appr.d                       ;for all...

Procedure.d Cube_Root()
  !mov rax,[v_Test_Value]
  !mov rdx,7FFFFFFFFFFFFFFFh
  !and rax,rdx               ;approximation without sign
  ;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
  !mov [v_Appr],rax
  !movsd xmm0,[v_Appr]  
  !movsd xmm1,xmm0
  !lea rdx,[v_Appr]
  !mov rax,[rdx]
  !xor rdx,rdx
  !div qword[Value3]
  !add rax,[BitHack]
  !mov [v_Appr],rax
  !movsd xmm2,[v_Appr]

  !addsd xmm1,xmm1           ;xmm1=2*a=constant
  !mov ecx,2                 ;higher=more precision (if possible)
 !@@:  
  !movsd xmm3,xmm2
  !mulsd xmm3,xmm3
  !mulsd xmm3,xmm2           ;xmm3=x*x*x
  !movsd xmm4,xmm3 
  !addsd xmm4,xmm4           ;2*x*x*x
  !addsd xmm4,xmm0           ;xmm0=a
  !addsd xmm3,xmm1 
  !divsd xmm3,xmm4
  !mulsd xmm2,xmm3
  !dec ecx  
 !jnz @b  
  ;set sign (if Test_Value negativ)
  !test byte[v_Test_Value+7],80h
 !jz @f
  !mulsd xmm2,[Minus1]       ;restore sign
 !@@:
  !movsd qword[v_Appr],xmm2
  !fld qword[v_Appr]
 ProcedureReturn
  !Minus1:  dq -1.0 
  !BitHack: dq 3071306043645493248     ;715094163<<32, for Kahan´s bit hack; http://metamerist.com/cbrt/cbrt.htm
  !Value3:  dq 3
EndProcedure

TA = ElapsedMilliseconds()
For i = 1 To 10000000
  Cbrt.d = Cube_Root()
Next
TE = ElapsedMilliseconds() - TA
MessageRequester("Cubic Root (Double-Precision) with SSE2", StrD(Cbrt, 15) + #LFCR$ + Str(TE) + " ms for 10000000 Loops")

Make your tests and have fun!
Helle

Posted: **Sat Dec 14, 2013 1:38 pm**

OooppS!

The speed of cube root results cricical needed for x86 32 bit systems.
x86 running at 64 bit are quite strange.

Sorry, can not test as I use 32bit.

However, watching at your code, it still seems slower (Only the part of your function before the first label '!@@:' is much slower than the entire function in my firts post, and still more for the AMD processors

).
Have you tested it seriously?
Thanks!

Posted: **Sat Dec 14, 2013 3:52 pm**

I made tests with PB 64-Bit (Cubic Root -12345.6789, 10000000 Loops, Procedure):
Psychophanta:
- Float: -23.1120453 / 484 ms
- Double : -23.112042408247959 / 491 ms
Helle:
- Float: -23.1120415 / 169 ms
- Double : -23.112042408247962 / 203 ms

Edit: CPU = Intel i7-4770K@4.0GHz

Posted: **Sat Dec 14, 2013 5:46 pm**

Helle wrote:I made tests with PB 64-Bit (Cubic Root -12345.6789, 10000000 Loops, Procedure):
Psychophanta:
- Float: -23.1120453 / 484 ms
- Double : -23.112042408247959 / 491 ms
Helle:
- Float: -23.1120415 / 169 ms
- Double : -23.112042408247962 / 203 ms

Edit: CPU = Intel i7-4770K@4.0GHz

Strange, very strange
Psychophanta:
- Float: -23.1120453 / 343 ms
- Double : -23.112042408247959 / 375 ms

Edit: CPU = AMD Phenom II 550, 3.0GHz (purchased in year 2009, no overcloked, no overcored)

Posted: **Sun Dec 15, 2013 6:19 pm**

Test-Code for x86 (e.g. with other CPU´s):

Code: Select all

;For 32-Bit-Windows
;PB 5.21 LTS (x86)

Test_Value_D.d = -12345.6789 ;Cube Root = -23,112042408247961097779983746659
Appr_D.d                     ;for all...
Test_Value_F.f = -12345.6789
Appr_F.f

Procedure.f Cube_Root_F_H_x86()   ;only for x86
  !mov eax,[v_Test_Value_F]
  !and eax,7FFFFFFFh         ;without sign
  ;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
  !mov [v_Appr_F],eax
  !movss xmm0,[v_Appr_F]  
  !movss xmm1,xmm0
  !lea edx,[v_Appr_F]
  !mov eax,[edx]
  !xor edx,edx
  !div dword[Value3_F]
  !add eax,[BitHack_F]
  !mov [v_Appr_F],eax
  !movss xmm2,[v_Appr_F]

  !addss xmm1,xmm1           ;xmm1=2*a=constant
  !mov ecx,2                 ;higher=more precision (if possible)
 !@@:  
  !movss xmm3,xmm2
  !mulss xmm3,xmm3
  !mulss xmm3,xmm2           ;xmm3=x*x*x
  !movss xmm4,xmm3 
  !addss xmm4,xmm4           ;2*x*x*x
  !addss xmm4,xmm0           ;xmm0=a 
  !addss xmm3,xmm1 
  !divss xmm3,xmm4
  !mulss xmm2,xmm3
  !dec ecx  
 !jnz @b  
  ;set sign (if Test_Value negativ)
  !test byte[v_Test_Value_F+3],80h
 !jz @f
  !mulss xmm2,[Minus1_F]     ;restore sign
 !@@:
  !movss dword[v_Appr_F],xmm2
  !fld dword[v_Appr_F]
 ProcedureReturn
  !Minus1_F:  dd -1.0 
  !BitHack_F: dd 709921077   ;for Kahan´s bit hack (Float); http://metamerist.com/cbrt/cbrt.htm
  !Value3_F:  dd 3
EndProcedure

Procedure.d Cube_Root_D_H_x86()   ;only for x86
  !mov eax,dword[v_Test_Value_D+4]
  !mov edx,dword[v_Test_Value_D]
  !mov dword[v_Appr_D],edx
  !and eax,7FFFFFFFh         ;without sign
  ;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
  !mov dword[v_Appr_D+4],eax
  !movsd xmm0,[v_Appr_D]  
  !movsd xmm1,xmm0
  !lea edx,[v_Appr_D]
  !mov eax,[edx+4]
  !xor edx,edx
  !div dword[Value3_D]
  !add eax,[BitHack_D+4]
  !mov dword[v_Appr_D+4],eax
  !movsd xmm2,[v_Appr_D]

  !addsd xmm1,xmm1           ;xmm1=2*a=constant
  !mov ecx,2                 ;higher=more precision (if possible)
 !@@:  
  !movsd xmm3,xmm2
  !mulsd xmm3,xmm3
  !mulsd xmm3,xmm2           ;xmm3=x*x*x
  !movsd xmm4,xmm3 
  !addsd xmm4,xmm4           ;2*x*x*x
  !addsd xmm4,xmm0           ;xmm0=a
  !addsd xmm3,xmm1 
  !divsd xmm3,xmm4
  !mulsd xmm2,xmm3
  !dec ecx  
 !jnz @b  
  ;set sign (if Test_Value negativ)
  !test byte[v_Test_Value_D+7],80h
 !jz @f
  !mulsd xmm2,[Minus1_D]     ;restore sign
 !@@:
  !movsd qword[v_Appr_D],xmm2
  !fld qword[v_Appr_D]
 ProcedureReturn
  !Minus1_D:  dq -1.0 
  !BitHack_D: dq 3071306043645493248   ;715094163<<32, for Kahan´s bit hack (Double); http://metamerist.com/cbrt/cbrt.htm
  !Value3_D:  dd 3
EndProcedure

Procedure.f Cube_Root_F_P()  ;for x86 and x64
  !fld dword[Dat_F]
  !fld dword[v_Test_Value_F]
  !fabs
  !fyl2x                     ;->log2(Src1)*exponent
  !fld st0                   ;copy the logarithm
  !frndint                   ;keep only the characteristic
  !fsub st1,st0              ;keeps only the mantissa
  !fxch                      ;get the mantissa on top
  !f2xm1                     ;->2^(mantissa)-1
  !fld1
  !faddp                     ;add 1 back
  !fscale                    ;scale it with the characteristic
  !fstp st1                  ;copy result over and "pop" it
  !test byte[v_Test_Value_F+3],$80
 !jz @f
  !fchs                      ;restore sign
 !@@:
 ProcedureReturn 
  !Dat_F:     dd 0.33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 ;:-)
EndProcedure

Procedure.d Cube_Root_D_P()  ;for x86 and x64
  !fld qword[Dat_D]
  !fld qword[v_Test_Value_D]
  !fabs
  !fyl2x                     ;->log2(Src1)*exponent
  !fld st0                   ;copy the logarithm
  !frndint                   ;keep only the characteristic
  !fsub st1,st0              ;keeps only the mantissa
  !fxch                      ;get the mantissa on top
  !f2xm1                     ;->2^(mantissa)-1
  !fld1
  !faddp                     ;add 1 back
  !fscale                    ;scale it with the characteristic
  !fstp st1                  ;copy result over and "pop" it
  !test byte[v_Test_Value_D+7],$80
 !jz @f
  !fchs                      ;restore sign
 !@@:
 ProcedureReturn 
  !Dat_D:     dq 0.33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 ;:-)
EndProcedure

TA_F_H = ElapsedMilliseconds()
For i = 1 To 10000000
  Cbrt_F_H_x86.f = Cube_Root_F_H_x86()
Next
TE_F_H = ElapsedMilliseconds() - TA_F_H
Result$ = "Test-Value : -12345.6789" + #LFCR$ + "Float Helle:" + #LFCR$ + StrF(Cbrt_F_H_x86, 7) + " / " + Str(TE_F_H) + " ms " + #LFCR$

TA_D_H = ElapsedMilliseconds()
For i = 1 To 10000000
  Cbrt_D_H_x86.d = Cube_Root_D_H_x86()
Next
TE_D_H = ElapsedMilliseconds() - TA_D_H
Result$ + "Double Helle:" + #LFCR$ + StrD(Cbrt_D_H_x86, 15) + " / " + Str(TE_D_H) + " ms " + #LFCR$

TA_F_P = ElapsedMilliseconds()
For i = 1 To 10000000
  Cbrt_F_P.f = Cube_Root_F_P()
Next
TE_F_P = ElapsedMilliseconds() - TA_F_P
Result$ + "Float Psychophanta:" + #LFCR$ + StrF(Cbrt_F_P, 7) + " / " + Str(TE_F_P) + " ms " + #LFCR$

TA_D_P = ElapsedMilliseconds()
For i = 1 To 10000000
  Cbrt_D_P.d = Cube_Root_D_P()
Next
TE_D_P = ElapsedMilliseconds() - TA_D_P
Result$ + "Double Psychophanta:" + #LFCR$ + StrD(Cbrt_D_P, 15) + " / " + Str(TE_D_P) + " ms " + #LFCR$ + #LFCR$

Result$ + "CPU: Intel i7-4770K@4.0GHz" + #LFCR$

;SetClipboardText(Result$)

MessageRequester("Cube Root x86, 10000000 Loops", Result$)

My results (with VirtualBox under Win7 x64):
Test-Value : -12345.6789
Float Helle:
-23.1120415 / 191 ms
Double Helle:
-23.112042408247962 / 362 ms; I have for Double_x86 too many overhead
Float Psychophanta:
-23.1120453 / 481 ms
Double Psychophanta:
-23.112042408247959 / 488 ms

CPU: Intel i7-4770K@4.0GHz

Posted: **Mon Dec 16, 2013 12:13 am**

Ok, the function you posted seems faster for intel i7 and x64 OS,
There would be good if someone could test it in an AMD.

Posted: **Mon Dec 16, 2013 7:43 am**

Psychophanta wrote:Ok, the function you posted seems faster for intel i7 and x64 OS,
There would be good if someone could test it in an AMD.

Sorry, no AMD here.
Here's a small modification of the versions Helle posted.
Does that make a difference on AMD ?

Code: Select all

Procedure.f Cube_Root_F_H_x86()   ;only for x86
  !mov eax,[v_Test_Value_F]
  !mov ecx, eax
  !and eax,0x7fffffff        ;without sign
  !and ecx,0x80000000        ;sign only 
  ;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
  !movd xmm0, eax
  !movss xmm1, xmm0
  !mov edx, 0x55555555
  !mul edx
  !add edx, 0x2a508935
  !movd xmm2, edx
  !addss xmm1,xmm1           ;xmm1=2*a=constant
  
  !movss xmm3,xmm2
  !mulss xmm3,xmm3
  !mulss xmm3,xmm2           ;xmm3=x*x*x
  !movss xmm4,xmm3 
  !addss xmm4,xmm4           ;2*x*x*x
  !addss xmm4,xmm0           ;xmm0=a 
  !addss xmm3,xmm1
  !divss xmm3,xmm4
  !mulss xmm2,xmm3
  
  !movss xmm3,xmm2
  !mulss xmm3,xmm3
  !mulss xmm3,xmm2           ;xmm3=x*x*x
  !movss xmm4,xmm3 
  !addss xmm4,xmm4           ;2*x*x*x
  !addss xmm4,xmm0           ;xmm0=a 
  !addss xmm3,xmm1
  !divss xmm3,xmm4
  !mulss xmm2,xmm3
  
  !movd xmm3, ecx
  !por xmm2, xmm3            ;restore sign
  !movss dword[v_Appr_F],xmm2
  !fld dword[v_Appr_F]
 ProcedureReturn
EndProcedure

Code: Select all

Procedure.d Cube_Root_D_HM_x86()   ;only for x86
  !movsd xmm0, [v_Test_Value_D]
  !pcmpeqb xmm4, xmm4
  !psrlq xmm4, 1
  !movsd xmm1, xmm0
  !pand xmm1, xmm4           ; without sign
  !pandn xmm4, xmm0          ; sign only
  !pshufd xmm0, xmm1, 1
  ;now find a good approximation for the start-value; http://metamerist.com/cbrt/cbrt.htm - Kahan´s bit hack
  !movq xmm2, [MulConst]
  !pmuludq xmm0, xmm2
  !paddd xmm0, xmm2
  
  !movsd xmm2,xmm0
  !mulsd xmm2,xmm2
  !mulsd xmm2,xmm0           ;xmm2=x*x*x
  !movsd xmm3,xmm2 
  !addsd xmm3,xmm3           ;2*x*x*x
  !addsd xmm3,xmm1           ;xmm1=a
  !addsd xmm2,xmm1 
  !addsd xmm2,xmm1 
  !divsd xmm2,xmm3
  !mulsd xmm0,xmm2
  
  !movsd xmm2,xmm0
  !mulsd xmm2,xmm2
  !mulsd xmm2,xmm0           ;xmm2=x*x*x
  !movsd xmm3,xmm2 
  !addsd xmm3,xmm3           ;2*x*x*x
  !addsd xmm3,xmm1           ;xmm1=a
  !addsd xmm2,xmm1 
  !addsd xmm2,xmm1 
  !divsd xmm2,xmm3
  !mulsd xmm0,xmm2
  
  !por xmm0, xmm4
  !movsd qword[v_Appr_D],xmm0
  !fld qword[v_Appr_D]
  ProcedureReturn
  !MulConst:  dd 0x55555555, 0x2a9f7893 
EndProcedure

Posted: **Mon Dec 16, 2013 9:20 am**

In the firma i have an AMD:

CPU: AMD FX-8120@3.6GHz

32-Bit-Windows:
Test-Value : -12345.6789
Float Helle:
-23.1120415 / 335 ms
Double Helle:
-23.112042408247962 / 943 ms
Float Psychophanta:
-23.1120453 / 1174 ms
Double Psychophanta:
-23.112042408247959 / 1150 ms

64-Bit-Windows:
Test-Value : -12345.6789
Float Helle:
-23.1120415 / 322 ms
Double Helle:
-23.112042408247962 / 378 ms
Float Psychophanta:
-23.1120453 / 1166 ms
Double Psychophanta:
-23.112042408247959 / 1203 ms

PureBasic Forums - English

Cube root

Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root

Re: Cube root