# PureBasic Forum

 It is currently Sun Jan 24, 2021 8:50 pm

 All times are UTC + 1 hour

 Page 1 of 1 [ 1 post ]
 Print view Previous topic | Next topic
Author Message
 Post subject: AVX-ExamplePosted: Fri Sep 16, 2011 5:09 pm
 Enthusiast

Joined: Wed Apr 12, 2006 7:59 pm
Posts: 174
Location: Germany
Hey, what´s that? An assembler-forum and no new codes!?
A great new field for assembly-programming are the AVX-Instructions of the new Intel-(Sandy Bridge) and AMD-(Bulldozer, next month?) CPUs. Great: 256-bit-register (YMM)!
This a simple example for this, only for fun:
Code:
;- Computes the sine (Taylor/Maclaurin series, double-precision) with AVX
;- "Helle" Klaus Helbing, 16.09.2011, PB v4.51 (x64)
;- Sin(x) = x - (x^3)/3! + (x^5)/5! - (x^7)/7! + (x^9)/9! - (x^11)/11! + (x^13)/13!...
;- Any results (CPU: Intel i7-2600, 3.4GHz):
;-- Test for Radiant = 9.87654321 (12 terms, with range-reduce):
;-- AVX : -0.436554369141429 = -25.0127228798°  in 171 ms
;-- PB  : -0.436554369141429 = -25.0127228798°  in 343 ms
;-- Test for Radiant = 9.87654321 (only first 4 terms, with range-reduce):
;-- AVX : -0.436554369145432 = -25.0127228800°  in 156 ms
;-- PB  : -0.436554369141429 = -25.0127228798°  in 343 ms
;-- Test for Radiant = 1.23456789 (only first 4 terms, without range-reduce):
;-- AVX : 0.944005976932239 = 54.0875583133°  in 63 ms
;-- PB  : 0.944005725004535 = 54.0875438789°  in 343 ms
;-- Test for Radiant = 1.23456789 (12 terms, without range-reduce):
;-- AVX : 0.944005725004535 = 54.0875438789°  in 93 ms
;-- PB  : 0.944005725004535 = 54.0875438789°  in 343 ms

;- need a CPU with AVX-support (Sept.2011: Intel "Sandy Bridge") and the actual FAsm.exe from http://flatassembler.net
Procedure.d SinAVX(x.d)                ;x=radiant, parameter in XMM0/YMM0
;this part is for reduce to range 0-Pi/2 and set signum.  If x is in this range (or a little bigger, in your application), you can skip this (faster)
!vmovsd xmm2,qword[Pi_Half]          ;load XMM2 with Pi_Half (1.570796326794896619)
!vdivsd xmm0,xmm0,xmm2               ;divide XMM0 (x=radiant) by XMM2 (Pi_Half), result in XMM0
!vcvttsd2si rax,xmm0                 ;convert the result (float double precision) to integer with truncation (like Int(x) in PB)
!vcvtsi2sd xmm1,xmm1,rax             ;convert the integer in RAX to float double precision in XMM1
!vsubsd xmm0,xmm0,xmm1               ;subtract XMM1 from XMM0, result in XMM0
!vmulsd xmm0,xmm0,xmm2               ;multiply XMM0 with XMM2 (Pi_Half), result in XMM0
!test rax,1                          ;test for quadrant 2 and 4 (6,8...)
!jz @f                               ;no
!@@:
!test rax,2                          ;test for quadrant 3 and 4 (and all negatives)
!jz @f                               ;no
!vxorpd xmm0,xmm0,[Minus]            ;change (set) bit 63 (signum)
!@@:
;calculate the first 4 terms (without start-value x), 1=0-63, 2=64-127, 3=128-191, 4=192-255
!vmovddup xmm0,xmm0                  ;XMM0: 1=x^1 2=x^1  duplicate bits 0-63 in bits 64-127
!vinsertf128 ymm1,ymm0,xmm0,1b       ;YMM1: 1=x^1 2=x^1 3=x^1 4=x^1  YMM1(0-127)=XMM0, YMM1(128-255)=XMM0
!vmulpd ymm2,ymm1,ymm1               ;YMM2: 1=x^2 2=x^2 3=x^2 4=x^2
!vmulsd xmm3,xmm2,xmm2               ;YMM3: 1=x^4 2=x^2 3=x^0 4=x^0
!vmulpd ymm4,ymm2,ymm2               ;YMM4: 1=x^4 2=x^4 3=x^4 4=x^4
!vmulpd ymm2,ymm4,ymm4               ;YMM2: 1=x^8 2=x^8 3=x^8 4=x^8  for the next 4 terms
!vmulpd ymm3,ymm3,ymm1               ;YMM3: 1=x^5 2=x^3 3=x^0 4=x^0
!vmulpd ymm1,ymm3,ymm4               ;YMM1: 1=x^9 2=x^7 3=x^0 4=x^0
!vperm2f128 ymm5,ymm1,ymm3,100000b   ;YMM5: 1=x^9 2=x^7 3=x^5 4=x^3  YMM5(0-127)=YMM1(0-127), YMM5(128-255)=YMM3(0-127)

!vmovupd ymm4,yword[RezFak]          ;load the first 4 reciprocals factorials (-1/3! ... 1/9!) in YMM4
!vmulpd ymm1,ymm5,ymm4               ;multiply the 4 values in YMM4 with YMM5, result in YMM1
;next 4 terms, I think, too short for a loop. For lower precision (faster) you can reduce the steps
!vmulpd ymm5,ymm5,ymm2               ;YMM5: 1=x^17 2=x^15 3=x^13 4=x^11
!vmulpd ymm3,ymm5,yword[RezFak+32]
;next 4 terms
!vmulpd ymm5,ymm5,ymm2               ;YMM5: 1=x^25 2=x^23 3=x^21 4=x^19
!vmulpd ymm3,ymm5,yword[RezFak+64]

!vhaddpd ymm2,ymm1,ymm1              ;YMM2: 1=1+2 of YMM1, 3=3+4 of YMM1
!vextractf128 xmm1,ymm2,1b           ;XMM1: 3+4 of YMM2
!vaddsd xmm3,xmm2,xmm1               ;XMM3: 1=sum of iterations

!vaddsd xmm0,xmm0,xmm3               ;XMM0: 1=sum of iterations plus start-value (1.term=x)

!vzeroupper                          ;set YMM0H-YMM15H to zero

!vmovsd qword[rsp+48],xmm0           ;because PB is not ABI (Application Binary Interface)-conform (standard: return-value in XMM0/YMM0)
!fld qword[rsp+48]

ProcedureReturn

!Minus:
!dq 8000000000000000h           ;for change (set) bit 63 (signum)
!Pi_Half:
!dq  1.570796326794896619
!RezFak:
!dq  2.755731922398589065e-6    ; 1/9!   4.Iteration
!dq -1.984126984126984127e-4    ;-1/7!   3.Iteration
!dq  8.333333333333333333e-3    ; 1/5!   2.Iteration
!dq -1.666666666666666667e-1    ;-1/3!   1.Iteration

!dq  2.811457254345520763e-15   ; 1/17!  8.Iteration
!dq -7.647163731819816476e-13   ;-1/15!  7.Iteration
!dq  1.605904383682161460e-10   ; 1/13!  6.Iteration
!dq -2.505210838544171878e-8    ;-1/11!  5.Iteration

!dq  6.446950284384473396e-26   ; 1/25!  12.Iteration
!dq -3.868170170630684038e-23   ;-1/23!  11.Iteration
!dq  1.957294106339126123e-20   ; 1/21!  10.Iteration
!dq -8.220635246624329717e-18   ;-1/19!  9.Iteration
EndProcedure

;- Test AVX
T1= ElapsedMilliseconds()
For i = 0 To 9999999
Next
T1 = ElapsedMilliseconds() - T1

;- Test PB
T2= ElapsedMilliseconds()
For i = 0 To 9999999
Next
T2 = ElapsedMilliseconds() - T2

Sin\$ = "Test for Radiant = " + StrD(Rad) + #CRLF\$ + #CRLF\$
Sin\$ + "AVX : " + StrD(SinAVX, 15) + " = " + StrD(SinAVX * 180 / #PI) + "°  in " + Str(T1) + " ms" + #CRLF\$
Sin\$ + "PB   : " + StrD(SinPB, 15) + " = " + StrD(SinPB * 180 / #PI) + "°  in " + Str(T2) + " ms" + #CRLF\$
MessageRequester("Sinus Double-Precision with AVX", Sin\$)

End

Have fun!
Helle

Top

 Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending
 Page 1 of 1 [ 1 post ]

 All times are UTC + 1 hour

#### Who is online

Users browsing this forum: No registered users and 1 guest

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forum

Search for:
 Jump to:  Select a forum ------------------ PureBasic    Coding Questions    Game Programming    3D Programming    Assembly Programming    The PureBasic Editor    The PureBasic Form Designer    General Discussion    Feature Requests and Wishlists    Tricks 'n' Tips Bug Reports    Bugs - Windows    Bugs - Linux    Bugs - Mac OSX    Bugs - IDE    Bugs - Documentation OS Specific    AmigaOS    Linux    Windows    Mac OSX Miscellaneous    Announcement    Off Topic Showcase    Applications - Feedback and Discussion    PureFORM & JaPBe    TailBite