PureBasic Forum http://forums.purebasic.com/english/ |
|
AVX-Example http://forums.purebasic.com/english/viewtopic.php?f=35&t=47577 |
Page 1 of 1 |
Author: | Helle [ Fri Sep 16, 2011 5:09 pm ] |
Post subject: | AVX-Example |
Hey, what´s that? An assembler-forum and no new codes!? A great new field for assembly-programming are the AVX-Instructions of the new Intel-(Sandy Bridge) and AMD-(Bulldozer, next month?) CPUs. Great: 256-bit-register (YMM)! This a simple example for this, only for fun: Code: ;- Computes the sine (Taylor/Maclaurin series, double-precision) with AVX ;- "Helle" Klaus Helbing, 16.09.2011, PB v4.51 (x64) ;- Sin(x) = x - (x^3)/3! + (x^5)/5! - (x^7)/7! + (x^9)/9! - (x^11)/11! + (x^13)/13!... ;- Any results (CPU: Intel i7-2600, 3.4GHz): ;-- Test for Radiant = 9.87654321 (12 terms, with range-reduce): ;-- AVX : -0.436554369141429 = -25.0127228798° in 171 ms ;-- PB : -0.436554369141429 = -25.0127228798° in 343 ms ;-- Test for Radiant = 9.87654321 (only first 4 terms, with range-reduce): ;-- AVX : -0.436554369145432 = -25.0127228800° in 156 ms ;-- PB : -0.436554369141429 = -25.0127228798° in 343 ms ;-- Test for Radiant = 1.23456789 (only first 4 terms, without range-reduce): ;-- AVX : 0.944005976932239 = 54.0875583133° in 63 ms ;-- PB : 0.944005725004535 = 54.0875438789° in 343 ms ;-- Test for Radiant = 1.23456789 (12 terms, without range-reduce): ;-- AVX : 0.944005725004535 = 54.0875438789° in 93 ms ;-- PB : 0.944005725004535 = 54.0875438789° in 343 ms ;- need a CPU with AVX-support (Sept.2011: Intel "Sandy Bridge") and the actual FAsm.exe from http://flatassembler.net Procedure.d SinAVX(x.d) ;x=radiant, parameter in XMM0/YMM0 ;this part is for reduce to range 0-Pi/2 and set signum. If x is in this range (or a little bigger, in your application), you can skip this (faster) !vmovsd xmm2,qword[Pi_Half] ;load XMM2 with Pi_Half (1.570796326794896619) !vdivsd xmm0,xmm0,xmm2 ;divide XMM0 (x=radiant) by XMM2 (Pi_Half), result in XMM0 !vcvttsd2si rax,xmm0 ;convert the result (float double precision) to integer with truncation (like Int(x) in PB) !vcvtsi2sd xmm1,xmm1,rax ;convert the integer in RAX to float double precision in XMM1 !vsubsd xmm0,xmm0,xmm1 ;subtract XMM1 from XMM0, result in XMM0 !vmulsd xmm0,xmm0,xmm2 ;multiply XMM0 with XMM2 (Pi_Half), result in XMM0 !test rax,1 ;test for quadrant 2 and 4 (6,8...) !jz @f ;no !vaddsd xmm0,xmm0,xmm2 ;add Pi_Half (XMM2) to XMM0, result in XMM0 !@@: !test rax,2 ;test for quadrant 3 and 4 (and all negatives) !jz @f ;no !vxorpd xmm0,xmm0,[Minus] ;change (set) bit 63 (signum) !@@: ;calculate the first 4 terms (without start-value x), 1=0-63, 2=64-127, 3=128-191, 4=192-255 !vmovddup xmm0,xmm0 ;XMM0: 1=x^1 2=x^1 duplicate bits 0-63 in bits 64-127 !vinsertf128 ymm1,ymm0,xmm0,1b ;YMM1: 1=x^1 2=x^1 3=x^1 4=x^1 YMM1(0-127)=XMM0, YMM1(128-255)=XMM0 !vmulpd ymm2,ymm1,ymm1 ;YMM2: 1=x^2 2=x^2 3=x^2 4=x^2 !vmulsd xmm3,xmm2,xmm2 ;YMM3: 1=x^4 2=x^2 3=x^0 4=x^0 !vmulpd ymm4,ymm2,ymm2 ;YMM4: 1=x^4 2=x^4 3=x^4 4=x^4 !vmulpd ymm2,ymm4,ymm4 ;YMM2: 1=x^8 2=x^8 3=x^8 4=x^8 for the next 4 terms !vmulpd ymm3,ymm3,ymm1 ;YMM3: 1=x^5 2=x^3 3=x^0 4=x^0 !vmulpd ymm1,ymm3,ymm4 ;YMM1: 1=x^9 2=x^7 3=x^0 4=x^0 !vperm2f128 ymm5,ymm1,ymm3,100000b ;YMM5: 1=x^9 2=x^7 3=x^5 4=x^3 YMM5(0-127)=YMM1(0-127), YMM5(128-255)=YMM3(0-127) !vmovupd ymm4,yword[RezFak] ;load the first 4 reciprocals factorials (-1/3! ... 1/9!) in YMM4 !vmulpd ymm1,ymm5,ymm4 ;multiply the 4 values in YMM4 with YMM5, result in YMM1 ;next 4 terms, I think, too short for a loop. For lower precision (faster) you can reduce the steps !vmulpd ymm5,ymm5,ymm2 ;YMM5: 1=x^17 2=x^15 3=x^13 4=x^11 !vmulpd ymm3,ymm5,yword[RezFak+32] !vaddpd ymm1,ymm3,ymm1 ;next 4 terms !vmulpd ymm5,ymm5,ymm2 ;YMM5: 1=x^25 2=x^23 3=x^21 4=x^19 !vmulpd ymm3,ymm5,yword[RezFak+64] !vaddpd ymm1,ymm3,ymm1 !vhaddpd ymm2,ymm1,ymm1 ;YMM2: 1=1+2 of YMM1, 3=3+4 of YMM1 !vextractf128 xmm1,ymm2,1b ;XMM1: 3+4 of YMM2 !vaddsd xmm3,xmm2,xmm1 ;XMM3: 1=sum of iterations !vaddsd xmm0,xmm0,xmm3 ;XMM0: 1=sum of iterations plus start-value (1.term=x) !vzeroupper ;set YMM0H-YMM15H to zero !vmovsd qword[rsp+48],xmm0 ;because PB is not ABI (Application Binary Interface)-conform (standard: return-value in XMM0/YMM0) !fld qword[rsp+48] ProcedureReturn !Minus: !dq 8000000000000000h ;for change (set) bit 63 (signum) !Pi_Half: !dq 1.570796326794896619 !RezFak: !dq 2.755731922398589065e-6 ; 1/9! 4.Iteration !dq -1.984126984126984127e-4 ;-1/7! 3.Iteration !dq 8.333333333333333333e-3 ; 1/5! 2.Iteration !dq -1.666666666666666667e-1 ;-1/3! 1.Iteration !dq 2.811457254345520763e-15 ; 1/17! 8.Iteration !dq -7.647163731819816476e-13 ;-1/15! 7.Iteration !dq 1.605904383682161460e-10 ; 1/13! 6.Iteration !dq -2.505210838544171878e-8 ;-1/11! 5.Iteration !dq 6.446950284384473396e-26 ; 1/25! 12.Iteration !dq -3.868170170630684038e-23 ;-1/23! 11.Iteration !dq 1.957294106339126123e-20 ; 1/21! 10.Iteration !dq -8.220635246624329717e-18 ;-1/19! 9.Iteration EndProcedure Rad.d = 9.87654321 ;- Test AVX T1= ElapsedMilliseconds() For i = 0 To 9999999 SinAVX.d = SinAVX(Rad) Next T1 = ElapsedMilliseconds() - T1 ;- Test PB T2= ElapsedMilliseconds() For i = 0 To 9999999 SinPB.d = Sin(Rad) Next T2 = ElapsedMilliseconds() - T2 Sin$ = "Test for Radiant = " + StrD(Rad) + #CRLF$ + #CRLF$ Sin$ + "AVX : " + StrD(SinAVX, 15) + " = " + StrD(SinAVX * 180 / #PI) + "° in " + Str(T1) + " ms" + #CRLF$ Sin$ + "PB : " + StrD(SinPB, 15) + " = " + StrD(SinPB * 180 / #PI) + "° in " + Str(T2) + " ms" + #CRLF$ MessageRequester("Sinus Double-Precision with AVX", Sin$) End Have fun! Helle |
Page 1 of 1 | All times are UTC + 1 hour |
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/ |