
All times are UTC + 1 hour




 Post subject: sse patch fasm
Posted: Sat Jan 23, 2016 12:04 am
Addict

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3556
Location: New Zealand
Patches x87 ops to SSE. Only relevant for Windows and Linux, and only tested on x64.
It patches +, -, * and /.

The patched ops are ~40% faster than the x87 ops on my AMD.
I would be interested to see if there's any performance increase on Intel CPUs.

Compile with the debugger to verify that it's working,
and compile without the debugger to rate its performance.

Code:
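Macro PatchSSE()
  ; Redefine the x87 mnemonics FLD, FADD, FSUB, FMUL, FDIV and FSTP as FASM macros
  ; that use scalar SSE instead; xmm0 stands in for the x87 top of stack (ST0).
  ; Only dword (float) and qword (double) memory operands are handled: any other
  ; operand form matches neither line and expands to nothing.
  ; The macros apply to all code emitted after the point where PatchSSE() is used.
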
!macro FLD var
!{
  !match =dword x , var \{ movss xmm0, var \}
  !match =qword x , var \{ movsd xmm0, var \}
!}
   
!macro FADD var
!{       
  !match =dword x , var \{ addss xmm0, var \}
  !match =qword x , var \{ addsd xmm0, var \}
!}

!macro FSTP var
!{   
  !match =dword x, var \{ movss var, xmm0 \}
  !match =qword x, var \{ movsd var, xmm0 \}
!}

!macro FSUB var
!{
  !match =dword x, var \{ subss xmm0, var \}
  !match =qword x, var \{ subsd xmm0, var \}
!}

!macro FMUL var
!{ 
  !match =dword x, var \{ mulss xmm0, var \}   
  !match =qword x, var \{ mulsd xmm0, var \} 
!}
 
!macro FDIV var
 !{
  !match =dword x, var  \{ divss xmm0, var \}
  !match =qword x, var  \{ divsd xmm0, var \}
 !}
 
EndMacro 

; Trig helpers that call the x87 fcos, fsin and fptan instructions directly
Macro _Cos(result,angle)
  EnableASM
  fld angle
  fcos
  fstp result
  DisableASM
EndMacro   

Macro _Sin(result,angle)
  EnableASM
  fld angle
  fsin
  fstp result
  DisableASM
EndMacro   

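Macro _Tan(result,angle)
  EnableASM
  fld angle
  fptan          ; ST0 = tan(angle), then 1.0 is pushed on top
  fstp result    ; pops the 1.0 (overwritten by the next store)
  fstp result    ; stores tan(angle) in result
  DisableASM
EndMacro   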


Global a.f,b.f,c.f,aa.d,bb.d,cc.d,avg.i
Global t1.s,t2.s,t3.s,t4.s

; Debugger on: print the results before and after PatchSSE() so they can be compared
CompilerIf #PB_Compiler_Debugger
a = #PI
b = #PI

c = a+b
Debug c
c * 2
Debug c
c = a-b
Debug c
c = a*b
Debug c
c = a/b
Debug c
a = Cos(c)
Debug a
b = Sin(c) 
Debug b
a = Tan(b)
Debug a
Debug "============="

PatchSSE()
a = #PI
b = #PI

c = a+b
Debug c
c * 2
Debug c
c = a-b
Debug c
c = a*b
Debug c
c = a/b
Debug c
_Cos(a,c)
Debug a
_Sin(b,c) 
Debug b
_Tan(a,b)
Debug a

CompilerElse
; Debugger off: benchmark each variant, averaging the time in ms of 10 runs of the inner loop
 
a = #PI
b = #PI
avg=0
For j = 1 To 10
 
  st = ElapsedMilliseconds()
  For i = 0 To 9000000
    c = a+b
    c * 2
    c = a-b
    c = a*b
    c = a/b
  Next
  avg + (ElapsedMilliseconds() -st)
 
Next
avg / 10

t1.s = "x87 float " + Str(avg)

aa = #PI
bb = #PI
avg=0
For j = 1 To 10
 
  st = ElapsedMilliseconds()
  For i = 0 To 9000000
    cc = aa+bb
    cc * 2
    cc = aa-bb
    cc = aa*bb
    cc = aa/bb
  Next
  avg + (ElapsedMilliseconds() -st)
 
Next
avg / 10

t2.s = "x87 double " + Str(avg)


PatchSSE() ;test with sse patch 

a = #PI
b = #PI
avg=0
For j = 1 To 10
 
  st = ElapsedMilliseconds()
  For i = 0 To 9000000
    c = a+b
    c * 2
    c = a-b
    c = a*b
    c = a/b
  Next
  avg + (ElapsedMilliseconds() -st)
 
Next
avg / 10

t3.s = "sse float " + Str(avg)

aa = #PI
bb = #PI
avg=0
For j = 1 To 10
 
  st = ElapsedMilliseconds()
  For i = 0 To 9000000
    cc = aa+bb
    cc * 2
    cc = aa-bb
    cc = aa*bb
    cc = aa/bb
  Next
  avg + (ElapsedMilliseconds() -st)
 
Next
avg / 10

t4.s = "sse double " + Str(avg)


MessageRequester("times ",t1 + #CRLF$ + t2 + #CRLF$ + t3 + #CRLF$ + t4 )

CompilerEndIf



 Post subject: Re: sse patch fasm
Posted: Mon Jan 25, 2016 11:45 am
Addict

Joined: Sat Aug 15, 2009 6:59 pm
Posts: 1260
Intel Xeon E5-1620:

x87 float 40
x87 double 43
sse float 26
sse double 40

With 10 times the iterations:
x87 float 414
x87 double 377
sse float 236
sse double 374


 Post subject: Re: sse patch fasm
Posted: Mon Jan 25, 2016 7:19 pm
Addict

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3556
Location: New Zealand
I didn't expect it to be faster on Intel, but it looks good for floats.
These are the results I get on my AMD FX-6100:

x87 float 67
x87 double 65
sse float 47
sse double 52


 Post subject: Re: sse patch fasm
Posted: Mon Jan 25, 2016 8:05 pm
Enthusiast

Joined: Wed Apr 12, 2006 7:59 pm
Posts: 174
Location: Germany
Intel i7-6700K@4.7GHz:
- x87 float: 16 (x10: 164)
- x87 double: 16 (x10: 164)
- sse float: 17 (x10: 178)
- sse double: 17 (x10: 179)
For x87 float, take a look at http://www.purebasic.fr/english/viewtopic.php?f=12&t=25242
For x87 differences between AMD and Intel, take a look at http://purebasic.fr/german/viewtopic.php?f=8&t=29419&sid=4d38f49f00744ac5738f35c195e7295b
This is a test version for calculating Mersenne M49 and others (a newer version is faster).
Helle
@idle: Sorry...Time...

__________________________________________________
Domain changed
25.01.2016
RSBasic


 Post subject: Re: sse patch fasm
Posted: Mon Jan 25, 2016 8:59 pm
Addict

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3556
Location: New Zealand
Helle, that's kind of what I expected to see for Intel!
Setting the FPU control word to single precision does the trick too;
the SSE patch is then only 5% quicker for floats on my system.
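A minimal sketch of the control-word approach, assuming the default FASM backend on x86/x64 (the same setup the SSE patch targets). Bits 8-9 of the x87 control word form the precision-control (PC) field, and 00 selects a 24-bit mantissa (single precision). PC only changes how +, -, *, / and square root are rounded, which are the same operations the patch replaces; fsin and fcos are unaffected. The names fpu_saved and fpu_cw are hypothetical, and the raw `!` lines assume PureBasic's `v_` prefix for global variables.

```purebasic
; Sketch only: switch the x87 precision-control (PC) field to single precision.
; Assumes the FASM backend on x86/x64 and the "v_" prefix for globals in raw "!" lines.
; fpu_saved and fpu_cw are hypothetical names chosen for this example.
Global fpu_saved.u, fpu_cw.u            ; .u = unsigned 16-bit word

!fnstcw word [v_fpu_saved]              ; read the current x87 control word
fpu_cw = fpu_saved & ~$0300             ; clear PC (bits 8-9): 00 = 24-bit mantissa
!fldcw word [v_fpu_cw]                  ; load the modified control word

; ... float-heavy x87 code (+, -, *, /) goes here ...

!fldcw word [v_fpu_saved]               ; restore the original control word
Debug Hex(fpu_saved) + " -> " + Hex(fpu_cw)
```

Restoring the saved word matters: the PC setting stays in effect for the rest of the thread and also lowers the precision of x87 arithmetic on doubles, not just floats.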

