All times are UTC + 1 hour




 Post subject: Moving Data: ASM vs PB
PostPosted: Sun Jun 03, 2012 7:05 pm 
Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Here is some PB code:

Code:
For pos.l = subChunk2Size - 2 To 0  Step -2
  PokeL(*FB + (pos << 1), PeekW(*FB + pos))
Next

I have tried to replicate it in ASM but it is slower and I don't think it works the same. Note that we are doing a 16- to 32-bit conversion. Below is my attempt. How would I replicate the PB in ASM?

Code:
FB.l = *FB
EnableASM
MOV esi_temp,ESI ;SAVE NON-VOLATILE REGISTER
 
MOV ESI,FB ;LOAD ESI REGISTER WITH BUFFER ADDRESS
MOV ECX,subChunk2Size ;OFFSET
SUB ECX,2

loop: MOV EDX,[ESI+ECX] ;ADDRESS IS ALREADY IN ESI REGISTER, ECX HOLDS OFFSET
MOV [ESI+ECX*2],EDX
SUB ECX,2
JGE l_loop

MOV ESI,esi_temp ;RESTORE NON-VOLATILE REGISTER
DisableASM
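
For reference, here is a minimal sketch of what the PB loop at the top of this post computes, run on four hand-picked samples. The test values and the tiny buffer size are assumptions for illustration; only the For ... Next loop comes from the post. PeekW returns a signed 16-bit value, so PokeL stores it sign-extended, and walking backwards keeps any sample from being overwritten before it is read.

```purebasic
; Four signed 16-bit samples, widened in place to 32-bit values.
subChunk2Size = 8                         ; bytes of 16-bit data (hypothetical test value)
*FB = AllocateMemory(subChunk2Size * 2)   ; room for the 32-bit result
If *FB
  PokeW(*FB + 0, 1)
  PokeW(*FB + 2, -1)
  PokeW(*FB + 4, 32767)
  PokeW(*FB + 6, -32768)

  ; Walk backwards so no 16-bit sample is overwritten before it is read.
  For pos.l = subChunk2Size - 2 To 0 Step -2
    PokeL(*FB + (pos << 1), PeekW(*FB + pos))
  Next

  ; With the debugger on, this prints 1, -1, 32767, -32768.
  For i = 0 To 3
    Debug PeekL(*FB + i * 4)
  Next
  FreeMemory(*FB)
EndIf
```

Any ASM replacement should produce the same four 32-bit values for this input.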


PostPosted: Sun Jun 03, 2012 7:31 pm
Always Here

Joined: Fri Oct 23, 2009 2:33 am
Posts: 6282
Location: Wales, UK
Here is how: http://www.purebasic.fr/english/viewtopic.php?f=35&t=48298

Edit: Follow sRod's instructions, near the end of the page

_________________
IdeasVacuum
If it sounds simple, you have not grasped the complexity.


PostPosted: Sun Jun 03, 2012 7:42 pm
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
If the goal of converting to ASM is to speed things up, the best approach depends on the value of subChunk2Size.
If you would always convert 2 or 4 words into 2 or 4 dwords, probably the fastest way is to use the PUNPCKLWD instruction.


PostPosted: Sun Jun 03, 2012 8:25 pm
Addict

Joined: Sat Aug 15, 2009 6:59 pm
Posts: 1260
It can't be slower. Are you sure you have disabled the debugger on your performance test?
And what exactly do you want to do?


PostPosted: Sun Jun 03, 2012 9:08 pm
Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Here's what I'm up to:

viewtopic.php?f=12&t=39830

FLAC is a lossless compression scheme for audio. The 16- to 32-bit conversion is unfortunately a necessary step for encoding to FLAC.


PostPosted: Sun Jun 03, 2012 9:53 pm
Addict

Joined: Tue Nov 09, 2010 10:15 pm
Posts: 1719
wilbert wrote:
If the goal of converting to ASM is to speed things up, the best approach depends on the value of subChunk2Size.
If you would always convert 2 or 4 words into 2 or 4 dwords, probably the fastest way is to use the PUNPCKLWD instruction.


I am with Wilbert here. If you are regularly going to move more than about a dozen values (and in audio you would be), then move the data in the largest "native" size possible, which is always an integer. Divide your loop count by the size ratio (i.e. 2 on 32-bit and 4 on 64-bit if you are working with 16-bit words), and then handle the remainder afterwards.


PostPosted: Sun Jun 03, 2012 10:15 pm
Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Thorium wrote:
Are you sure you have disabled the debugger on your performance test?

Ah yes, disabling the debugger made all the difference WRT speed. Now there are issues regarding the 16- to 32-bit conversion because the FLAC encoder isn't encoding properly.


PostPosted: Sun Jun 03, 2012 10:16 pm
Enthusiast

Joined: Wed Apr 12, 2006 7:59 pm
Posts: 174
Location: Germany
@chris319: Your ASM reads 32 bits, not 16 bits (watch the high word!). Also, why not use this scheme without ReadData:
Code:
Global *FB=AllocateMemory(Lof(File)*2)
j=0
For i=0 To Lof(File)-2 Step 2
  PokeW(*FB+j,ReadWord(File))
  j+4
Next

Code from me without ASM :D ! Off Topic :lol: !
Helle
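
To illustrate the high-word point, here is a small sketch comparing a 16-bit read with a 32-bit read at the same address. The sample values are made up for the demonstration. The 32-bit read picks up the neighbouring sample in its upper half, which is what MOV EDX,[ESI+ECX] does in the ASM attempt.

```purebasic
; Two adjacent 16-bit samples in memory (values are hypothetical).
*buf = AllocateMemory(4)
If *buf
  PokeW(*buf, -2)       ; sample 0 = $FFFE
  PokeW(*buf + 2, 5)    ; sample 1 = $0005

  Debug PeekW(*buf)     ; 16-bit read, sign-extended: -2
  Debug PeekL(*buf)     ; 32-bit read: $0005FFFE = 393214 (sample 1 ends up in the high word)
  FreeMemory(*buf)
EndIf
```

Only the first result is a valid 32-bit version of sample 0.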


PostPosted: Sun Jun 03, 2012 10:27 pm
Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Now for some stats:

PB For ... Next with debugger: 1560 ms

PB For ... Next without debugger: 94 ms

ASM with debugger: 1996 ms

ASM without debugger: 31 ms

Without the debugger, ASM is about three times faster than PB (31 ms vs. 94 ms).
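
A rough sketch of how such a timing could be taken is shown below. The buffer size, the constant name #Bytes, and the message text are assumptions, not the setup used for the numbers above. #PB_Compiler_Debugger is used only to label the result so that a debugger run is not mistaken for a real measurement.

```purebasic
#Bytes = 32 * 1024 * 1024                 ; hypothetical amount of 16-bit data in bytes

*FB = AllocateMemory(#Bytes * 2)          ; zero-filled; the contents do not matter for timing
If *FB
  start.q = ElapsedMilliseconds()
  For pos.l = #Bytes - 2 To 0 Step -2
    PokeL(*FB + (pos << 1), PeekW(*FB + pos))
  Next
  elapsed.q = ElapsedMilliseconds() - start

  CompilerIf #PB_Compiler_Debugger
    note$ = " (debugger ON, not representative)"
  CompilerElse
    note$ = " (debugger off)"
  CompilerEndIf

  MessageRequester("PB loop", Str(elapsed) + " ms" + note$)
  FreeMemory(*FB)
EndIf
```

The ASM version can be timed the same way by swapping it in for the For ... Next loop.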


PostPosted: Sun Jun 03, 2012 10:38 pm
Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Helle wrote:
@chris319: Your ASM reads 32 bits, not 16 bits (watch the high word!). Also, why not use this scheme without ReadData:
Code:
Global *FB=AllocateMemory(Lof(File)*2)
j=0
For i=0 To Lof(File)-2 Step 2
  PokeW(*FB+j,ReadWord(File))
  j+4
Next

Code from me without ASM :D ! Off Topic :lol: !
Helle

I presume you're talking about the FLAC encoder example program? You should bring that to the attention of the original author, oryaaaaa. All I did was make his program usable (as posted, it won't compile) and clean it up a little, as noted in my post in that thread. Feel free to enhance it as you see fit. You can download the DLL from http://sourceforge.net/projects/flac/fi ... 1.2.1-win/ You want the zip file named flac-1.2.1-devel-win.zip.


PostPosted: Mon Jun 04, 2012 12:06 am
Addict

Joined: Tue Nov 09, 2010 10:15 pm
Posts: 1719
chris319 wrote:
Thorium wrote:
Are you sure you have disabled the debugger on your performance test?

Ah yes, disabling the debugger made all the difference WRT speed. Now there are issues regarding the 16- to 32-bit conversion because the FLAC encoder isn't encoding properly.

This is because debugger code is executed with each command, whether it is an ASM command or a PB command.
Since hand-written ASM requires more individual commands to get the work done than PB commands do, that debugger code is executed more often for a given task.
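
If you want to keep the debugger on for the rest of the program, one option is to wrap just the hot loop in DisableDebugger / EnableDebugger. This is a minimal sketch with a made-up buffer size. It has not been checked here whether this removes all of the per-line overhead for inline ASM, so treat it as something to measure rather than a guarantee.

```purebasic
subChunk2Size = 1024                       ; hypothetical size in bytes
*FB = AllocateMemory(subChunk2Size * 2)
If *FB
  DisableDebugger                          ; no debugger checks for the lines that follow
  For pos.l = subChunk2Size - 2 To 0 Step -2
    PokeL(*FB + (pos << 1), PeekW(*FB + pos))
  Next
  EnableDebugger                           ; debugger checks resume here

  Debug "conversion done"
  FreeMemory(*FB)
EndIf
```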


PostPosted: Mon Jun 04, 2012 12:41 am
Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Well, it's faster, but the 16- to 32-bit conversion isn't working the same as peek and poke. Examining the ASM that PB generates for the For ... Next loop shows calls to the external routines PB_PeekW and PB_PokeL.


PostPosted: Mon Jun 04, 2012 4:07 am
Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
EUREKA!

The solution to my dilemma lies in CWDE. Works great now!
Quote:
cwde ; convert the signed word in ax to a double word in eax

Code:
FB = *FB
esi_temp.l ;STORAGE FOR NON-VOLATILE REGISTER
eax_temp.l ;STORAGE FOR EAX
EnableASM
MOV esi_temp,ESI ;SAVE NON-VOLATILE REGISTER
MOV eax_temp,EAX ;SAVE EAX (VOLATILE IN PB, SO THIS IS OPTIONAL)

MOV ESI,FB ;LOAD ESI REGISTER WITH BUFFER ADDRESS
MOV ECX,subChunk2Size ;OFFSET
SUB ECX,2

loop: MOV AX,word[ESI+ECX] ;ADDRESS IS ALREADY IN ESI REGISTER, ECX HOLDS OFFSET
CWDE ;SIGN-EXTEND AX INTO EAX (16 TO 32 BITS)
MOV [ESI+ECX*2],EAX ;STORE IN MEMORY
SUB ECX,2
JGE l_loop

MOV ESI,esi_temp ;RESTORE NON-VOLATILE REGISTER
MOV EAX,eax_temp ;RESTORE EAX (OPTIONAL, SEE ABOVE)
DisableASM
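
One way to confirm that an ASM routine like this matches the peek/poke version is to keep a copy of the 16-bit source and compare afterwards. The sketch below assumes a helper named CheckWidened and a tiny test buffer, neither of which is from the thread. The PB loop stands in for the routine under test; put the ASM block in its place to check it.

```purebasic
; Hypothetical helper: returns #True if every 32-bit value in *dst32 equals
; the sign-extended 16-bit sample at the same index in *src16.
Procedure.i CheckWidened(*src16, *dst32, byteCount.l)
  Protected pos.l
  For pos = 0 To byteCount - 2 Step 2
    If PeekL(*dst32 + (pos << 1)) <> PeekW(*src16 + pos)
      ProcedureReturn #False
    EndIf
  Next
  ProcedureReturn #True
EndProcedure

subChunk2Size = 8                           ; hypothetical test size in bytes
*FB   = AllocateMemory(subChunk2Size * 2)   ; working buffer, widened in place
*copy = AllocateMemory(subChunk2Size)       ; untouched copy of the 16-bit source
If *FB And *copy
  PokeW(*FB + 0, 1)     : PokeW(*FB + 2, -1)
  PokeW(*FB + 4, 32767) : PokeW(*FB + 6, -32768)
  CopyMemory(*FB, *copy, subChunk2Size)

  ; Stand-in for the routine under test.
  For pos.l = subChunk2Size - 2 To 0 Step -2
    PokeL(*FB + (pos << 1), PeekW(*FB + pos))
  Next

  Debug CheckWidened(*copy, *FB, subChunk2Size)   ; 1 when the buffers match
  FreeMemory(*copy)
  FreeMemory(*FB)
EndIf
```

Including negative samples in the test data matters, because those are the values that a missing sign extension gets wrong.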


PostPosted: Mon Jun 04, 2012 5:38 am
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
This is how you could do it using SSE2, but I don't know if it is much faster.
Code:
bytes_to_process = subChunk2Size

num_bytes = (bytes_to_process + 7) & -8; Make sure we always process a multiple of 8 bytes
*mem = AllocateMemory(num_bytes * 2 + 15); Allocate 15 bytes extra so we have room to use aligned memory
*FB = (*mem + 15) & -16; Aligned memory pointer

EnableASM
MOV edx, *FB
MOV ecx, num_bytes
DisableASM
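!jmp c16_32entry
!c16_32loop:
!movq xmm0, [edx + ecx] ; load four 16-bit words
!punpcklwd xmm0, xmm0 ; duplicate each word into both halves of a dword
!psrad xmm0, 16 ; arithmetic shift right leaves each word sign-extended to 32 bits
!movdqa [edx + ecx * 2], xmm0 ; aligned store of four dwords
!c16_32entry:
!sub ecx, 8 ; step back 8 bytes; carry is set once the offset goes below zero
!jnc c16_32loop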
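
The rounding and alignment arithmetic in the setup lines can be checked on its own. This is a small sketch with a made-up byte count of 10. It shows the count being padded up to a multiple of 8 and the pointer being moved up to a 16-byte boundary, which is what the aligned MOVDQA store needs.

```purebasic
bytes_to_process = 10                          ; hypothetical size, deliberately not a multiple of 8

num_bytes = (bytes_to_process + 7) & -8        ; round up to a multiple of 8
Debug num_bytes                                ; 16

*mem = AllocateMemory(num_bytes * 2 + 15)      ; 15 spare bytes leave room to align
If *mem
  *FB = (*mem + 15) & -16                      ; first 16-byte boundary at or after *mem
  Debug *FB & 15                               ; 0, so *FB is 16-byte aligned
  Debug *FB - *mem                             ; somewhere from 0 to 15 bytes skipped
  FreeMemory(*mem)                             ; free the original pointer, not *FB
EndIf
```

Note that the padded tail (here 6 extra bytes) gets converted too, so the caller should only use the first bytes_to_process * 2 bytes of the result.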

