# PureBasic Forum

 All times are UTC + 1 hour

 Post subject: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 7:05 pm
 Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Here is some PB code:

Code:
For pos.l = subChunk2Size - 2 To 0 Step -2
  PokeL(*FB + (pos << 1), PeekW(*FB + pos))
Next

I have tried to replicate it in ASM but it is slower and I don't think it works the same. Note that we are doing a 16- to 32-bit conversion. Below is my attempt. How would I replicate the PB in ASM?

Code:
FB.l = *FB
EnableASM
MOV esi_temp,ESI ;SAVE NON-VOLATILE REGISTER

MOV ECX,subChunkSize2 ;OFFSET
SUB ECX,2

loop: MOV EDX,[ESI+ECX] ;ADDRESS IS ALREADY IN ESI REGISTER, CX HOLDS OFFSET
MOV [ESI+ECX*2],EDX
SUB ECX,2
JGE l_loop

MOV ESI,esi_temp ;RESTORE NON-VOLATILE REGISTER
DisableASM
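
For readers more at home in C than in PB, here is a minimal sketch of what the PB loop above appears to do: it widens signed 16-bit samples to 32 bits inside the same buffer, walking from the last sample to the first so that no unread input is overwritten. The function name `widen_in_place` and the parameter `count` (number of samples, not bytes) are made up for illustration, and the buffer is assumed to have room for `count * 4` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen `count` signed 16-bit samples stored at the
   start of `buf` into 32-bit samples, in place.
   Assumption: `buf` has room for at least count * 4 bytes. */
void widen_in_place(void *buf, size_t count)
{
    unsigned char *bytes = buf;
    assert(buf != NULL || count == 0);

    /* Walk backwards: the 4-byte store for sample i lands at byte offset
       i * 4, which never overlaps a 16-bit sample that is still unread. */
    for (size_t i = count; i-- > 0; ) {
        int16_t sample;
        int32_t wide;
        memcpy(&sample, bytes + i * 2, sizeof sample); /* like PeekW */
        wide = sample;                                 /* sign-extends */
        memcpy(bytes + i * 4, &wide, sizeof wide);     /* like PokeL */
    }
}
```

Walking forwards instead would overwrite sample 1 while storing the widened sample 0, which is presumably why the PB loop counts down.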

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 7:31 pm
 Always Here

Joined: Fri Oct 23, 2009 2:33 am
Posts: 6282
Location: Wales, UK
Here is how: http://www.purebasic.fr/english/viewtopic.php?f=35&t=48298

Edit: Follow sRod's instructions, near the end of the page

_________________
IdeasVacuum
If it sounds simple, you have not grasped the complexity.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 7:42 pm
 PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
If the goal of converting to ASM is to speed things up, the best approach depends on the value of subChunk2Size.
If you would always convert 2 or 4 words into 2 or 4 dwords, probably the fastest way is to use the PUNPCKLWD instruction.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 8:25 pm

Joined: Sat Aug 15, 2009 6:59 pm
Posts: 1260
It can't be slower. Are you sure you have disabled the debugger on your performance test?
And what exactly do you want to do?

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 9:08 pm
 Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Here's what I'm up to:

viewtopic.php?f=12&t=39830

FLAC is a lossless compression scheme for audio. This is unfortunately a necessary step for encoding to FLAC.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 9:53 pm

Joined: Tue Nov 09, 2010 10:15 pm
Posts: 1719
wilbert wrote:
If the goal of converting to ASM is to speed things up, the best approach depends on the value of subChunk2Size.
If you would always convert 2 or 4 words into 2 or 4 dwords, probably the fastest way is to use the PUNPCKLWD instruction.

I am with Wilbert here. If you regularly move more than about a dozen items of data (and in audio you would), then move them in the largest "native" size possible, which is always an integer. Divide the loop count by the size ratio (i.e. 2 on 32-bit, and 4 on 64-bit, if you are working with 16-bit words), and then handle the remainder afterwards.
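
One possible reading of this advice, sketched in C rather than PB: load two 16-bit samples with a single 32-bit read, widen both, and pick up an odd leftover sample after the loop. The names `widen_pairs` and `sign_extend16` are invented for this sketch, separate source and destination buffers are assumed, and the split into low and high halves assumes a little-endian machine such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sign-extend the low 16 bits of v using only well-defined arithmetic. */
static int32_t sign_extend16(uint32_t v)
{
    return (int32_t)((v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical helper: widen `count` samples from `src` into `dst`.
   Assumptions: little-endian byte order, non-overlapping buffers. */
void widen_pairs(const int16_t *src, int32_t *dst, size_t count)
{
    size_t pairs = count / 2; /* loop count divided by the size ratio (2) */
    assert(src != NULL && dst != NULL);

    for (size_t p = 0; p < pairs; p++) {
        uint32_t two;
        memcpy(&two, src + p * 2, sizeof two);     /* one 32-bit load = 2 samples */
        dst[p * 2]     = sign_extend16(two);       /* low half: first sample */
        dst[p * 2 + 1] = sign_extend16(two >> 16); /* high half: second sample */
    }

    /* Remainder: an odd sample count leaves one sample for a plain copy. */
    if (count % 2 != 0) {
        dst[count - 1] = src[count - 1];
    }
}
```

Whether this beats a simple one-sample loop depends on the compiler and CPU, so it is worth timing both with the debugger off.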

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 10:15 pm
 Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Thorium wrote:
Are you sure you have disabled the debugger on your performance test?

Ah yes, disabling the debugger made all the difference WRT speed. Now there are issues regarding the 16- to 32-bit conversion because the FLAC encoder isn't encoding properly.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 10:16 pm
 Enthusiast

Joined: Wed Apr 12, 2006 7:59 pm
Posts: 174
Location: Germany
@chris319: Your ASM reads 32 bits, not 16 bits, so the high word ends up holding the next sample! And why not use this scheme, without ReadData:
Code:
Global *FB=AllocateMemory(Lof(File)*2)
j=0
For i=0 To Lof(File)-2 Step 2
j+4
Next

Code from me without ASM ! Off Topic !
Helle
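
To make the high-word point concrete, here is a small C model (not PB, and not taken from the thread) comparing a 32-bit load with a 16-bit load at the same offset. The names `load_dword` and `load_word` are hypothetical, and the expected value in the usage note assumes a little-endian machine such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models `MOV EDX,[ESI+ECX]`: a 32-bit load that also picks up the
   following sample in its high word. */
uint32_t load_dword(const unsigned char *buf, size_t offset)
{
    uint32_t value;
    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Models PeekW: a 16-bit load, sign-extended to 32 bits. */
int32_t load_word(const unsigned char *buf, size_t offset)
{
    int16_t value;
    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

With the two samples -2 and 7 stored back to back, `load_word` at offset 0 returns -2, while `load_dword` returns 0x0007FFFE, because the 7 lands in the high word.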

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 10:27 pm
 Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Now for some stats:

PB For ... Next with debugger: 1560 ms

PB For ... Next without debugger: 94 ms

ASM with debugger: 1996 ms

ASM without debugger: 31 ms

Without the debugger, ASM is three times faster than PB.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Sun Jun 03, 2012 10:38 pm
 Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Helle wrote:
@chris319: You read 32-Bit, not 16-Bit (High-Word!). And: Why not this scheme without ReadData:
Code:
Global *FB=AllocateMemory(Lof(File)*2)
j=0
For i=0 To Lof(File)-2 Step 2
j+4
Next

Code from me without ASM ! Off Topic !
Helle

I presume you're talking about the FLAC encoder example program? You should bring that to the attention of the original author, oryaaaaa. All I did was make his program usable (as is, it won't compile) and clean it up a little, as noted in my post in that thread. Feel free to enhance it as you see fit. You can download the DLL from http://sourceforge.net/projects/flac/fi ... 1.2.1-win/ You want the zip file named flac-1.2.1-devel-win.zip.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Mon Jun 04, 2012 12:06 am

Joined: Tue Nov 09, 2010 10:15 pm
Posts: 1719
chris319 wrote:
Thorium wrote:
Are you sure you have disabled the debugger on your performance test?

Ah yes, disabling the debugger made all the difference WRT speed. Now there are issues regarding the 16- to 32-bit conversion because the FLAC encoder isn't encoding properly.

This is because debugger code is executed with each command, whether it is an ASM command or a PB command.
Since hand-written ASM needs more individual commands than PB to get the same work done, that debugger code runs more often for a given task.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Mon Jun 04, 2012 12:41 am
 Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
Well, it's faster, but the 16- to 32-bit conversion isn't working the same as peek and poke. Examining the ASM that PB generates for the For ... Next loop shows calls to external routines: CALL PB_PeekW and CALL PB_PokeL.

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Mon Jun 04, 2012 4:07 am
 Enthusiast

Joined: Mon Oct 24, 2005 1:05 pm
Posts: 745
EUREKA!

The solution to my dilemma lies in CWDE. Works great now!
Quote:
cwde ; convert the signed word in ax to a double word in eax

Code:
FB = *FB ;COPY OF THE BUFFER ADDRESS FOR USE IN ASM
esi_temp.l ;STORAGE FOR NON-VOLATILE REGISTER
eax_temp.l
EnableASM
MOV esi_temp,ESI ;SAVE NON-VOLATILE REGISTER
mov eax_temp,eax ;SAVE EAX (VOLATILE IN PB, SO THIS IS OPTIONAL)
MOV ESI,FB ;LOAD BUFFER ADDRESS EXPLICITLY RATHER THAN ASSUMING IT IS ALREADY IN ESI

MOV ECX,subChunk2Size ;OFFSET
SUB ECX,2

loop: MOV AX,word[ESI+ECX] ;ESI HOLDS BUFFER ADDRESS, ECX HOLDS OFFSET
CWDE ;SIGN-EXTEND AX INTO EAX (16 TO 32 BITS)
MOV [ESI+ECX*2],EAX ;STORE IN MEMORY
SUB ECX,2
JGE l_loop

MOV ESI,esi_temp ;RESTORE NON-VOLATILE REGISTER
mov eax,eax_temp ;RESTORE EAX
DisableASM
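
As a rough C model of why CWDE fixes the conversion: `MOV AX,word[...]` replaces only the low 16 bits of EAX and leaves whatever was in the upper half, and CWDE then overwrites that upper half with copies of the sign bit of AX. The functions `mov_ax` and `cwde` below are illustrative stand-ins operating on a plain 32-bit value, not real intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Models `MOV AX,word[...]`: only the low 16 bits of EAX change. */
uint32_t mov_ax(uint32_t eax, uint16_t word)
{
    return (eax & 0xFFFF0000u) | word;
}

/* Models CWDE: fill the upper 16 bits of EAX with the sign bit of AX. */
uint32_t cwde(uint32_t eax)
{
    uint32_t ax = eax & 0xFFFFu;
    uint32_t result = (ax & 0x8000u) ? (ax | 0xFFFF0000u) : ax;
    assert((result & 0xFFFFu) == ax); /* AX itself is left unchanged */
    return result;
}
```

Starting from stale contents such as 0xDEADBEEF, loading the word 0xFFFE (that is, -2) and applying `cwde` gives 0xFFFFFFFE, which is -2 as a 32-bit value. Without the CWDE step the stored dword would be 0xDEADFFFE.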

 Post subject: Re: Moving Data: ASM vs PB
 Posted: Mon Jun 04, 2012 5:38 am
 PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3710
Location: Netherlands
This is how you could do it using SSE2 but I don't know if it is much faster
Code:
bytes_to_process = subChunk2Size

num_bytes = (bytes_to_process + 7) & -8; Make sure we always process a multiple of 8 bytes
*mem = AllocateMemory(num_bytes * 2 + 15); Allocate 15 bytes extra so we have room to use aligned memory
*FB = (*mem + 15) & -16; Aligned memory pointer

EnableASM
MOV edx, *FB
MOV ecx, num_bytes
DisableASM
!jmp c16_32entry
!c16_32loop:
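!movq xmm0, [edx + ecx] ; load 4 words (8 bytes)
!punpcklwd xmm0, xmm0 ; copy each word into both halves of its own dword
!psrad xmm0, 16 ; arithmetic shift right so each dword holds the sign-extended word
!movdqa [edx + ecx * 2], xmm0 ; store 4 dwords (16 bytes, aligned)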
!c16_32entry:
!sub ecx, 8
!jnc c16_32loop
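
A scalar C model of the SSE2 idea, for anyone checking the results by hand: PUNPCKLWD with the same register as both operands pairs each word with a copy of itself, so each 32-bit lane holds the word twice, and an arithmetic right shift by 16 (PSRAD) is one common way to turn that into a sign-extended value. The function names below are invented for the sketch and each models a single lane, not the real instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit lane after `punpcklwd xmm0, xmm0`: the word appears in both
   the low and the high half. */
uint32_t punpcklwd_lane(uint16_t w)
{
    return ((uint32_t)w << 16) | w;
}

/* One lane after `psrad xmm0, 16`: the high half moves down and its sign
   bit fills the upper 16 bits. */
int32_t psrad16_lane(uint32_t lane)
{
    uint32_t high = lane >> 16;
    return (int32_t)(high ^ 0x8000u) - 0x8000;
}

/* Widen four words the way one loop iteration would. */
void widen4(const uint16_t in[4], int32_t out[4])
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < 4; i++) {
        out[i] = psrad16_lane(punpcklwd_lane(in[i]));
    }
}
```

For the word 0xFFFE the unpack step alone yields 0xFFFEFFFE, which is not -2 as a 32-bit value until the shift is applied.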