For pos.l = subChunk2Size - 2 To 0 Step -2
PokeL(*FB + (pos << 1), PeekW(*FB + pos))
Next
I have tried to replicate it in ASM but it is slower and I don't think it works the same. Note that we are doing a 16- to 32-bit conversion. Below is my attempt. How would I replicate the PB in ASM?
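For reference while porting, here is a rough C sketch of what the PB loop above appears to do: an in-place widening that walks backward and sign-extends each sample. The function name widen_in_place is made up for illustration, and the sketch assumes the buffer holds twice the byte count being converted, as the PB code implies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen nbytes of signed 16-bit samples in place
   to signed 32-bit. Assumes buf has room for 2 * nbytes bytes and that
   nbytes is even. */
void widen_in_place(unsigned char *buf, size_t nbytes)
{
    /* Walk backward so a 32-bit store never lands on a 16-bit sample
       that has not been read yet. */
    for (size_t pos = nbytes; pos >= 2; pos -= 2) {
        int16_t s;
        int32_t w;
        memcpy(&s, buf + (pos - 2), sizeof s);         /* like PeekW */
        w = s;                                         /* sign-extends */
        memcpy(buf + ((pos - 2) << 1), &w, sizeof w);  /* like PokeL */
    }
}
```

An ASM version would need both properties, the backward walk and the sign extension, to behave the same way.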
If the goal of converting to ASM is to speed things up, the best approach depends on the value of subChunk2Size.
If you would always convert 2 or 4 words into 2 or 4 dwords, probably the fastest way is to use the PUNPCKLWD instruction.
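A minimal sketch of the PUNPCKLWD idea, written with C intrinsics rather than inline ASM. The helper name widen4 is hypothetical. PUNPCKLWD on its own only interleaves words, so this sketch unpacks the register with itself and then uses an arithmetic shift (PSRAD) to keep the sign; that second step is an assumption about what signed audio samples need, not something stated above. On builds without SSE2 it falls back to a plain loop.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: convert 4 signed 16-bit samples to 4 signed
   32-bit values. src and dst must not overlap. */
void widen4(const int16_t *src, int32_t *dst)
{
#if defined(__SSE2__)
    /* Load 4 words into the low 64 bits of the register. */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);
    /* PUNPCKLWD with itself: each dword becomes (word << 16) | word. */
    v = _mm_unpacklo_epi16(v, v);
    /* PSRAD by 16: arithmetic shift leaves the sign-extended word. */
    v = _mm_srai_epi32(v, 16);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    /* Portable fallback when SSE2 is not available. */
    for (int i = 0; i < 4; i++)
        dst[i] = src[i];
#endif
}
```

Unpacking against a zeroed register instead would zero-extend, which is only right for unsigned data.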
wilbert wrote:If the goal of converting to ASM is to speed things up, the best approach depends on the value of subChunk2Size.
If you would always convert 2 or 4 words into 2 or 4 dwords, probably the fastest way is to use the PUNPCKLWD instruction.
I am with Wilbert here. If you are regularly going to move more than about a dozen values at a time (and in audio you would be), then move the data in the largest "native" size possible, which is always an integer. Divide your loop count by the number of 16-bit words that fit in that size (i.e. 2 with 32-bit, and 4 with 64-bit), and then do the remainder afterwards.
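The loop-plus-remainder structure described above could look roughly like this in C. It is only a sketch: widen_chunked is a made-up name, the lane extraction assumes a little-endian machine (as x86 is), and source and destination are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen n signed 16-bit samples to 32-bit,
   reading 64 bits (4 samples) per iteration of the main loop.
   Assumes little-endian byte order and non-overlapping buffers. */
void widen_chunked(const int16_t *src, int32_t *dst, size_t n)
{
    size_t i = 0;

    /* Main loop: one 64-bit load covers four 16-bit samples. */
    for (; i + 4 <= n; i += 4) {
        uint64_t chunk;
        memcpy(&chunk, src + i, sizeof chunk);
        for (int k = 0; k < 4; k++) {
            /* Take lane k, then reinterpret as signed so the
               assignment to int32_t sign-extends it. */
            dst[i + k] = (int16_t)(uint16_t)(chunk >> (16 * k));
        }
    }

    /* Remainder: the 0 to 3 samples that did not fill a chunk. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Whether this beats a simple per-sample loop depends on the compiler and the data size, so it is worth timing both with the debugger off.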
Thorium wrote:Are you sure you have disabled the debugger on your performance test?
Ah yes, disabling the debugger made all the difference WRT speed. Now there are issues regarding the 16- to 32-bit conversion because the FLAC encoder isn't encoding properly.
Global *FB=AllocateMemory(Lof(File)*2)
j=0
For i=0 To Lof(File)-2 Step 2
PokeW(*FB+j,ReadWord(File))
j+4
Next
Code from me without ASM! Off topic!
Helle
I presume you're talking about the FLAC encoder example program? You should bring that to the attention of the original author, oryaaaaa. All I did was make his program usable (as is it won't compile) and cleaned it up a little bit as noted in my post in that thread. Feel free to enhance it as you see fit. You can download the dll from http://sourceforge.net/projects/flac/fi ... 1.2.1-win/ You want the zip file named flac-1.2.1-devel-win.zip.
Thorium wrote:Are you sure you have disabled the debugger on your performance test?
Ah yes, disabling the debugger made all the difference WRT speed. Now there are issues regarding the 16- to 32-bit conversion because the FLAC encoder isn't encoding properly.
This is because debugger code is executed with each command, whether it is an asm command or a PB command.
Since hand-written ASM requires more individual commands to get the work done than PB commands, that debugger code is executed more often for a given task.
Well, it's faster, but the 16- to 32-bit conversion isn't working the same as peek and poke. Examination of the ASM code reveals the external routines CALL PB_PeekW and CALL PB_PokeL.
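One possible cause of the mismatch, offered as a guess since the ASM itself is not shown here: PeekW returns a signed word, so PokeL stores a sign-extended value. An ASM version that copies only the low 16 bits, or widens with MOVZX instead of MOVSX, would zero-extend instead. The two hypothetical helpers below show the difference in C.

```c
#include <stdint.h>

/* Hypothetical helpers contrasting the two ways to widen 16 bits. */

/* Treat the raw 16 bits as signed, like PeekW followed by PokeL
   (MOVSX in ASM). */
int32_t widen_signed(uint16_t raw)
{
    return (int16_t)raw;
}

/* Treat the raw 16 bits as unsigned (MOVZX, or a 16-bit store into
   zeroed memory). */
int32_t widen_unsigned(uint16_t raw)
{
    return raw;
}
```

For a raw word of 0xFFFE the signed version gives -2 and the unsigned version gives 65534; positive samples come out the same either way, so only negative samples would be affected.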