R Hyde's Art of Assembly - best version?

Bare metal programming in PureBasic, for experienced users
netmaestro
PureBasic Bullfrog
Posts: 8433
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

R Hyde's Art of Assembly - best version?

Post by netmaestro »

IdeasVacuum posted this link (thanks!): http://www.arl.wustl.edu/~lockwood/clas ... m/toc.html
This book seems to be a work of Randall Hyde, who published an updated version in 2001 and a 2nd edition in 2009 (maybe 2010, not sure). Is it just me or is this 1996 work fifty times better than the two versions that succeeded it? I mean, HLA isn't something I (and probably most asm enthusiasts) am interested in. This one is chock-full of pure gold.
BERESHEIT
Zach
Addict
Posts: 1656
Joined: Sun Dec 12, 2010 12:36 am
Location: Somewhere in the midwest

Re: R Hyde's Art of Assembly - best version?

Post by Zach »

I actually thought HLA was a pretty neat concept.
oldefoxx
Enthusiast
Posts: 532
Joined: Fri Jul 25, 2003 11:24 pm

Re: R Hyde's Art of Assembly - best version?

Post by oldefoxx »

There was a time when it was almost a rage to come up with a book on Assembly programming. As a consequence, there
were quite a few good ones out there. With the architecture becoming more cumbersome to work with due to all manner
of enhancements, and the Operating System dominating what you can do if you do get into Assembler programming, it is
less of an attraction now, and possibly too much time is spent on aspects of Assembly Programming that very few will
strive to learn or master.

You look for a book that covers Assembly programming, you find very little that ventures beyond the 386 or 486. Maybe
a Pentium processor or two, but that is about it. You are even then decades behind the architecture that resides in a
modern PC, but as long as they design and build them to carry out the 386/486 instruction sets, you can get by.

Meaning that sometimes it is the older books that get to the heart of the matter in a better fashion, which can be great
if you are open to the idea of picking up a used book here or there. Something from the late 80's or 90's might be best.
I have a good book that I priced again online in order to recommend it to others, and I'm talking a college-level textbook
here, and I found it priced as low as a penny in Very Good condition, and as little as $0.55 at Barnes & Noble. You
still had to pay nearly $4 in shipping, but I think that was still a bargain. Amazon.com is a good place to shop for
something like this, and be sure to check out some of the postings there that involve used books.

That does not mean you can't learn assembly online through the many links and sites out there, but face it, a book is
going to be more structured, more flow oriented, easier to get around in, and you can mark it up if you see fit. You
might even be challenged to try out some of the example programs in the book to see if they really work.
has-been wanna-be (You may not agree with what I say, but it will make you think).
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: R Hyde's Art of Assembly - best version?

Post by Thorium »

oldefoxx wrote: You look for a book that covers Assembly programming, you find very little that ventures beyond the 386 or 486. Maybe
a Pentium processor or two, but that is about it. You are even then decades behind the architecture that resides in a
modern PC, but as long as they design and build them to carry out the 386/486 instruction sets, you can get by.
That's because the basic architecture has not changed.
These books tell you all you need to know to learn coding in asm. If you want the fancy stuff, like SIMD, you just read the official manuals by Intel and/or AMD. They are very good, with a lot of examples and explanations. Personally, I don't need any other book than the Intel manuals.
oldefoxx
Enthusiast
Posts: 532
Joined: Fri Jul 25, 2003 11:24 pm

Re: R Hyde's Art of Assembly - best version?

Post by oldefoxx »

The basic architecture has changed remarkably as to added capabilities, but the core architecture is still
designed with the original concepts in mind. Most who write asm code probably are focused on the original
set of registers, and some basics are written with that in mind, so you might need to steer clear of the original
registers and make use instead of the increased number and scope of registers that have become available.
Trouble is, the new registers are so undocumented that you don't know if they are in use or not, or really what
your options are for using them. That means most inline asm code falls back on the original registers anyway.

HotBasic, for instance, lays claim to the fact that it uses none of the original set of registers, so they are all
open to the user. PowerBasic used to claim that the only register you had to be sure to restore was the ebx
register. PureBasic, on the other hand, advises you that you only have use of eax, ecx, and edx. I presume
that means they also need esi and edi. You have pushad and popad to save the registers on the stack and pull them back off, but the problem with that is that you distort the stack pointer to the local variables, which apparently are made part of the stack so that the procedures can all be made recursive, meaning a function
or sub can call itself repeatedly until some condition is met. Works well for some sort processes.

But, and this is my dubious observation right now, if you try to intermix PureBasic compiler code with PureBasic
assembler code, and are trying to make use of more than the eax, ecx, and edx registers, are you going to have
to do a pushad before another asm instruction is carried out, then a popad before another compiler instruction is
carried out? It almost looks like you would, and that is bad, because you then wipe out the contents of those
registers which you were trying to make use of. Makes you wish there was a swapad instruction in the X86
opcodes and mnemonics so that two separate processes could switch the use of the registers back and forth.
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: R Hyde's Art of Assembly - best version?

Post by Thorium »

oldefoxx wrote:Most who write asm code probably are focused on the original
set of registers, and some basics are written with that in mind, so you might need to steer clear of the original
registers and make use instead of the increased number and scope of registers that have become available.
Trouble is, the new registers are so undocumented that you don't know if they are in use or not, or really what
your options are for using them. That means most inline asm code falls back on the original registers anyway.
Undocumented registers are CPU model specific and should not be used anyway. All other registers are well documented in the Intel manuals. The registers you have to preserve are specified in the ABI of the given OS. See MSDN for Windows.

I advise not using PB's asm handling at all. Put an exclamation mark in front of your ASM lines. That tells PB not to process them and to pass them to the assembler as they are. Parameters and local variables are addressed relative to the stack pointer, so after a push you can still reach them by adding the number of bytes you pushed to the offset.
Like this:

Code:

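Procedure Test(Test.i)
  
  !push esi                  ; ESP moves down by 4 bytes (x86)
  !mov eax, [p.v_Test+4]     ; so Test is now 4 bytes further away from ESP
  !pop esi                   ; restore ESI and rebalance the stack before returning
  
EndProcedure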
I also advise against mixing ASM with PB code. Put your ASM code in procedures, do some initialization in PB at the start of the procedure, and you can put some PB code at the end, but try not to interleave them. Then there should be no problem using all registers, including the mmx, xmm and even ymm registers.
oldefoxx
Enthusiast
Posts: 532
Joined: Fri Jul 25, 2003 11:24 pm

Re: R Hyde's Art of Assembly - best version?

Post by oldefoxx »

I appreciate the advice. I was perceiving a real problem there. Your recommendations seem to confirm this.
I generally don't go back and forth between asm and basic anyway, and there is less need to with PureBasic
because of its ability to do shifts in basic as well as its use of binary And, Or, XOr, and Not. There may or may
not be a gain in execution speed or size compression by working in some assembler code, but it is almost moot
anyway. Why? Because if you are doing a long stretch of processing, you need to return time slices to the CPU
so that other processes have a chance to do their thing and you don't lock up the processor where nothing else
is able to get time to work.

Inline Assembler is not the only way to fly. It can be a help, but that depends on what you can or cannot do
as far as the higher level language is concerned. Before processing a string with unknown content, I often prep
it this way:

Code:

Global aa.s
aa=InputRequester("What Now?","Keep it simple","Search, List, Find, etc.")
aa=LCase(Trim(ReplaceString(aa,Chr(9)," ")))
Debug aa
And this is not unique to me. The ReplaceString() takes any tabs (Chr(9)) and plugs in a space (" ") in its place,
so that Trim() can remove any spaces before or after the key characters of your response, and the LCase() takes
what is left and makes it all lower case so that I don't have to worry about what case it was typed in. Now you
write a routine in either basic or asm to do the same thing on a character by character basis, and it might
involve more instructions, but it could reduce what is three passes through aa down to one pass, which could be
somewhat faster. The difference in speed would be more evident if aa was extremely long, filled with all sorts
of content from, oh, possibly a large file of some sort. Were that the case, then the Trim() effort would not be
very effective as it would only affect the very tips of the beginning and end of the string. For something of that
nature, you would have to process the input either line by line or character by character.
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: R Hyde's Art of Assembly - best version?

Post by Thorium »

oldefoxx wrote: Inline Assembler is not the only way to fly. It can be a help, but that depends on what you can or cannot do
as far as the higher level language is concerned. Before processing a string with unknown content, I often prep
it this way:

Code:

Global aa.s
aa=InputRequester("What Now?","Keep it simple","Search, List, Find, etc.")
aa=LCase(Trim(ReplaceString(aa,Chr(9)," ")))
Debug aa
You can also do it in PB in a single pass. You can access the string as a char array with pointers.
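
A minimal sketch of such a single-pass version, assuming ASCII input. NormalizeInput() is a hypothetical name, not something from this thread. It walks the string once with Character pointers, turning tabs into spaces, lower-casing A-Z, and dropping leading and trailing spaces. Unlike LCase(), it leaves non-ASCII letters untouched.

```purebasic
; Hypothetical helper: one pass instead of ReplaceString() + Trim() + LCase().
; Only the ASCII letters A-Z are lower-cased.
Procedure.s NormalizeInput(Text.s)
  Protected *Src.Character = @Text   ; read position
  Protected *Dst.Character = @Text   ; write position (never ahead of *Src)
  Protected *Cut.Character = @Text   ; just past the last non-space character written
  Protected c.i
  Protected Started.i = #False

  If *Src = 0                        ; an empty string may have a null pointer
    ProcedureReturn ""
  EndIf

  While *Src\c
    c = *Src\c
    If c = 9                         ; tab -> space
      c = ' '
    ElseIf c >= 'A' And c <= 'Z'     ; upper case -> lower case
      c + 32
    EndIf
    If Started Or c <> ' '           ; skip leading spaces
      Started = #True
      *Dst\c = c
      *Dst + SizeOf(Character)
      If c <> ' '
        *Cut = *Dst
      EndIf
    EndIf
    *Src + SizeOf(Character)
  Wend

  *Cut\c = 0                         ; terminate here, which drops trailing spaces
  ProcedureReturn Text
EndProcedure

Debug NormalizeInput(Chr(9) + "  Find THIS" + Chr(9) + "now  ")  ; find this now
```

The string parameter is a local copy, so writing into it in place is safe as long as the result never gets longer than the input, which is the case here.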

If strings are really important for you, take a look at the SSE4.2 instruction set extension. It introduced some instructions for string operations: http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
oldefoxx
Enthusiast
Posts: 532
Joined: Fri Jul 25, 2003 11:24 pm

Re: R Hyde's Art of Assembly - best version?

Post by oldefoxx »

That's good to know. I bookmarked the link. It was stated earlier that not all processors have the same set of
registers. That's true, but what you do have to consider is that even the 386 is largely a part of history now,
and that the 486 is bottom end, though it is probably gone as well. So if the register shows up in the 386/486
set of registers, it's moved on to become part of the Pentiums, the 586, the 686, and later versions as well.
If you make use of these registers, your code should run on virtually any PC that has a CPU from the X86
family in it. This includes the AMD clones produced, although at some points their chip sets do differ from
Intel's, but not in a bad way. Just different follow-on features than Intel adopted. What you may have to
avoid is additional registers that came later, unless your program is only intended to be run on a specific
machine or style CPU, in which case you can push the limits as hard as you want.

But where to start? Well, starting online might be a choice: R. Hyde's Art of Assembly is available online, and you
can go to this link and look at section 4.1.5 to learn something about the registers:
http://cs.smith.edu/~thiebaut/ArtOfAsse ... EADING1-42

Again though, as already indicated, only three of these registers are open to the user to use without first
preserving the contents and restoring them later. This chapter also covers memory addressing. You want a
handy reference to the instruction set, consider this: http://zsmith.co/intel.html
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: R Hyde's Art of Assembly - best version?

Post by Thorium »

oldefoxx wrote: If you make use of these registers, your code should run on virtually any PC that has a CPU from the X86
family in it. This includes the AMD clones produced, although at some points their chip sets do differ from
Intel's, but not in a bad way. Just different follow-on features than Intel adopted. What you may have to
avoid is additional registers that came later, unless your program is only intended to be run on a specific
machine or style CPU, in which case you can push the limits as hard as you want.
The SIMD registers (mm, xmm and ymm) can be used if they are present. They are part of the instruction set extensions. You can check CPUID for which extensions are present on the CPU and then select the best available one. If you don't do that you are missing the true power of the CPU, which is SIMD and its wide registers of up to 256 bits.
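
As a rough illustration of the selection step, here is a plain-PB sketch that works on an EDX value as CPUID leaf 1 would return it. Bit 23 ($00800000) reports MMX and bit 26 ($04000000) reports SSE2. BestInstructionSet(), the constant names and the sample EDX values are made up for the example; a real program would take EDX from the cpuid instruction.

```purebasic
; Feature bits in EDX after CPUID with EAX = 1
#CPUID_EDX_MMX  = 1 << 23   ; $00800000
#CPUID_EDX_SSE2 = 1 << 26   ; $04000000

; Hypothetical helper: name the best code path for a given EDX value.
Procedure.s BestInstructionSet(Edx.i)
  If Edx & #CPUID_EDX_SSE2
    ProcedureReturn "SSE2"
  ElseIf Edx & #CPUID_EDX_MMX
    ProcedureReturn "MMX"
  EndIf
  ProcedureReturn "386"      ; plain integer fallback
EndProcedure

; Sample values, not read from a real CPU
Debug BestInstructionSet($04800000)  ; SSE2
Debug BestInstructionSet($00800000)  ; MMX
Debug BestInstructionSet(0)          ; 386
```

Testing from the widest extension downwards means the first match is always the best available path.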

An example of one of my projects:
It filters image data for compression and uses the best register set available. This code will run on a 386 but will use newer registers if they are available.

It has implementations for 80386 asm, MMX, SSE2 and PB (for non-x86 CPUs), all of them implemented in 32-bit and 64-bit.
Only downside: it's a lot of code for all the implementations of just one simple, small procedure.

Code:

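; Assumed definition: the post does not show Tsi_Pixel_Channel,
; which the CompilerDefault branch of Tsi_UnFilterUp() needs.
Structure Tsi_Pixel_Channel
  Channel.a
EndStructure

Global Tsi_MmxSupported.i
Global Tsi_Sse2Supported.i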

Procedure.i Tsi_IsCpuidSupported()

  !pushfd
  !pop eax
  !mov edx,eax
  !xor eax,$00200000
  !push eax
  !popfd
  !pushfd
  !pop eax
  !xor eax,edx
  !jne Tsi_IsCpuidSupported_Supported
  !xor eax,eax
  
  ProcedureReturn
  
  !Tsi_IsCpuidSupported_Supported:
  !mov eax,1
  
  ProcedureReturn

EndProcedure

Procedure.i Tsi_IsMmxSupported()

  CompilerSelect #PB_Compiler_Processor

    CompilerCase #PB_Processor_x86

      !mov eax,1
      !push ebx
      !cpuid
      !pop ebx
      !test edx,$00800000
      !jne Tsi_IsMmxSupported_Supported
      !xor eax,eax
      
      ProcedureReturn
    
      !Tsi_IsMmxSupported_Supported:
      !mov eax,1
      
      ProcedureReturn
    
    CompilerCase #PB_Processor_x64

      !mov rax,1
      !push rbx
      !cpuid
      !pop rbx
      !test edx,$00800000
      !jne Tsi_IsMmxSupported_Supported
      !xor rax,rax
      
      ProcedureReturn
    
      !Tsi_IsMmxSupported_Supported:
      !mov rax,1
      
      ProcedureReturn
    
  CompilerEndSelect

EndProcedure

Procedure.i Tsi_IsSse2Supported()

  CompilerSelect #PB_Compiler_Processor

    CompilerCase #PB_Processor_x86

      !mov eax,1
      !push ebx
      !cpuid
      !pop ebx
      !test edx,$04000000
      !jne Tsi_IsSse2Supported_Supported
      !xor eax,eax
      
      ProcedureReturn
    
      !Tsi_IsSse2Supported_Supported:
      !mov eax,1
      
      ProcedureReturn
    
    CompilerCase #PB_Processor_x64

      !mov rax,1
      !push rbx
      !cpuid
      !pop rbx
      !test edx,$04000000
      !jne Tsi_IsSse2Supported_Supported
      !xor rax,rax
      
      ProcedureReturn
    
      !Tsi_IsSse2Supported_Supported:
      !mov rax,1
      
      ProcedureReturn

  CompilerEndSelect

EndProcedure

Procedure Tsi_UnFilterUp(*ImageData, Width.i, Height.i, PixelSize.i)

  CompilerSelect #PB_Compiler_Processor
  
    CompilerCase #PB_Processor_x86

      If Tsi_Sse2Supported = #True

        ;save registers
        !push esi
        !push edi
        !push ebx

        ;calculate the pointers
        !mov edi,[p.p_ImageData+12]
        !mov esi,edi
        !mov eax,[p.v_Width+12]
        !mul dword[p.v_PixelSize+12]
        !mov edx,eax
        !add edi,edx

        ;calculate the counters
        !mov eax,[p.v_Height+12]
        !dec eax
        !mul dword[p.v_Width+12]
        !mul dword[p.v_PixelSize+12]
        !mov ecx,eax
        !shr ecx,7
        !and eax,127
        !mov ebx,eax
        
        ;process a part of the data to cut the length to a multiple of 128
        !test ebx,ebx
        !je Tsi_UnFilterUp_Sse2CutLengthEnd
        
        !align 4
        !Tsi_UnFilterUp_Sse2CutLengthStart:

          !mov al,[edi]
          !add al,[esi]
          !mov [edi],al
          
          !inc esi
          !inc edi
      
        !dec ebx
        !jne Tsi_UnFilterUp_Sse2CutLengthStart

        !align 4
        !Tsi_UnFilterUp_Sse2CutLengthEnd:
        
        ;process the rest of the data
        !test ecx,ecx
        !je Tsi_UnFilterUp_Sse2LoopEnd
        
        !align 4
        !Tsi_UnFilterUp_Sse2LoopStart:

          !movdqu xmm0,[esi]
          !movdqu xmm1,[esi+16]
          !movdqu xmm2,[esi+32]
          !movdqu xmm3,[esi+48]
          !movdqu xmm4,[esi+64]
          !movdqu xmm5,[esi+80]
          !movdqu xmm6,[esi+96]
          !movdqu xmm7,[esi+112]
          
          !paddb xmm0,[edi]
          !paddb xmm1,[edi+16]
          !paddb xmm2,[edi+32]
          !paddb xmm3,[edi+48]
          !paddb xmm4,[edi+64]
          !paddb xmm5,[edi+80]
          !paddb xmm6,[edi+96]
          !paddb xmm7,[edi+112]

          !movdqu [edi],xmm0
          !movdqu [edi+16],xmm1
          !movdqu [edi+32],xmm2
          !movdqu [edi+48],xmm3
          !movdqu [edi+64],xmm4
          !movdqu [edi+80],xmm5
          !movdqu [edi+96],xmm6
          !movdqu [edi+112],xmm7

          !add esi,128
          !add edi,128
        
        !dec ecx
        !jne Tsi_UnFilterUp_Sse2LoopStart
        
        !align 4
        !Tsi_UnFilterUp_Sse2LoopEnd:

        ;restore the registers
        !pop ebx
        !pop edi
        !pop esi

        ;emms clears the MMX state; the SSE2 path does not use it, so this is harmless but optional
        !emms

      ElseIf Tsi_MmxSupported = #True

        ;save registers
        !push esi
        !push edi
        !push ebx

        ;calculate the pointers
        !mov edi,[p.p_ImageData+12]
        !mov esi,edi
        !mov eax,[p.v_Width+12]
        !mul dword[p.v_PixelSize+12]
        !mov edx,eax
        !add edi,edx

        ;calculate the counters
        !mov eax,[p.v_Height+12]
        !dec eax
        !mul dword[p.v_Width+12]
        !mul dword[p.v_PixelSize+12]
        !mov ecx,eax
        !shr ecx,6
        !and eax,63
        !mov ebx,eax

        ;process a part of the data to cut the length to a multiple of 64
        !test ebx,ebx
        !je Tsi_UnFilterUp_MmxCutLengthEnd
        
        !align 4
        !Tsi_UnFilterUp_MmxCutLengthStart:
      
          !mov al,[edi]
          !add al,[esi]
          !mov [edi],al
          
          !inc esi
          !inc edi
      
        !dec ebx
        !jne Tsi_UnFilterUp_MmxCutLengthStart

        !align 4
        !Tsi_UnFilterUp_MmxCutLengthEnd:

        ;process the rest of the data
        !test ecx,ecx
        !je Tsi_UnFilterUp_MmxLoopEnd

        !align 4
        !Tsi_UnFilterUp_MmxLoopStart:

          !movq mm0,[esi]
          !movq mm1,[esi+8]
          !movq mm2,[esi+16]
          !movq mm3,[esi+24]
          !movq mm4,[esi+32]
          !movq mm5,[esi+40]
          !movq mm6,[esi+48]
          !movq mm7,[esi+56]

          !paddb mm0,[edi]
          !paddb mm1,[edi+8]
          !paddb mm2,[edi+16]
          !paddb mm3,[edi+24]
          !paddb mm4,[edi+32]
          !paddb mm5,[edi+40]
          !paddb mm6,[edi+48]
          !paddb mm7,[edi+56]

          !movq [edi],mm0
          !movq [edi+8],mm1
          !movq [edi+16],mm2
          !movq [edi+24],mm3
          !movq [edi+32],mm4
          !movq [edi+40],mm5
          !movq [edi+48],mm6
          !movq [edi+56],mm7
          
          !add esi,64
          !add edi,64

        !dec ecx
        !jne Tsi_UnFilterUp_MmxLoopStart

        !align 4
        !Tsi_UnFilterUp_MmxLoopEnd:

        ;restore the registers
        !pop ebx
        !pop edi
        !pop esi

        ;end MMX state
        !emms
      
      Else
      
        !push esi
        !push edi
  
        !mov eax,[p.v_Height+8]
        !dec eax
        !mul dword[p.v_Width+8]
        !mul dword[p.v_PixelSize+8]
        !mov ecx,eax
        
        !mov edi,[p.p_ImageData+8]
        !mov esi,edi
        
        !mov eax,[p.v_Width+8]
        !mul dword[p.v_PixelSize+8]
        !mov edx,eax
        !add edi,edx
        
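        ;nothing to do if the image has only one row (a zero counter would wrap around)
        !test ecx,ecx
        !je Tsi_UnFilterUp_LoopEnd

        !align 4
        !Tsi_UnFilterUp_LoopStart:
        
          !mov al,[edi]
          !add al,[esi]
          !mov [edi],al
          
          !inc esi
          !inc edi
          
        !dec ecx
        !jne Tsi_UnFilterUp_LoopStart

        !Tsi_UnFilterUp_LoopEnd: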
      
        !pop edi
        !pop esi
      
      EndIf
    
    CompilerCase #PB_Processor_x64

      If Tsi_Sse2Supported = #True

        ;save registers
        !push rsi
        !push rdi

        ;calculate the pointers
        !mov rdi,[p.p_ImageData+16]
        !mov rsi,rdi
        !mov rax,[p.v_Width+16]
        !mul qword[p.v_PixelSize+16]
        !mov rdx,rax
        !add rdi,rdx

        ;calculate the counters
        !mov rax,[p.v_Height+16]
        !dec rax
        !mul qword[p.v_Width+16]
        !mul qword[p.v_PixelSize+16]
        !mov rcx,rax
        !shr rcx,7
        !and rax,127
        !mov r10,rax

        ;process a part of the data to cut the length to a multiple of 128
        !test r10,r10
        !je Tsi_UnFilterUp_Sse2CutLengthEnd
        
        !align 8
        !Tsi_UnFilterUp_Sse2CutLengthStart:
      
          !mov al,[rdi]
          !add al,[rsi]
          !mov [rdi],al
          
          !inc rsi
          !inc rdi
      
        !dec r10
        !jne Tsi_UnFilterUp_Sse2CutLengthStart

        !align 8
        !Tsi_UnFilterUp_Sse2CutLengthEnd:
        
        ;process the rest of the data
        !test rcx,rcx
        !je Tsi_UnFilterUp_Sse2LoopEnd
        
        !align 8
        !Tsi_UnFilterUp_Sse2LoopStart:

          !movdqu xmm0,[rsi]
          !movdqu xmm1,[rsi+16]
          !movdqu xmm2,[rsi+32]
          !movdqu xmm3,[rsi+48]
          !movdqu xmm4,[rsi+64]
          !movdqu xmm5,[rsi+80]
          !movdqu xmm6,[rsi+96]
          !movdqu xmm7,[rsi+112]
          
          !paddb xmm0,[rdi]
          !paddb xmm1,[rdi+16]
          !paddb xmm2,[rdi+32]
          !paddb xmm3,[rdi+48]
          !paddb xmm4,[rdi+64]
          !paddb xmm5,[rdi+80]
          !paddb xmm6,[rdi+96]
          !paddb xmm7,[rdi+112]

          !movdqu [rdi],xmm0
          !movdqu [rdi+16],xmm1
          !movdqu [rdi+32],xmm2
          !movdqu [rdi+48],xmm3
          !movdqu [rdi+64],xmm4
          !movdqu [rdi+80],xmm5
          !movdqu [rdi+96],xmm6
          !movdqu [rdi+112],xmm7

          !add rsi,128
          !add rdi,128
        
        !dec rcx
        !jne Tsi_UnFilterUp_Sse2LoopStart
        
        !align 8
        !Tsi_UnFilterUp_Sse2LoopEnd:

        ;restore the registers
        !pop rdi
        !pop rsi

        ;emms clears the MMX state; the SSE2 path does not use it, so this is harmless but optional
        !emms

      ElseIf Tsi_MmxSupported = #True

        ;save registers
        !push rsi
        !push rdi

        ;calculate the pointers
        !mov rdi,[p.p_ImageData+16]
        !mov rsi,rdi
        !mov rax,[p.v_Width+16]
        !mul qword[p.v_PixelSize+16]
        !mov rdx,rax
        !add rdi,rdx

        ;calculate the counters
        !mov rax,[p.v_Height+16]
        !dec rax
        !mul qword[p.v_Width+16]
        !mul qword[p.v_PixelSize+16]
        !mov rcx,rax
        !shr rcx,6
        !and rax,63
        !mov r10,rax

        ;process a part of the data to cut the length to a multiple of 64
        !test r10,r10
        !je Tsi_UnFilterUp_MmxCutLengthEnd
        
        !align 8
        !Tsi_UnFilterUp_MmxCutLengthStart:
      
          !mov al,[rdi]
          !add al,[rsi]
          !mov [rdi],al
          
          !inc rsi
          !inc rdi
      
        !dec r10
        !jne Tsi_UnFilterUp_MmxCutLengthStart

        !align 8
        !Tsi_UnFilterUp_MmxCutLengthEnd:

        ;process the rest of the data
        !test rcx,rcx
        !je Tsi_UnFilterUp_MmxLoopEnd

        !align 8
        !Tsi_UnFilterUp_MmxLoopStart:

          !movq mm0,[rsi]
          !movq mm1,[rsi+8]
          !movq mm2,[rsi+16]
          !movq mm3,[rsi+24]
          !movq mm4,[rsi+32]
          !movq mm5,[rsi+40]
          !movq mm6,[rsi+48]
          !movq mm7,[rsi+56]

          !paddb mm0,[rdi]
          !paddb mm1,[rdi+8]
          !paddb mm2,[rdi+16]
          !paddb mm3,[rdi+24]
          !paddb mm4,[rdi+32]
          !paddb mm5,[rdi+40]
          !paddb mm6,[rdi+48]
          !paddb mm7,[rdi+56]

          !movq [rdi],mm0
          !movq [rdi+8],mm1
          !movq [rdi+16],mm2
          !movq [rdi+24],mm3
          !movq [rdi+32],mm4
          !movq [rdi+40],mm5
          !movq [rdi+48],mm6
          !movq [rdi+56],mm7
          
          !add rsi,64
          !add rdi,64

        !dec rcx
        !jne Tsi_UnFilterUp_MmxLoopStart

        !align 8
        !Tsi_UnFilterUp_MmxLoopEnd:

        ;restore the registers
        !pop rdi
        !pop rsi

        ;end MMX state
        !emms

      Else
      
        !push rsi
        !push rdi

        !mov rax,[p.v_Height+16]
        !dec rax
        !mul qword[p.v_Width+16]
        !mul qword[p.v_PixelSize+16]        
        !mov rcx,rax
        
        !mov rdi,[p.p_ImageData+16]
        !mov rsi,rdi
        !mov rax,[p.v_Width+16]
        !mul qword[p.v_PixelSize+16]
        !mov rdx,rax        
        !add rdi,rdx
        
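        ;nothing to do if the image has only one row (a zero counter would wrap around)
        !test rcx,rcx
        !je Tsi_UnFilterUp_LoopEnd

        !align 8
        !Tsi_UnFilterUp_LoopStart:
        
          !mov al,[rdi]
          !add al,[rsi]
          !mov [rdi],al
          
          !inc rsi
          !inc rdi
          
        !dec rcx
        !jne Tsi_UnFilterUp_LoopStart

        !Tsi_UnFilterUp_LoopEnd: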
      
        !pop rdi
        !pop rsi
      
      EndIf

    CompilerDefault
    
      Protected.i X, ByteSize
      Protected *ActualChannel.Tsi_Pixel_Channel
      Protected *PriorChannel.Tsi_Pixel_Channel
      
      *PriorChannel  = *ImageData
      *ActualChannel = *ImageData + Width * PixelSize
      
      Height - 1
      ByteSize = Width * Height * PixelSize
      
      For X = 1 To ByteSize
        
        *ActualChannel\Channel = *ActualChannel\Channel + *PriorChannel\Channel
        *ActualChannel + 1
        *PriorChannel + 1
    
      Next

  CompilerEndSelect

EndProcedure
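
For readers who want to see what the procedure computes without reading the asm, here is a minimal plain-PB sketch of the same "up" unfilter, in the spirit of the CompilerDefault branch. UnFilterUpSketch() and the 2x2 test buffer are made up for illustration. Every byte from the second row onwards gets the byte directly above it added, with 8-bit wraparound.

```purebasic
; Hypothetical stand-alone version of the "up" unfilter, plain PB only.
Procedure UnFilterUpSketch(*ImageData, Width.i, Height.i, PixelSize.i)
  Protected *Prior.Ascii  = *ImageData                      ; byte in the row above
  Protected *Actual.Ascii = *ImageData + Width * PixelSize  ; byte being restored
  Protected ByteSize.i    = Width * (Height - 1) * PixelSize
  Protected i.i

  For i = 1 To ByteSize                 ; does not run at all for a single row
    *Actual\a = (*Actual\a + *Prior\a) & $FF
    *Actual + 1
    *Prior + 1
  Next
EndProcedure

; 2x2 image, 1 byte per pixel: row 0 = 10, 250 / row 1 = 5, 10
Define *Buf = AllocateMemory(4)
PokeA(*Buf, 10) : PokeA(*Buf + 1, 250) : PokeA(*Buf + 2, 5) : PokeA(*Buf + 3, 10)

UnFilterUpSketch(*Buf, 2, 2, 1)

Debug PeekA(*Buf + 2)   ; 15  (5 + 10)
Debug PeekA(*Buf + 3)   ; 4   (10 + 250 = 260, wraps to 4)
FreeMemory(*Buf)
```

Because the rows are processed top to bottom, each row is already restored by the time the row below it reads it.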
oldefoxx
Enthusiast
Posts: 532
Joined: Fri Jul 25, 2003 11:24 pm

Re: R Hyde's Art of Assembly - best version?

Post by oldefoxx »

Looking at your code, it doesn't reveal much. Seems simple enough, but can't figure out what doing a push of
eax, then a popfd, then a push of fd, and a pop of eax really accomplished. Or what having what was in eax
does by moving it to edx. What I can make out of it is nowhere near as convoluted as I would have expected.
But then I'm fairly ignorant of what you can do with any of the advanced features in the X86 architecture.

Oh, wait! I just realized that the page is scrollable. I realized that I must be missing something when I did not
see an EndProcedure. I will have to study the whole a bit to see if I can make out what is going on. Still don't
understand the above though.

What would be a big help would be some literature on the advanced features of the X86, something that even
goes so far as to include some working examples.

It's not just extensions, though. You have to deal with available Assembler mnemonics, and which Assembler
to choose can have unexpected consequences. I'm going to show you a screen from one of the little utilities I
wrote a few weeks back, and let's see what you think of it:

Code:

      "Enter 80x86 Mnemonics and observe the Hex opcodes that result"

 Enter an 80x86 instruction mnemonic to convert: add eax,1
ml.exe says "add eax,1" translates to " 83 C0 01 "
nasm.exe says "add eax,1" translates to " 66 05 01 00 00 00 "
fasm.exe says "add eax,1" translates to " 66 83 C0 01 "
.
 Enter an 80x86 instruction mnemonic to convert: add ax,1
ml.exe says "add ax,1" translates to " 66 83 C0 01 "
nasm.exe says "add ax,1" translates to " 05 01 00 "
fasm.exe says "add ax,1" translates to " 83 C0 01 "

 Enter an 80x86 instruction mnemonic to convert:

ml.exe is the assembler for Masm32. Nasm.exe and Fasm.exe are two other assemblers. As you can see, for a
number of instructions, they don't agree on what opcodes to use, which are the hex opcodes between the double
quotes after "translates to". So who is right? They can't all be right, because in the two examples above, the
ml.exe and fasm.exe give exactly opposite results between the two, and the difference is whether we are using
eax or ax as the destination.
Last edited by oldefoxx on Sun Sep 02, 2012 7:18 am, edited 1 time in total.
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: R Hyde's Art of Assembly - best version?

Post by wilbert »

Thorium wrote: It has implementations for 80386 asm, MMX, SSE2 and PB (for non-x86 CPUs), all of them implemented in 32-bit and 64-bit.
Only downside: it's a lot of code for all the implementations of just one simple, small procedure.
That's a lot of implementations :)
On OS X it's a little less complicated. Apple switched to Intel using Core Solo.
Of course it's good to test things but it's also nice to know every Intel based Mac has SSE/SSE2/SSE3 support.
oldefoxx wrote:They can't all be right, because in the two examples above, the
ml.exe and fasm.exe give exactly opposite results between the two
Why can't they both be right?
There can be multiple opcodes that have the same result.
They also depend on the context: does the value in eax represent a signed or unsigned value, was the source assembled for x86 or x64, etc.
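
One context that would produce exactly this pattern is the code size the assembler targets. The sketch below assumes (it is not confirmed in the thread) that ml.exe assembled 32-bit code while nasm.exe and fasm.exe were left in a 16-bit default mode. The $66 operand-size prefix is then needed whenever the operand size differs from the default of the current mode. EncodeAddAcc() is a hypothetical helper that only models the 83 /0 ib form of ADD with eax or ax, for 16-bit and 32-bit modes.

```purebasic
; Hypothetical helper: hex bytes for "add eax,imm8" / "add ax,imm8"
; OperandBits = 16 (ax) or 32 (eax), ModeBits = 16 or 32 (assembler code size)
Procedure.s EncodeAddAcc(OperandBits.i, ModeBits.i, Imm.i)
  Protected Result.s = ""

  If OperandBits <> ModeBits        ; operand size differs from the mode default
    Result = "66 "                  ; operand-size override prefix
  EndIf

  Result = Result + "83 C0 " + RSet(Hex(Imm & $FF), 2, "0")
  ProcedureReturn Result
EndProcedure

Debug EncodeAddAcc(32, 32, 1)   ; 83 C0 01     add eax,1 in 32-bit code
Debug EncodeAddAcc(16, 32, 1)   ; 66 83 C0 01  add ax,1  in 32-bit code
Debug EncodeAddAcc(32, 16, 1)   ; 66 83 C0 01  add eax,1 in 16-bit code
Debug EncodeAddAcc(16, 16, 1)   ; 83 C0 01     add ax,1  in 16-bit code
```

The first two lines match what ml.exe printed and the last two match fasm.exe. The nasm.exe output uses the other valid encoding, opcode 05 with a full-size immediate, under the same prefix rule.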
Lord
Addict
Posts: 849
Joined: Tue May 26, 2009 2:11 pm

Re: R Hyde's Art of Assembly - best version?

Post by Lord »

Hi Thorium!

I get an Assembler error if I run your code on PB4.61 x64,
on PB4.61 x86 it works fine.
I just call:

Code:

Debug Tsi_IsCpuidSupported()
Debug Tsi_IsMmxSupported()
Debug Tsi_IsSse2Supported()

result (x86):
1
1
1
Result (x64):
---------------------------
PureBasic - Assembler error
---------------------------
PureBasic.asm [679]:

MP0

PureBasic.asm [96] MP0 [9]:

pushfd

error: illegal instruction.


---------------------------
OK
---------------------------
What happens here?
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: R Hyde's Art of Assembly - best version?

Post by Thorium »

oldefoxx wrote:Looking at your code, it doesn't reveal much. Seems simple enough, but can't figure out what doing a push of
eax, then a popfd, then a push of fd, and a pop of eax really accomplished. Or what having what was in eax
does by moving it to edx. What I can make out of it is nowhere near as convoluted as I would have expected.
But then I'm fairly ignorant of what you can do with any of the advanced features in the X86 architecture.
That sequence is not CPUID itself; it checks whether the CPUID instruction is supported. It tries to flip the ID flag (bit 21, $00200000) in EFLAGS and reads the flags back; if the bit really changed, CPUID is available. This is the official way to check it, as described in the Intel CPUID manual.
With the CPUID instruction you can get a lot of information about the CPU your code runs on, so much that Intel wrote a whole manual covering only the CPUID instruction.

For the manuals: http://www.intel.com/content/www/us/en/ ... uals.html/
I think they are really good. And they actually contain example code.
Lord wrote: What happens here?
The IsCpuidSupported() procedure is 32-bit-only code and does not run or compile for 64 bit.
That's because on 64 bit you don't need to check whether CPUID is supported. All 64-bit CPUs support it.
Helle
Enthusiast
Enthusiast
Posts: 178
Joined: Wed Apr 12, 2006 7:59 pm
Location: Germany

Re: R Hyde's Art of Assembly - best version?

Post by Helle »

@oldefoxx:
FASM 1.70.03, 32/64Bit --> ADD EAX,1 --> 83 C0 01 and ADD AX,1 --> 66 83 C0 01
NASM 2.10.01, 32/64Bit --> ADD EAX,1 --> 83 C0 01 and ADD AX,1 --> 66 83 C0 01
:?: