R Hyde's Art of Assembly - best version?

oldefoxx · Post by **oldefoxx** » Mon Sep 03, 2012 3:26 am

My results, as extracted from actual assembled files using each assembler separately:
==============================================================================
Enter an 80x86 instruction mnemonic to convert: add eax,1
ml.exe says "add eax,1" translates to " 83 C0 01 "
nasm.exe says "add eax,1" translates to " 66 05 01 00 00 00 "
fasm.exe says "add eax,1" translates to " 66 83 C0 01 "
.
Enter an 80x86 instruction mnemonic to convert: add ax,1
ml.exe says "add ax,1" translates to " 66 83 C0 01 "
nasm.exe says "add ax,1" translates to " 05 01 00 "
fasm.exe says "add ax,1" translates to " 83 C0 01 ")
==============================================================================
Your results, by whatever your technique is:
==============================================================================
FASM 1.70.03, 32/64Bit --> ADD EAX,1 --> 83 C0 01 and ADD AX,1 --> 66 83 C0 01
NASM 2.10.01, 32/64Bit --> ADD EAX,1 --> 83 C0 01 and ADD AX,1 --> 66 83 C0 01
==============================================================================
My technique is to shorten the assembly process with a switch, so that I
do not end up with a true .OBJ file, just a binary file. I repeat the test 8 times,
and between each add eax,1 or each add ax,1 instruction I insert what I consider a
pretty viable pair of NOPs of my own choosing, so as to eliminate likely mis-brackets
of the assembler results. I then ensure that I get 8 recoveries of the same results.
The NOPs I chose, to avoid possible matches in the surrounding code,
are a mov cl,cl and mov ch,ch instruction pair. So the code I send to the assembler
looks like:

mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch
add eax,1
mov cl,cl
mov ch,ch

or, if I am doing the add ax,1, I send it this instead:

mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch
add ax,1
mov cl,cl
mov ch,ch

With ml.exe, I have to provide this as well:
.486
.model flat, stdcall
option casemap :none
.code
start:

then at the end I have to provide
end start

I call ml.exe with a /c
I call nasm.exe with a -a
nothing fancy required for fasm.exe

Then I process the created file, break it down by the translation for
mov cl,cl and mov ch,ch, which is not even the same each time,
sometimes coming out with 8A and sometimes with 88 as the lead
hex, and then try to make sense of what I get. No, I can't explain
the differences between the assembler results and what you get.
I can't even explain the differences in what I get.
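
For anyone who wants to reproduce the extraction step, here is a minimal Python sketch of the idea described above, not the original HotBasic program. The names `MARKER`, `extract_between_markers` and `to_hex` are made up for illustration. It assumes the marker pair assembles to `88 C9 88 ED` or `8A C9 8A ED`: both `88 /r` and `8A /r` are valid encodings of a register-to-register byte mov, which is why the lead byte can differ between assemblers.

```python
import re

# mov cl,cl followed by mov ch,ch. Each mov may use opcode 88 or 8A;
# the ModRM byte is C9 for cl,cl and ED for ch,ch either way.
MARKER = re.compile(rb"[\x88\x8A]\xC9[\x88\x8A]\xED")

def extract_between_markers(blob):
    """Return the byte runs that sit between consecutive marker pairs."""
    pieces = MARKER.split(blob)
    # The first and last pieces lie outside the markers (file headers,
    # trailing data), so only the pieces in between are kept.
    return pieces[1:-1]

def to_hex(chunk):
    """Format bytes the way the results in this thread are shown."""
    return " ".join(f"{b:02X}" for b in chunk)

if __name__ == "__main__":
    marker_88 = bytes.fromhex("88C988ED")
    marker_8a = bytes.fromhex("8AC98AED")
    test_instr = bytes.fromhex("83C001")
    blob = marker_88 + test_instr + marker_8a + test_instr + marker_88
    for chunk in extract_between_markers(blob):
        print(to_hex(chunk))   # prints "83 C0 01" twice
```

One caveat with this approach: if the marker byte pattern happens to occur inside an immediate value or in file header data, the split will land in the wrong place, so repeating the test several times and comparing the recovered chunks is still a sensible check.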

wilbert · Post by **wilbert** » Mon Sep 03, 2012 6:33 am

Does this help ?
http://siyobik.info.gf/main/reference/instruction/ADD

oldefoxx · Post by **oldefoxx** » Mon Sep 03, 2012 7:06 am

Are you kidding? Did you even look at it up close?

ADD

Add
Opcodes
Hex Mnemonic Encoding Long Mode Legacy Mode Description
04 ib ADD AL, imm8 C Valid Valid Add imm8 to AL.
05 iw ADD AX, imm16 C Valid Valid Add imm16 to AX.
05 id ADD EAX, imm32 C Valid Valid Add imm32 to EAX.

The opcodes shown for ADD AX and ADD EAX are exactly the same. The trailing iw and id don't represent opcodes, just show that by
some mystical force one 05 represents a word (16 bits) and the other 05 represents a dword (32-bits). The ib, iw, and id are not
representative of any specific opcode, and can't be taken as such. All programming manuals are like this, because they all draw upon
the original documentation or are copied from each other. The correct answer is that one should be 66 05 and the other 05, but there
is nothing to tell you specifically which is what, and we have three assemblers that don't agree on the matter.

wilbert · Post by **wilbert** » Mon Sep 03, 2012 8:07 am

66 is an operand-size prefix. It's not specific to the ADD instruction

When the operand-size prefix is used, the non-default operand size is used. For example, if the instruction is executed in a 16-bit segment but carries the operand-size prefix, then the 32-bit registers are used. Conversely, if the instruction is executed in a 32-bit segment but carries the operand-size prefix, then the 16-bit registers are used.
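
To make that rule concrete, here is a minimal Python sketch (not taken from any assembler's source) that encodes ADD AX/EAX with an immediate for a 16-bit or 32-bit code segment. The function name `encode_add_accumulator` and its parameters are made up for illustration. `mode_bits` plays the role of the segment's default operand size, which is what NASM's `BITS 16`/`BITS 32` and FASM's `use16`/`use32` directives select.

```python
def encode_add_accumulator(mode_bits, operand_bits, imm, short_form=True):
    """Encode ADD AX,imm (operand_bits=16) or ADD EAX,imm (operand_bits=32).

    mode_bits is the default operand size of the code segment (16 or 32).
    """
    if mode_bits not in (16, 32) or operand_bits not in (16, 32):
        raise ValueError("this sketch only handles 16- and 32-bit sizes")
    out = bytearray()
    # The 66 prefix is needed only when the operand size differs
    # from the segment's default.
    if operand_bits != mode_bits:
        out.append(0x66)
    if short_form and -128 <= imm <= 127:
        # 83 /0 ib: ADD r/m, sign-extended imm8 (ModRM C0 selects AX/EAX).
        out += bytes([0x83, 0xC0, imm & 0xFF])
    else:
        # 05 iw / 05 id: the immediate is as wide as the operand.
        mask = (1 << operand_bits) - 1
        out.append(0x05)
        out += (imm & mask).to_bytes(operand_bits // 8, "little")
    return bytes(out)

def to_hex(code):
    return " ".join(f"{b:02X}" for b in code)

if __name__ == "__main__":
    print(to_hex(encode_add_accumulator(32, 32, 1)))         # 83 C0 01
    print(to_hex(encode_add_accumulator(32, 16, 1)))         # 66 83 C0 01
    print(to_hex(encode_add_accumulator(16, 32, 1)))         # 66 83 C0 01
    print(to_hex(encode_add_accumulator(16, 16, 1)))         # 83 C0 01
    print(to_hex(encode_add_accumulator(16, 32, 1, False)))  # 66 05 01 00 00 00
    print(to_hex(encode_add_accumulator(16, 16, 1, False)))  # 05 01 00
```

Under these assumptions, the 32-bit results match the ml.exe bytes quoted earlier in the thread, and the 16-bit results match the fasm.exe bytes (short form) and the nasm.exe bytes (long form).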

jack · Post by **jack** » Mon Sep 03, 2012 2:24 pm

oldefoxx wrote: My results, as extracted from actual assembled files using each assembler separately:

Enter an 80x86 instruction mnemonic to convert: add eax,1
ml.exe says "add eax,1" translates to " 83 C0 01 "
nasm.exe says "add eax,1" translates to " 66 05 01 00 00 00 "
fasm.exe says "add eax,1" translates to " 66 83 C0 01 "

I tried the opcodes above, and the first and last sets work, but 66 05 01 00 00 00 causes a crash. Are you sure about that one?

Thorium · Post by **Thorium** » Mon Sep 03, 2012 3:43 pm

Dude, just look up the Intel manuals. Instruction encoding is well documented there. It isn't just opcodes; you also have prefixes and stuff. It's actually quite complicated.

Different assemblers may generate different machine code because more than one opcode can be valid for an operation. This is because there are opcodes that do the same thing but result in a shorter instruction. Such opcodes were made for operations that are often used with the same parameters; for example, an instruction with the eax register as a parameter may have its own opcode. But there can still be a general opcode that accepts eax as a parameter of the instruction.

As far as I know, FASM always tries to generate the smallest possible instruction.
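
To see that two different byte sequences can mean the same instruction, here is a small Python sketch of a decoder that understands only the two ADD AX/EAX forms seen in this thread (`83 C0 ib` and `05 iw/id`). The function name `decode_add_accumulator` is hypothetical, and this is nowhere near a real disassembler.

```python
def decode_add_accumulator(code, mode_bits):
    """Decode one ADD AX/EAX,imm instruction.

    Returns (text, bytes_consumed). mode_bits is 16 or 32, the default
    operand size of the code segment the bytes would run in.
    """
    pos = 0
    operand_bits = mode_bits
    if code[pos] == 0x66:
        # Operand-size prefix: switch to the non-default size.
        operand_bits = 16 if mode_bits == 32 else 32
        pos += 1
    reg = "eax" if operand_bits == 32 else "ax"
    if code[pos] == 0x83 and code[pos + 1] == 0xC0:
        # Short form: one sign-extended immediate byte.
        imm = int.from_bytes(code[pos + 2:pos + 3], "little", signed=True)
        pos += 3
    elif code[pos] == 0x05:
        # Long form: immediate as wide as the operand (shown unsigned).
        size = operand_bits // 8
        imm = int.from_bytes(code[pos + 1:pos + 1 + size], "little")
        pos += 1 + size
    else:
        raise ValueError("not one of the two ADD forms this sketch knows")
    return f"add {reg},{imm}", pos

if __name__ == "__main__":
    # Two encodings, one instruction (32-bit segment).
    print(decode_add_accumulator(bytes.fromhex("83C001"), 32))      # ('add eax,1', 3)
    print(decode_add_accumulator(bytes.fromhex("0501000000"), 32))  # ('add eax,1', 5)
    # The six-byte sequence from the thread, read in each mode.
    print(decode_add_accumulator(bytes.fromhex("660501000000"), 16))  # ('add eax,1', 6)
    print(decode_add_accumulator(bytes.fromhex("660501000000"), 32))  # ('add ax,1', 4)
```

The last line is the interesting one: read in a 32-bit segment, `66 05 01 00 00 00` is only a four-byte `add ax,1`, and the two leftover `00 00` bytes would be executed as a separate instruction (`add [eax],al`). That would plausibly explain the crash jack saw, assuming he ran the bytes in 32-bit code.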

wilbert · Post by **wilbert** » Mon Sep 03, 2012 3:55 pm

Thorium is right; multiple opcodes can be valid.

Besides that, I don't understand why you would want to do it this way.
Assemblers like FASM or NASM are used by so many users. You can rely on a simple instruction like ADD eax, 1 being output correctly, even if the bytes used differ slightly between assemblers.
I only used the byte values directly once because I needed the opcode PSHUFB and the version of the assembler PureBasic uses on OS X didn't support it.
In general it's much better to use assembler instructions compared to entering byte values.

jack · Post by **jack** » Mon Sep 03, 2012 3:56 pm

Thorium wrote: Dude, just look up the Intel manuals. Instruction encoding is well documented there. It isn't just opcodes, you also have prefixes and stuff. It's actually quite complicated.

you talking to me or oldefoxx?
I simply made the observation that one of the opcodes that oldefoxx posted seems invalid.

Thorium · Post by **Thorium** » Mon Sep 03, 2012 5:28 pm

jack wrote:
Thorium wrote: Dude, just look up the Intel manuals. Instruction encoding is well documented there. It isn't just opcodes, you also have prefixes and stuff. It's actually quite complicated.
you talking to me or oldefoxx?
I simply made the observation that one of the opcodes that oldefoxx posted seems invalid.

I mean oldefoxx.

oldefoxx · Post by **oldefoxx** » Mon Sep 03, 2012 7:04 pm

The hex opcodes you see between the separator lines are what was actually returned by the assembler named on each line.
==============================================================================
Enter an 80x86 instruction mnemonic to convert: add eax,1
ml.exe says "add eax,1" translates to " 83 C0 01 "
nasm.exe says "add eax,1" translates to " 66 05 01 00 00 00 "
fasm.exe says "add eax,1" translates to " 66 83 C0 01 "
.
Enter an 80x86 instruction mnemonic to convert: add ax,1
ml.exe says "add ax,1" translates to " 66 83 C0 01 "
nasm.exe says "add ax,1" translates to " 05 01 00 "
fasm.exe says "add ax,1" translates to " 83 C0 01 ")
==============================================================================
What you see is a copy-and-paste from my running program. So that you can validate this for yourselves, I am going
to include the program so that you can try it out. Note that to see what each assembler does, you have to download
and install each assembler, with MASM32 going into a folder named \MASM32, NASM going into a folder named \NASM,
and FASM going into a folder named \FASM. The program does not recognize other Assemblers because I did not take it
that far. I suppose TASM and GoASM might be valid additions. All the program does is collect what is between the
mov cl,cl and mov ch,ch instructions and display the bytes back in hex. If "66 05 01 00 00 00" is invalid, maybe the folks
with NASM might want to know.

MASM32 supports Assembler directives, and the .486 ensures that you are talking 32-bit segments, I suppose. It looks
like NASM and FASM are going with 16-bit segments, if I have this down right from what has been said. So how do you
get NASM and FASM to go with 32-bit segments instead? Apparently neither one supports Assembler directives.

Oh, as to why this is important to me: I wrote this little utility in HotBasic, which has a number of little hidden flaws,
including the same difficulty in distinguishing when and where to use the prefix 66. It also does not support any of the
rep instructions, meaning no rep, repe, repne, repz, and repnz. But it does support adding numeric bytes into an
instruction stream with asm db. The idea is to find out what the hex opcodes are that could be inserted to make it do
more than it is designed to do. Now you are making it all seem to depend on segment identification, and yet at the
same time you are not suggesting any way to control the type of segment being used, or identify the one that you have.

Durn! I just realized that this is one of those forums that does not allow you to attach a file. Guess that means I cannot
include a copy of my little program. I could do the source code, but it is in HotBasic syntax, and I don't think many of you
would pay $85 or whatever it costs now to get HotBasic and learn enough of it to compile, maybe alter my program as a
proof of concept. Still, I told you enough about how I was approaching this that you could write your own if you wanted
to. I leave that matter to you for you to decide.

Why am I on a PureBasic forum when I am talking about HotBasic and Assembler? Because I am moving off of HotBasic and
onto PureBasic at this point, and I know better than to do pure Assembler. I mean what I want could be done that way,
but there is a lot more time invested in learning and development, and I like the ability to go high level coding where high
level works well enough, and low level assembler where it seems best suited. Besides, at 71, how many years have I got
left to be coding at all? Why waste what time I have left when I can evade some of the slow areas and go right on?

wilbert · Post by **wilbert** » Mon Sep 03, 2012 7:31 pm

oldefoxx wrote:The idea is to find out what the hex opcodes are that could be inserted to make it do
more than it is designed to do.

Do you mean you want to use undocumented CPU features?

If not, what's the advantage of using hex codes instead of something like

Code:

Procedure Increase(n.l)
  !mov eax, dword [p.v_n]
  !inc eax
  ProcedureReturn
EndProcedure

Debug Increase(5)

oldefoxx · Post by **oldefoxx** » Mon Sep 03, 2012 7:58 pm

Why would I want to do what you are doing? It's more code to write, and to remember so that
I can use it later.

I'm trying to make a parting shot at the HotBasic world, to help others that will run up against
the same limitations that I have found. In PowerBasic, they have a MID$() statement as well as
a MID$() function. That means you can write into the string at any given point. Don't have that
in HotBasic. Instead you have ReplaceSubStr$(), Replace$(), Delete$(), and Insert$(), and the
developer recommends using Delete$() and Insert$() instead of Replace$(), so it is really not as
efficient as just having a MID$() statement. So say you want to write an assembler routine to do
it for you, because HotBasic also does not have the equivalent of PEEK(), PEEK$(), POKE() or
POKE$() to work with. You do have @ or VARPTR() or a property of Pointer to work with for
strings, and with HotBasic's ASM you also have a property of loc which is the same as Pointer, but
if you are going to that extent, why not just do it in asm and be done with it? Except that the
HotBasic inline assembler capability is a bit limited, so you have to reinforce it with hex code.

The question then is, where do you find the necessary hex code? Apparently, the instruction guides
leave something out, which is what I am looking at now.

oldefoxx · Post by **oldefoxx** » Mon Sep 03, 2012 8:03 pm

Oh, another limitation of HotBasic is that for ReplaceSubStr$(), Replace$(), Delete$(), Insert$(),
each is a function, and the primary string inside the function is not impacted. The string that
is modified is sent to another string (or back to itself). That's a lot of extra copying going on
that is often unnecessary. I prefer to make my own copies of strings when I see fit, and for
me, these all should be subs, not functions. You lose precious time and eat up more memory
having functions that behave like this when subs could do the work in smaller space and faster.

Hmmm. I suppose that I should add that HotBasic's built-in Assembler capability does not use
the 66 hex prefix at all, meaning you either have all word instructions or all dword instructions.
It thinks it has both, because the developer thinks he included both. But after years of trying
to get the developer to fix some of the things that are wrong, and getting thrown out of both
the HotBasic groups at yahoo.com for my unrelenting efforts, I'm closing the chapter on HotBasic.

There is a lot to like about HotBasic, and I won't say there isn't. But it is certainly not perfect.

wilbert · Post by **wilbert** » Mon Sep 03, 2012 8:28 pm

oldefoxx wrote:Why would I want to do what you are doing? It's more code to write, and to remember so that
I can use it later.

I didn't understand correctly.
I assumed you were looking to combine asm with PureBasic and using inline asm is the easiest way.

Now I see your question doesn't have anything to do with PureBasic. You just want HotBasic to run asm code and are only looking for the hex codes you need.
Well, the instruction manuals and asm sites like http://ref.x86asm.net cover those values. But since the hex codes you need are combinations of prefix, opcode, registers and immediate values, it might not all be very straightforward. That's why most of us don't use hex values directly.
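
As a rough illustration of how the register part gets folded into the hex, here is a Python sketch that packs register numbers into the ModRM byte for two register-direct instruction forms. The helper names (`modrm`, `mov_r8_r8`, `add_r32_imm8`) are made up for this example, and it assumes a 32-bit code segment so that no 66 prefix is needed.

```python
# Register numbers as used in the ModRM byte.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

def mov_r8_r8(dst, src):
    """MOV r/m8, r8 (opcode 88 /r) with a register as destination."""
    return bytes([0x88, modrm(0b11, REG8[src], REG8[dst])])

def add_r32_imm8(dst, imm):
    """ADD r/m32, imm8 (opcode 83 /0 ib) in a 32-bit code segment.

    The reg field is not a register here; it holds the /0 opcode extension.
    """
    return bytes([0x83, modrm(0b11, 0, REG32[dst]), imm & 0xFF])

if __name__ == "__main__":
    print(mov_r8_r8("cl", "cl").hex(" "))      # 88 c9
    print(mov_r8_r8("ch", "ch").hex(" "))      # 88 ed
    print(add_r32_imm8("eax", 1).hex(" "))     # 83 c0 01
    print(add_r32_imm8("ecx", 1).hex(" "))     # 83 c1 01
```

So the `C0` in `83 C0 01` is not part of the opcode proper; it is the ModRM byte that names eax, which is one reason a table of opcodes alone does not give you the full byte sequence.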

oldefoxx · Post by **oldefoxx** » Mon Sep 03, 2012 8:51 pm

Not quite true, but a fair assumption. In working with HotBasic and its Inline Assembler, I
ran into the problem that 16-bit and 32-bit addressing came out the same everywhere, and I
needed to know how to make the distinction. When the Assemblers disagreed, that carried
it to a new plane. Now I needed to know what assumptions the Assemblers were working
under that made them disagree, and since they disagreed, which was right?

It can help others understand when they may have to include or remove the 66 hex opcode
themselves, and it will help me in the future since my efforts are not limited to just bytes,
but to words or dwords as the case may be.

I did finally locate a page that seems to get into it to the point of stating that you either have
8 bits or 32 bits, unless you use a prefix, in which case the 32-bits becomes 16 bit. It does not
state that the prefix is always 66 hex, and it does not indicate that this rule varies by whether
you are using 16-bit segment addressing or 32-bit segment addressing. I can't force either
NASM or FASM to a different means of segment addressing, but I can drop the .486 directive to
MASM32 and see if that makes a difference or not. As to the page I located, here it is:

http://www.c-jump.com/CIS77/CPU/x86/lec ... code_sizes

PureBasic Forums - English
