::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.                                            Dec 98/Jan 99
:::\_____\::::::::::.                                           Issue 2
::::::::::::::::::::::.........................................................

            A S S E M B L Y   P R O G R A M M I N G   J O U R N A L
                      http://asmjournal.freeservers.com
                           asmjournal@mailcity.com

T A B L E   O F   C O N T E N T S
----------------------------------------------------------------------
Introduction...................................................mammon_

"Keygen Coding Competition".................................Ghiribizzo
"How to Use A86 for Beginners".................................Linuxjr
"Using the Gnu AS Assembler"...................................mammon_
"A Guide to NASM for TASM Coders"..................................Gij
"Tips on saving bytes in ASM programs"...................Larry Hammick

Column: Win32 Assembly Programming
    "A Simple Window".........................................Iczelion
    "Painting with Text"......................................Iczelion

Column: The C Standard Library in Assembly
    "The _Xprintf functions"....................................Xbios2

Column: The Unix World
    "X-Windows in Assembly Language: Part I"...................mammon_

Column: Assembly Language Snippets
    "IsASCII?"............................................Troy Benoist
    "ENUM, CallTable"..........................................mammon_

Column: Issue Solution
    "PE Solution"...............................................Xbios2
----------------------------------------------------------------------
      +++++++++++++++++++++++Issue Challenge++++++++++++++++++++
   Write the smallest possible PE program that outputs its command line
----------------------------------------------------------------------


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::..............................................INTRODUCTION
                                                                   by mammon_

Wow! This issue is huge.
More than twice the size of the last; maybe it is time to go monthly...

This issue has as its theme --such as it is-- the use of popular free- and
shareware assemblers. It began with my needing to write a GAS intro to
accompany my X-Windows article; shortly thereafter, Linuxjr emailed me the
benefits of his university training with his A86 tutorial (beginners: this is
for you! Linuxjr explains *everything*). I then appealed to Gij to allow me to
incorporate his Nasm 'Quick-Start' guide, which I have used often...he posted
the condition that I edit it heavily ;)

I would like to draw your attention first to our new column: Assembly Language
Snippets. Originally this was an idea which I and a few others had; however, I
never received any contributions for the 'Snippets' section. Then I received
an email from Troy with the first one... I pulled the rest out of my various
asm sources and voila, a new column was born. This is something that is fully
open to contributions; asm snippets --and we will need lots-- may be emailed
to asmjournal@mailcity.com or mammon_@hotmail.com, or they may be posted to
the Message Board at
        http://pluto.beseen.com/boardroom/q/19784/
Basic format should be:
        ;Name:           Name to title you with
        ;Routine Title:  Name to title the snippet with
        ;Summary:        One-Line Description
        ;Compatibility:  Specific Assemblers or OSes this works with
        ;Notes:          Any extra notes you have
        --Code--

I should point out here that freeservers.com is not very reliable; thus the
APJ home page is inaccessible more often than not. For this reason I have set
up a mirror on my own page, at
        http://www.eccentrica.org/Mammon/APJ/index.html

As for this issue's articles, we once again have two fine Win32 asm tutorials
by Iczelion, who maintains an excellent page at http://iczelion.cjb.net (with
a Win32 asm message board!). Ghiribizzo has supplied his fun Key Generator
Competition results (I can't say I was surprised when I saw the winner ;).
Larry Hammick --who also maintains an excellent, smoking-enabled page at
http://www3.bc.sympatico.ca/hammick/-- has contributed a fantastic piece on
asm optimization. XBios2 has this time gone above and beyond, not only with
the C Language in Assembly but with his Issue Challenge as well... asm coders
and reverse engineers alike should read this.

As for the issue challenge, XBios2 did not provide me with one for next issue,
so I used one from a text I found on the Internet somewhere... he has been
emailed the text and can try to beat it ;) Also, I am going to be setting up a
page for reader responses to the Issue Challenges -- readers can anticipate
the solutions before each issue comes out, or try and best the solution
afterwards. Submissions can be sent to the same places as the Snippets.

Author bios? I know mainstream mags do this-- if you want one, send one. I'll
tack it onto the end of the article ... anything within reason: URL, email,
hobbies, perversions, favorite drink, favorite linux distro, etc.

Next Issue: How many articles on Code Optimization can I get? That would make
a great theme (with the foundation laid this issue)--anything from code theory
to PentiumII-specific optimizations would be welcome. Prospective articles,
send to me or post on the MB...no topic is unacceptable unless you can in no
way possible relate it to assembly language.

Enjoy the ish,
_m


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                   Keygen Coding Competition
                                                   by Ghiribizzo

Introduction
------------
The competition was to write the smallest key generator for the simple serial
scheme I wrote as a trainer for newbies.
I had a few reasons for starting this competition:

 · To give the newbies a chance to participate in a competition
 · To give old hands the chance to brush up on their assembly skills
 · To promote tight assembly coding
 · To demonstrate the various different methods used to improve efficiency
   in coding

Well, I'm back from my short European jaunt and the competition is now closed.
I have greatly enjoyed the entire competition, from the coding of the crackme
and the chats with various crackers on IRC through to deciding the winner and
writing this document.

Analysis of the Serial Scheme
-----------------------------
The serial scheme was kept deliberately simple as it was written for newbies
to train with. The scheme took a name of up to 16 bytes long and required a
16 byte serial number. There was a 256 byte lookup table that was indexed
directly with the ASCII values of the name field. The name was padded to a
length of 16 (if necessary) using values hardcoded into the scheme.

The 256 byte lookup table was created using eight maximal 8 bit linear
feedback shift registers (LFSRs) in parallel i.e. producing one output byte
per 'clock'. The LFSRs were initialised to produce 'Ghir_OCU' as the first 8
output bytes. The table was precomputed and it was not expected that the
cracker recognise the nature of the lookup table - although a post I made to
the cracking forum about LFSRs might have tipped the more astute crackers!

The rules of the competition required that some standard interface text be
included which strongly urged the use of service 9 interrupt 21h - though this
would probably be used in any case - and discouraged blank screens and other
unfriendly UIs from being used to save bytes. Also, the rules specified a
range of input to be handled smaller than the possible 256 maximum. Due to the
simple nature of the serial scheme, this meant that the lookup table could
immediately be stripped down to the input range.

I envisioned that there would be 3 ‘fights’.
One to reduce the original algorithm, a second to reduce the packed table
lookup algorithm, and the last to reduce the LFSR algorithm. As it turned out,
everyone seemed to go for the packed table option.

The Entrants
------------
The following entrants have been included because they illustrate the
different ideas and methods used to reach the common goal of reducing code
size. I didn't realise that so many crackers would use the precomputed table
method. Perhaps word got out during IRC chats and everybody started using
them? In any case, this didn’t reduce the size-cutting war as precomputation
had its own routines that needed to be optimised.

Ghiribizzo Alpha (223 bytes)
----------------------------
This was not an entrant as it would hardly be fair for me to enter the
competition knowing how the lookup table was generated! This keygen was
basically converted from the crackme and improved 'on-the-fly' by generating
the lookup table in code and tidying up routines where they were obviously
inefficient. No great thought went into this and the code size was just to
give myself an idea of what crackers would be aiming for.

Aside from generating the lookup table, the only other unusual feature of this
keygen was the use of the XLAT command instead of the standard indexing used
in the crackme. I didn't stop to check whether this used less space or not,
but included it as newbies may not be familiar with the XLAT instruction. As
it happened, the XLAT instruction was used in Spyder’s keygen.

From the size I got from this keygen, I tried to guess a required key input
range to put this size between the straight table precomputation and the
packed table precomputation.

One thing to note is how I ended the program. I was quite surprised by the
fact that nobody else seemed to know that you could quit COM programs with a
ret instruction.
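
For readers who have not seen the ret trick, here is a minimal sketch of it
in TASM/MASM-style syntax. The file name, labels and message are made up for
illustration and are not taken from any of the entries. It relies on DOS
pushing a zero word on the stack before starting a COM program, and on the
PSP at CS:0000 beginning with an INT 20h instruction, so it only works while
the stack is balanced and CS still points at the PSP.

```asm
; retexit.asm - hypothetical example, not one of the competition entries
; build (assumed toolchain): tasm retexit / tlink /t retexit
        .model tiny
        .code
        org 100h
start:
        mov  ah, 9              ; DOS function 09h: print '$'-terminated string
        mov  dx, offset msg
        int  21h
        ret                     ; pops the 0000h DOS pushed at load time,
                                ; jumps to CS:0000 (PSP) which holds INT 20h
msg     db 'Bye', 0dh, 0ah, '$'
        end start
```

The ret encodes as a single byte (C3h). The usual "mov ah,4ch / int 21h" pair
takes four bytes and a direct "int 20h" takes two, which is where the saving
comes from.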
Further size savings can be made by using Bb’s trick of keeping DH and also
by tweaking the generator to fix some of the bitstreams produced to give us
the bits we need and save later processing.

Cruehead Alpha (244 bytes)
--------------------------
I got this from Cruehead on IRC when I asked to see what he had managed so
far. Although this version is unfinished it is still impressive. The keygen
relies on precomputing the whole table and reducing the keygen to a single
table lookup. The coding is very simple - it almost seems as if Cruehead was
typing the steps going through his head straight onto the keyboard (perhaps
he was?). The resulting code is consequently very easy to understand and
follow.

Bb #10 (230 bytes)
------------------
Bb has written an excellent keygen. He has put some serious hard work into
this, including taking the time to calculate the dx offsets manually instead
of just using the ‘offset’ feature that the assembler provides. It has been
fun watching Bb’s keygen progress, as the first one I received was version 5,
which was 256 bytes long. The keygen presented here is version 10. There are
other nice bits and bobs throughout this code. This makes it quite frustrating
as in various places so much space is blatantly wasted. Just take a look at
the last 6 lines of code! There shouldn’t even be 6 lines there! I’m sure Bb
will learn a lot from seeing some of the other keygens here and I’m sure he
will do very well should he enter the next competition.

Spyder (211 bytes)
------------------
Tidy, compact and elegantly coded. A little sparse in commenting (it seems
like Spyder coerced IDA to write the keygen for him ;-p). The table lookup is
an interesting piece of code.

VoidLord (247 bytes)
--------------------
Another keygen using the idea of a packed precomputed table. VoidLord’s first
keygen. Let’s hope we see more!
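
Since most entries rely on it, here is a rough, stand-alone sketch of the
packed table lookup. It is not any entrant's code: the label names and the
sample character are made up, and it is written for clarity rather than size.
The 48 table bytes are copied from Bb's NewTable further down, which stores
two 4-bit entries per byte with the even-indexed entry in the high nibble.

```asm
; packed.asm - hypothetical illustration of the packed nibble lookup
        .model tiny
        .code
        org 100h
start:
        mov  al, 'A'            ; sample name character (range 20h..7Fh)
        call lookup
        mov  dl, al
        mov  ah, 2              ; DOS function 02h: write character in DL
        int  21h
        ret                     ; COM exit via the PSP's INT 20h

; in:  AL = name character, 20h..7Fh
; out: AL = serial character, '0'..'9' or 'A'..'F'
lookup:
        sub  al, 20h            ; table starts at character 20h
        xor  bh, bh
        mov  bl, al
        shr  bl, 1              ; two entries per byte, CF = odd/even
        mov  al, [PackedTable+bx] ; MOV leaves CF untouched
        jc   low_nibble         ; odd entry sits in the low nibble
        shr  al, 1              ; even entry: bring the high nibble down
        shr  al, 1              ; (four single shifts keep this 8086 code)
        shr  al, 1
        shr  al, 1
low_nibble:
        and  al, 0fh
        add  al, '0'            ; 0..9 -> '0'..'9'
        cmp  al, '9'
        jbe  done
        add  al, 7              ; 10..15 -> 'A'..'F'
done:   ret

PackedTable db 75h, 41h, 62h, 0FDh, 0FDh, 0D1h, 55h, 85h
            db 6Fh, 2Eh, 60h, 19h, 34h, 0Fh, 1Bh, 0D0h
            db 0C6h, 0C5h, 9Eh, 93h, 6Ch, 75h, 6Dh, 0E6h
            db 2Dh, 17h, 90h, 1Bh, 0FCh, 42h, 15h, 74h
            db 0D2h, 22h, 0Ch, 40h, 0DDh, 39h, 0DCh, 86h
            db 18h, 0A7h, 4Fh, 0EAh, 6Dh, 0CAh, 0A9h, 0C7h
        end start
```

With this table, 'A' (41h) is entry 21h, the low nibble of byte 10h (0C6h),
so the sketch should print 6, which agrees with the unpacked table in
Cruehead's listing. VoidLord's table uses the opposite nibble order, so the
jc would become jnc there.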

Honourable Mentions
-------------------
Special mention given to Trykka who managed to deduce how the look-up table
was created - but never sent in an entry!

The Winner
----------
Well it looks like Spyder is the winner by quite a large margin. Incidentally,
I have just made a quick check that the keygens work. You might be able to
bump yourself up on the scale by picking holes in the other keygens :-)

Rankings
--------
__Keygen______Size________Author______
  kgen.com     211         Spyder
  kg.com       224         Ghiribizzo (alpha)
  kg10.com     230         Bb
  kg9.com      233         Bb
  kg6.com      239         Bb
  kgvoid.com   247         VoidLord
  kgcrue.com   255         Cruehead (alpha)
  kg5.com      256         Bb
  kgt.com      529         Serial Scheme

Final Words
-----------
There have been some excellent ideas in the keygens. However, none of the
keygens are as small as they could be. They all have some scope for
improvement. By combining some of the ideas given in the above keygens, we
could create a new smaller keygen. It will be interesting to see what the
smallest possible keygen would look like.

I hope that everyone who has taken part in the competition, or who has
followed it, has gained something from it. I hope that there will be more
entries for the next competition!

The Source Codes
----------------
; Ghiribizzo’s Keygen =========================================================
.model tiny
.386
.code
.startup
; The first part of the code is the table generator
; Note that we can actually do some ‘precomputing’ by
; fixing some of the bits in the generator to produce
; the bits that we need. This will save some bytes
; in the serial section. I have not bothered to do this.
        mov ax, 5547h
        mov bx, 6869h
        mov cx, 725fh
        mov dx, 4f43h
        mov di, offset PRD
        mov si, offset PRD + 0ffh
LFSR:
        stosb
;Save MSB
        mov bp,ax
        mov al,ah
        and ax,0ffh
        xchg ax,bp
;Tap
        xor ah,bl
        xor ah,ch
        xor ah,al
;Shift
        mov al,bh
        mov bh,bl
        mov bl,ch
        mov ch,cl
        mov cl,dh
        mov dh,dl
;Store MSB
        and dx,0ff00h
        or dx,bp
        cmp di,si
        jle LFSR
;-----------------------------------------------------------------
        mov ah,9
        mov dx,offset startMsg
        int 21h
        mov ah,10
        mov dx,offset NameInput
        int 21h
;-----------------------------------------------------------------
        mov si,offset NameBuffer
        mov di,offset NameHash
        mov bx,offset Table1
MakeSerial:
        lodsb
        xlat
        and al,3fh
        or al,30h
byteOK:
        cmp al,39h
        jle keepit
        add al,7
keepit:
        stosb
        cmp di,offset stopbyte
        jl MakeSerial
;-----------------------------------------------------------------
        mov dx,offset NH2
printMsg:
        mov ah,9
        int 21h
exit:
        ret

StartMsg   db 0dh,0ah,'OCU Keggen #1 ',0feh,' Ghiribizzo 1998 ',0dh,0ah
           db 0dh,0ah,'Enter Name : $'
NameInput  db 17
NameRead   db ?
NameBuffer db 'mk3 "![]ns)%3x#0Z'
nh2        db 0dh,0ah,'Serial Number: '
NameHash   db 16 dup('y')
stopbyte   db 0dh,0ah,'$'
Table1:
PRD:
END

; Cruehead’s Keygen ===========================================================
.model tiny
.386
.stack
.data
StartMsg  db 0dh,0ah,'OCU Keggen #1 ',0feh,' Cruehead 1998 ',0dh,0ah
          db 0dh,0ah,'Enter Name : $'
SerialMsg db 0dh,0ah,'Serial Number: '
NameVar   db 011h,0h,06Bh,06bh,033h,020h,022h,021h,05bh,05dh,06eh
          db 073h,029h,025h,033h,078h,023h,030h,'$'
Table     db 037h,035h,034h,031h,036h,032h,046h,044h,046h,044h,044h
          db 031h,035h,035h,038h,035h,036h,046h,032h,045h,036h,030h
          db 031h,039h,033h,034h,030h,046h,031h,042h,044h,030h,043h
          db 036h,043h,035h,039h,045h,039h,033h,036h,043h,037h,035h
          db 036h,044h,045h,036h,032h,044h,031h,037h,039h,030h,031h
          db 042h,046h,043h,034h,032h,031h,035h,037h,034h,044h,032h
          db 032h,032h,030h,043h,034h,030h,044h,044h,033h,039h,044h
          db 043h,038h,036h,031h,038h,041h,037h,034h,046h,045h,041h
          db 036h,044h,043h,041h,041h,039h,043h,037h
.code
.startup
        mov ah,09h
        lea dx,StartMsg
        int 21h
        mov ah,0ah
        lea dx,NameVar
        int 21h
OnceAgain:
        mov bl,NameVar[di+4]
        cmp bl,0dh
        jne noprob
        mov bl,02bh
noprob:
        mov al,table[bx-020h]
        mov NameVar[di+2],al
        inc di
        cmp di,0Eh
        jne OnceAgain
        mov word ptr NameVar[16],00a0dh
        mov ah,09h
        lea dx,SerialMsg
        int 21h
.exit
end

; Bb’s Keygen =================================================================
; KG10 - Ghiribizzo KeyGen
; written by bb 12Sep98 1:30AM
; next revision 13Sep98 5:00PM
; yet more changes - 26Sep98 - late late night
; eat 3 more bytes 28Sep98
;
; comments where the evils lay
;
; I just knew that I HAD to make this thing 256 bytes or less. Beware: This
; is NOT an example of good coding practice! I almost wish I could do a
; "bytes saved" comparison for all the little hacks.
;
; I've gotten this to assemble under TASM. It MUST assemble as a 16-bit COM file,
; and even then I can't guarantee that the offsets will remain stable between
; various assemblers. Let me restate that: I CAN guarantee that this won't
; work for you when you try and assemble it yourself. :)
;
P8086
MODEL TINY
DATASEG
OffsetStartMsg  EQU 52h
OffsetMySerial  EQU 7fh
OffsetSerialMsg EQU 91h
OffsetMyName    EQU 0a3h
StartMsg  db 0dh,0ah,'OCU Keggen #1 ',0feh,' ----- bb ----- 1998',0dh,0ah
; There's no reason not to re-use this section of the StartMsg, since it fits
; perfectly though code had to be added to affix a linefeed
MySerial  db 0dh,0ah,'Enter Name : $'
SerialMsg db 0dh,0ah,'Serial Number: $'
; previous change to MyName not needed anymore
MyName    db 11h, 0h, 6Dh, 6Bh, 33h, 20h, 22h, 21h, 5Bh, 5Dh, 6Eh, 73h, 29h,
          db 25h, 33h, 78h, 23h, 30h, 5Ah
; Not only does the full table not need to be used, but since it's basically a
; substitution cypher we can fit everything into these 96 or so bytes
; Also, the trailing commented-out 37h saves us one byte. It's the substitution for 7Fh,
; but since 7Fh is a DELETE when using 0a/int21h, it never gets accepted by KGT.COM or by
; this keygen.
; Therefore, it's useless and unneeded.
; NewTable db '754162FDFDD155856F2E6019340F1BD0C6C59E936C756DE62D17901BFC421574
; D2220C40DD39DC8618A74FEA6DCAA9C';, 37h
; and I missed the fact that it also only uses characters 0-9 and A-F
; which can be expressed in 4 bits, cutting the 96 byte table in half
NewTable  db 75h, 41h, 62h, 0FDh, 0FDh, 0D1h, 55h, 85h
          db 6Fh, 2Eh, 60h, 19h, 34h, 0Fh, 1Bh, 0D0h
          db 0C6h, 0C5h, 9Eh, 93h, 6Ch, 75h, 6Dh, 0E6h
          db 2Dh, 17h, 90h, 1Bh, 0FCh, 42h, 15h, 74h
          db 0D2h, 22h, 0Ch, 40h, 0DDh, 39h, 0DCh, 86h
          db 18h, 0A7h, 4Fh, 0EAh, 6Dh, 0CAh, 0A9h, 0C7h
CODESEG
STARTUPCODE
; A note here: We're at <256 bytes and we fit snugly between 0100h-0200h in memory.
; Therefore, any offset to text that we need is going to have a constant value for
; DH, namely 01h. By initializing DH once at this next line of code, we never need
; to change DH again, only DL. We'll save a few bytes here and there because of it,
; though it's more work to find the offsets manually after assembly, and then hard-
; coding them in and re-assembling. I suppose there might be some construct like
; offset ( MyName AND 00ffh ), but I didn't really look into it. EQU will work.
        mov dx, offset StartMsg
        mov ah,09h
        int 21h
; save a byte
        mov dl, OffsetMyName
        mov ah,0ah
; Now that we're through with the StartMsg, we can adjust MySerial to print a linefeed.
; I can save a byte here by using the AH register instead of a 0AH immediate value,
; since AH is now set to 0AH for the int21 get-string-from-keyboard.
        mov [MySerial+10h], ah
        int 21h
; 2 into DL for a division during the main loop
        mov dl, 2
; We start at the END of MyName and work our way backwards, because we can avoid the CMP
; and simply check for the Signed flag when BP rolls over. We save a couple of bytes.
        mov bp, 0fh
; Also, I shaved a few bytes out of this by using BP in place of BX, avoiding the PUSH/POPs
; which I shouldn't have done anyway since I didn't define a new stack for the application.
loop1:  xor ah, ah              ; need to clear ah and bh, unfortunately.
        xor bh, bh
        mov al, [bp+MyName+2]
        sub al,20h              ; if the sub sets carry, then we're probably the carriage return
        jnc skipcr              ; so we'll set ourselves = to something that has the same table
        mov al, 03h             ; lookup value as the carriage return.
skipcr: div dl                  ; after the DIV, AL indexes a byte holding two table values,
                                ; and AH will decide which one we should use
        mov bl, al              ; we need table lookup through bx, not al
        mov al,[bx+NewTable]
        test ah, dh             ; since dh always=1, test ah,dh will save us a byte over test ah,01
        jne skipshift
; if AH=1, use least significant nibble (the shift is skipped)
; if AH=0, use most significant nibble by shifting MSN into LSN
; TASM assembles shr al, 4 as shr al, 1 four times.. we don't want that.
        db 0c0h, 0e8h, 4        ; shr al, 4
skipshift:
        and al, 0Fh             ; strip off high nibble
        add al, 30h             ; and turn into printable [0-9A-F] character
        cmp al, 39h
        jle numnum
        add al, 7
numnum: mov [MySerial+bp],al
        dec bp
        jns loop1               ; loop until bp flips
; save another "offset" byte
        mov dl, OffsetSerialMsg
        mov ah, 9
        int 21h
; save another byte
        mov dl, OffsetMySerial
; AH should already == 9, no need to specify it here.
        int 21h
; End of the line
        mov ah,4ch
        int 21h
END

; Spyder’s Keygen==============================================================
; Ghiribizzo's Key Generator Competition entry by Spyder
; Sheesh you get assembler source and you want comments?
; Only one nibble of each byte in the original key table holds useful
; information. Only key table entries in the range 20..0x7F and 0x0D are
; needed - those 60h (96) nibbles are packed into a 30h (48) byte table,
; 0x0D is handled as a special case.
; The rest is just space conscious assembler with a few wrinkles to save
; bytes. I worry I may have missed some pattern in the key table, could it
; be generated or derived? Otherwise I'm pretty happy with the result.
        .286
seg000  segment byte public 'CODE'
        assume cs:seg000
        org 100h
        assume es:nothing, ss:nothing, ds:seg000
        public start
start   proc near
        mov ah, 9
        mov dx, offset StartMsg
        int 21h                 ; Sign on
        mov ah, 0Ah
        mov dx, offset Buffer
        int 21h                 ; Get name
        mov si, offset BufferCont ; Set up for loop
        mov di, offset Serial
        mov bx, offset Key - 10h
        xor ax,ax
        mov cx,10h
loop1:  lodsb
;       cmp al,0dh              ; don't need this because we arranged the data
;       jnz skip0               ; before the key table to give the right code
;       mov al,'p'              ; for this out of range case
skip0:  sar al,1
        xlat
        jc skip1
        sar al,4
skip1:  and al,0fh
        add al,'0'
        cmp al,'9'
        jle skip2
        add al,7
skip2:  stosb
        loop loop1
        movsw
        movsb
        mov ah,9
        mov dx,offset SerialMsg
        int 21h
        int 20h
start   endp

Buffer     db 11h ;
           db 0 ;
BufferCont db 'm'
           db 'k'
           db '3'
           db ' '
           db '"'
           db '!'
           db '['
           db ']'
           db 'n'
           db 's'
           db ')'
           db '%'
           db '3'
           db 'x'
           db '#'
           db '0'
           db 0dh, 0ah, '$'
StartMsg   db 0dh,0ah,'OCU Keggen #1 ',0feh,' ----- spyder ----- 1998',0dh,0ah
           db 0dh,0ah,'Enter Name : $'
           db 0                 ; A crucial spacer
Key        db 075h, 041h, 062h, 0FDh, 0FDh, 0D1h, 055h, 085h
           db 06Fh, 02Eh, 060h, 019h, 034h, 00Fh, 01Bh, 0D0h
           db 0C6h, 0C5h, 09Eh, 093h, 06Ch, 075h, 06Dh, 0E6h
           db 02Dh, 017h, 090h, 01Bh, 0FCh, 042h, 015h, 074h
           db 0D2h, 022h, 00Ch, 040h, 0DDh, 039h, 0DCh, 086h
           db 018h, 0A7h, 04Fh, 0EAh, 06Dh, 0CAh, 0A9h, 0C7h
SerialMsg  db 0dh,0ah,'Serial Number: '
Serial:
seg000  ends
        end start

; VoidLord’s Keygen============================================================
; OCU Keygen #1 | VoidLord 1998
; Category: newbie (this is my first keygen)
; Solution:
; for the every possible input char (20h-7fh) the "serial" char is stored in the
; Table. Since the output chars can only be 0-9 and A-F, we can store two chars
; in one byte, reducing the table size to 48 bytes.
seg000  segment byte public 'CODE'
        assume cs:seg000
        org 100h
        assume es:nothing, ss:nothing, ds:seg000
start   proc near
        mov ah, 9               ; DOS - Write starting message
        lea dx, StartMsg
        int 21h
        mov ah, 0ah             ; DOS - read Name
        lea dx, Serial
        int 21h
        xor ax, ax
        xor bx, bx
loop1:  mov al, [Serial2+bx]    ; the output will be in the same buffer
        cmp al, 0dh             ; end of input string (0dh) ?
        jne no_cr
        mov [Serial2+bx], '1'   ; the output char will be '1'
        jmp finish              ; the remaining chars are OK already
no_cr:  push bx                 ; now we should translate the namechar
        sub al, 20h             ; to the serial number char, using the
        mov bx,ax               ; lookup Table
        shr bx,1                ; we have two chars in one byte in the Table
        and al, 1
        jnz odd                 ; is this char "even or odd" ?
        mov al, [Table1+bx]
        and al, 0fh             ; if even, use the lower 4 bits
        jmp end_l
odd:    mov al, [Table1+bx]
        mov cl, 4
        shr al, cl              ; if odd, use the higher 4 bits
end_l:  pop bx                  ; translate the number to the hex char
        cmp al, 10              ; is it digit 0-9 or letter A-F
        jl digit
        add al, 7               ; if letter, add 7
digit:  add al, '0'             ; if digit, just add '0'
        mov [Serial2+bx], al
        inc bx                  ; process next input char
        cmp bx, 10h
        jl loop1
finish: mov Serial, ':'         ; complete the output string
        mov Serial+1,' '
        mov ah, 9               ; DOS - Print solution
        lea dx, SerialMsg
        int 21h
        mov ah, 4Ch             ; DOS - QUIT with EXIT
        int 21h
start   endp

StartMsg  db 0dh,0ah,'OCU Keygen #1 ',0feh
          db ' ----- VoidLord ----- 1998'
          db 0dh, 0ah, 0dh,0ah,'Enter Name : $'
SerialMsg db 0dh,0ah,'Serial Number'
Serial    db 11h, 0
Serial2   db 67, 57, 69, 55, 52, 53, 50, 53, 56, 55
          db 68, 50, 69, 54, 49, 54, 0dh, 0ah, '$'
Table1    db 87, 20, 38, 223, 223, 29, 85, 88, 246, 226
          db 6, 145, 67, 240, 177, 13, 108, 92, 233, 57
          db 198, 87, 214, 110, 210, 113, 9, 177, 207, 36
          db 81, 71, 45, 34, 192, 4, 221, 147, 205, 104
          db 129, 122, 244, 174, 214, 172, 154, 124
seg000  ends
        end start


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                How to Use A86 for Beginners
                                                by Linuxjr

Requirements:
  -Basic Dos knowledge like copying and renaming files and such

I am writing this paper because I find plenty of tutorials and books about
assembly and how to write programs and how to do loops, if/else statements,
etc... But one thing I did not see plenty of is tutorials on how to set up
the assembler of choice that you grow fond of, for instance nasm, a86, tasm,
masm, GAS, etc. So I am writing about a86, using my college notes and the
experience I gained in my Assembly class.

I hope this will help you enjoy a86 and encourage you to learn how to manage
opcodes up to the 286 and 16-bit code before you start tackling 32-bit and
Windows programming in assembler. This is a sort of warning that you will
only be able to write DOS programs, but you have to learn how to crawl before
you can walk, and you have to learn how to walk before you can run. I hope to
show you how to set up a86, how to write a few simple programs with the
template I use, and how to do some basic stuff in assembler.

I took a college course on Assembly a couple of months ago, and I was happy to
learn the internals of the system and how to manipulate the registers for some
awesome results. The assembler that we used was a86 by Eric Isaacson. This is
a shareware program, meaning you get to play with it before buying it. To get
this assembler go to - http://www.eji.com/a86/ - and you will see where to
download the programs. It is in a zip file and you just unzip it with your
favorite program like winzip or pkunzip. You should also download d86, the
debugger, for use with your a86 programs. Once you have downloaded them, unzip
the files to a directory such as c:\a86, or even put them on a floppy disk if
you are worried about space.

Getting Started
---------------
Let's get into it: you've got the assembler and the debugger, what next?
First of all, we have to make a text file, since all asm source code is
nothing but plain text containing the instructions and operands that make
your program do what you want. I start all my a86 programming by opening up
my template.asm, which I got from school; it is a useful template and it
makes a good DOS .EXE when you assemble and link it with the supplied batch
file. Cut the following code and save it in a text file called template.asm:

X--------Begin Cutting here--------------------------------------------
;       PROGRAM :
;
;       AUTHOR :
;
;       PURPOSE :
;
;       PROGRAM OUTLINE :
;
;============================== EXTERN ===============================

;=========================== STACK SEGMENT ===========================
sseg    segment para stack 'STACK'
        db 100H dup ( ? )       ; allow 256 bytes of memory for
                                ; use by our program stack.
sseg    ends

;============================ DATA SEGMENT ============================
dseg    segment para 'DATA'

dseg    ends

;============================ CODE SEGMENT ============================
cseg    segment para 'CODE'     ; Begin the Code segment containing
                                ; executable machine code
program proc far                ; Actual program code is completely
                                ; contained in the FAR procedure
                                ; named PROGRAM
        assume cs:cseg, ds:dseg, ss:sseg

; Set Data Segment Register to point to the Data Segment of this program
        mov ax,dseg
        mov ds,ax

;=============== Rest of MAIN PROGRAM code goes here ==================

exit:   mov ax,4c00h            ; terminate program execution and
        int 21h                 ; transfer control to DOS
program endp                    ; end of the procedure program

;============================ PROCEDURES ==============================

cseg    ends                    ; End of the code segment containing
                                ; executable program.
        end program             ; The final End statement signals the
                                ; end of this program source file, and
                                ; gives the starting address of the
                                ; executable program
X--------Stop Cutting here--------------------------------------------

Now we have a template to use, and this is just one out of many templates you
can make for your assembly programs. Now let's begin to have fun. These few
programs will give us a basic feel for how to set up a simple hello program.
What we will learn from these examples is:

  1) The basic mechanics of editing the template file to get an ASM source
     code file, assembling and linking it, and possibly fixing syntax errors.
  2) Nearly all of the programs have loops in them, in different formats.
  3) The operation of several INT 21H functions: 01H, 02H and 08H (character
     input and output), 09H (string output), and 4CH (program termination).
  4) The operation of the DOSIOLT procedures: inhex16 and outhex16, and how
     to assemble and link a program that uses them.
  5) Both string and numeric variables will be demonstrated.

Creating an ASM file for the Message Program
--------------------------------------------
To become familiar with the process of creating an assembly program, you will
create a simple program that prints a one line message. As with most
programming languages, Assembly programming starts with a plain text file
containing the program instructions to execute.

Ordinarily, a programmer would have to type in the entire source file from
scratch. But 8086 assembler program files contain a large number of setup
directives and declarations which are essentially the same for every program.
It will be easier to start with a file that has all the necessary directives
and declarations already in it, and just add the actual program parts to it.
The file template.asm is just such a template: it contains all the necessary
pieces of a program, except for the actual program itself.
Make a copy of the template.asm file, and name it something appropriate:
message.asm is a good choice. The file extension must be ASM. You will edit
the new file to create your first program. DO NOT EDIT template.asm itself!!!!
You will use this template file as the start of your assembly programs so it
should not be altered (until you get advanced enough to play around with it
;-). We will be using EDIT in a dos box as our editor, though you can use
notepad or Ultraedit to edit your assembly files as well.

The Comments

All of the programs that you will write should have a descriptive set of
header comments at the top. Any text AFTER a semicolon is considered a
comment. The top of your new program file should already have the basic
outline for this comment. Edit your message.asm file to have something like
this:

;       PROGRAM : Message Program
;
;       AUTHOR : Your Name here
;
;       PURPOSE : This program simply prints a one line message
;                 to the screen
;       PROGRAM OUTLINE : Use INT 21H Function 09H to print the message.

This is just an example to help you know what you want to do, and to have a
reference if you were to walk away from a project for a year or so...the
header will make a nice reminder of what you were trying to get this program
to do.

The RAM Variable

The program that you will create in this part requires a variable. You will
create a string of characters labeled message. The part of the file where all
data is placed is the Data Segment. Look in your ASM file for the following
lines:

dseg    segment para 'DATA'
dseg    ends

Change this part of the code so that the message to be printed is defined.
The result will look like:

dseg    segment para 'DATA'
message db 0DH, 0AH, "WHOPPEEEE!!! My first Message.", 0DH, 0AH, "$"
dseg    ends

The hex values 0DH, 0AH are the two-byte sequence for a DOS newline (CR-LF).
The first character of "0DH" and "0AH" is a ZERO, not a capital O. The "$" at
the end marks the end of the string for Function 09H. Note that there is NO
semicolon before "message". Do not allow this part to break over two lines.

The Code

Now locate the part of the file where the program code goes. It should look
like this:

;========================Main Program================================
;
program proc far                ; Actual program code is completely
                                ; contained in the FAR procedure
                                ; named PROGRAM
        assume cs:cseg, ds:dseg, ss:sseg

; Set Data Segment Register to point to the Data Segment of this program
        mov ax,dseg
        mov ds,ax

;=============== Rest of MAIN PROGRAM code goes here ==================

exit:   mov ax,4c00h            ; terminate program execution and
        int 21h                 ; transfer control to DOS
program endp                    ; end of the procedure program

;============================ PROCEDURES ==============================

cseg    ends                    ; End of the code segment containing
                                ; executable program.
        end program             ; The final End statement signals the
                                ; end of this program source file, and
                                ; gives the starting address of the
                                ; executable program

All of the code for your program should REPLACE the comment: "Rest of MAIN
PROGRAM code goes here". Here is the code you will use to print out the
message:

        ;Print the message
        mov dx, offset message
        mov ah, 09H
        int 21H

This code just calls the DOS interrupt used to print strings to the screen.
Interrupt 21H is the general entry point for many useful DOS calls. The
sub-function used to print strings is Function 09H; this value must be loaded
into the AH register before calling. Also, Int 21H Function 09H requires the
address of the message to be placed in the DX register. The above code
performs these two initialization tasks, and then calls the interrupt.

Take careful note of the semicolons which start the comments. Also, do not
alter any other part of the code. These were the only two changes you needed
to make.

Assembling with asm86.bat
-------------------------
Now we have written our first asm file.
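
As a cross-check before assembling, this is roughly what message.asm should
contain once both edits are made. It is only the template from above with the
message definition and the three print instructions dropped in; the header
text is abbreviated here, and nothing in it goes beyond what the previous
sections describe.

```asm
;       PROGRAM : Message Program
;       PURPOSE : Prints a one line message to the screen

sseg    segment para stack 'STACK'
        db 100H dup ( ? )       ; 256 bytes of program stack
sseg    ends

dseg    segment para 'DATA'
message db 0DH, 0AH, "WHOPPEEEE!!! My first Message.", 0DH, 0AH, "$"
dseg    ends

cseg    segment para 'CODE'
program proc far
        assume cs:cseg, ds:dseg, ss:sseg

        mov ax,dseg             ; point DS at our data segment
        mov ds,ax

        ;Print the message
        mov dx, offset message  ; DS:DX -> '$'-terminated string
        mov ah, 09H             ; DOS function 09H: print string
        int 21H

exit:   mov ax,4c00h            ; terminate, return code 0
        int 21h
program endp
cseg    ends
        end program
```

If your file differs from this in anything other than comments and spacing,
compare it against the template again before blaming the assembler.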
To assemble with a86 you could try to use the switches from the manual that is
included with the a86 package, or you can make things easy by using this
batch file, which is designed for programs that use the template file. Here
is the batch file:

   :------------------------------ASM86-----------------------------------
   @echo off
   REM This is a simple batch file to use a86 and link:
   if exist %1.asm GOTO FOUND
   echo %0 ERROR : %1.asm -- FILE NOT FOUND
   echo Usage: %0 file [link-file]
   GOTO STOP
   :FOUND
   :-- Assemble the program
   echo a86 +O +S +E %1.asm
   a86 +O +S +E %1.asm
   ::-- IF THERE WAS AN ERROR, STOP
   IF ERRORLEVEL 1 GOTO STOP
   ::-- IF there is a second file name, assume it is a OBJ file,
   ::-- and link it to the %1 name.
   IF X%2 == X GOTO ELSELINK
   ECHO link %1+%2;
   link %1+%2;
   GOTO ENDIFLINK
   :ELSELINK
   echo link %1;
   link %1;
   :ENDIFLINK
   :STOP

and save this as asm86.bat. The a86 switches used here 1) create an object
file (+O), 2) suppress the creation of the symbol table file .sym (+S), and
3) write the errors to filename.err instead of inserting them into your
source file (+E). To assemble message.asm with the batch file, type

   asm86 message

If there were any errors, you will have to edit the asm file to fix them. The
error messages displayed by the assembler should indicate the line number and
cause of the problem. Since you are just copying pregenerated code, any
errors will simply be typos.

Once all of the errors have been corrected, a pair of files will have been
created. They will have the same base name as the original asm file, but will
have different extensions:

   OBJ - Object file. Contains the basic machine code, but does not have any
         references to external procedures resolved. This is, effectively,
         an intermediate file which is used by the linker to produce the
         final executable file.
   EXE - Executable file. All external references resolved. Completely
         runnable.

To run the program just type

   Message

and you will see the line appear on the screen. This was a simple Hello
program.
What you probably want is another example or two to try out, and that is what
we shall do. The next program won't be as long, but it will have plenty of
info.

CharLoop Program
----------------
In this part, you will create a simple program that asks the user to enter a
character, and prints it out again. It does this repeatedly, until the user
hits the ESC key. DOS functions 01H and 02H are introduced with this program,
and it is the first program containing a comparison loop.

Again you should start by copying template.asm to a file called charloop.asm.
Edit the charloop.asm template so that it has the following changes:

Create two messages by adding the following lines to the Data Segment part of
the program (see the message program instructions, if you don't remember how
to do this):

   prompt db 0DH, 0AH,"Enter a Character: $"
   outmsg db 0DH, 0AH, "You Entered: $"

Now add the code which will put the following "pseudocode" into effect:

   Repeat
      prompt for and read a character
      Print the character back out with a message
   While the character read is not esc

Which will turn out to be the following assembly code:

   char_loop:
   ;Print the prompt
           mov dx, offset prompt
           mov ah, 09H
           int 21H
   ;Read a character into AL
           mov ah, 01H          ;(01H - with echo; 08H - no echo)
           int 21H
           mov bl, al           ;save character in BL
   ;Print the final message
           mov dx, offset outmsg
           mov ah, 09H
           int 21H
   ;Write the character to the screen
           mov dl, bl           ;put character in dl
           mov ah, 02H
           int 21H
   ;Loop back, only if the character was not esc (1BH)
           cmp bl, 1BH
           jne char_loop
   ;End Repeat

Note how the two new DOS functions are called. The Function number is always
placed in AH before calling, and the INT 21H instruction is used to invoke
the interrupt. Function 01H, which reads a character from the keyboard,
returns it in AL. For Function 02H, which writes a character to the screen,
the DL register must be initialized with the character to print.
Note also that the character must be stored somewhere throughout the whole
loop, and it can NOT be stored in either AL or DL -- AL is modified by
Function 02H, and DL is modified when DX is set to the address of the
strings. So BL is used to store the character, and the value must be
transferred between AL, BL and DL during processing. This kind of juggling
happens often in assembly programming. Get this program running to watch
another good program going ;-).

CharLoop Program without Echo
-----------------------------
In the CharLoop program above, Function 01H was used to read a character
from the keyboard. It does more than just read a character: it also echoes
it back to the screen. This way, when you type something, you get visual
feedback of what you have done. Function 08H works exactly the same as
Function 01H, except for this echo feature: Function 08H does NOT echo the
character after reading it.

Create a new program which is exactly the same as CHARLOOP, except that it
uses Function 08H to read the characters instead of Function 01H. Write and
run the program to see how it works.

NumLoop Program
---------------
This program will work in a similar fashion to the CharLoop program above,
but it will read and print numbers. Since there is no DOS interrupt to
convert ASCII characters to numbers, your code will have to do this.
Fortunately, there are already procedures to do this. A few extra steps must
be taken to use them, but it will be much easier than writing the code from
scratch. See the info about DOSIOLT for details on how to use these
procedures.

DOSIOLT Procedures
Here is a description of the DOSIOLT procedures:

inhex16
   This procedure reads a HEX number in character format from the standard
   input, and converts it to a word. Spaces or Tabs may precede or follow
   the number. DOS int 21H-0AH is used to read the input string, so it must
   be terminated by a RETURN. Both upper and lower case letters A-F may be
   used.
   If the number typed is larger than FFFFH, the upper bits are lost. If
   anything unexpected is typed (like non-HEX chars) the function will
   return junk.
      Inputs:   None
      Outputs:  AX - the word-sized number read.
      Modifies: AX, flags

outhex16
   This simple routine prints the four 'nibbles' of AX as ASCII digits. Four
   digits are always printed.
      Input:    AX - the number to be printed
      Outputs:  None
      Modifies: Flags

outhex8
   This simple routine prints the two 'nibbles' of AL as ASCII digits. Two
   digits are always printed.
      Input:    AL - the number to be printed
      Outputs:  None
      Modifies: Flags

Call
Each of these procedures is invoked with the CALL instruction. Any inputs
(registers) must be initialized before the call; any outputs (also
registers) are set by the procedure, and contain the appropriate value after
the call. For example, to print the 1-byte value "2F" to the screen:

           mov al, 2FH
           call outhex8        ;Prints: 2F

To print "2AC5":

           mov ax, 2AC5H
           call outhex16       ;Prints: 2AC5

To read a number from the keyboard:

           call inhex16
           ;The ax register now contains the number read

Extern
Since the code for these functions does NOT appear in your ASM file, two
special steps must be taken in creating your executable file. The first is
to declare the names of the procedures as external procedures. This informs
the assembler that the code has been written elsewhere, and you didn't just
forget to write it. The extern declaration should come someplace early in
the ASM file. Although it doesn't matter greatly where it goes, most
programmers will put these declarations outside of all of the segments. The
template file given has a spot for externals, marked with a comment. The
format for the declaration (in this case) is:

   extern procedure_name:far

A86 USERS: The A86 Assembler uses the older version of the extern
declaration, which is spelled extrn. If you are using the a86 assembler
(asm86.bat), make sure you spell the name of the directive extrn.
procedure_name is the name of the procedure that you will use in the
program. The name only needs to be declared once in this way, no matter how
many times it is used. But if two or more DOSIOLT procedures are to be used,
each must have a separate declaration. You should NOT place these extern
declarations in your code unless you are actually using the routines. The
linker may place the code for the procedure in your final executable even if
it is never called.

LINKING
A special step must be taken in linking (the second half of the compilation
phase done by asm86.bat) to link the code in DOSIOLT. Fortunately, asm86.bat
can handle the extra file fairly automatically. Just include DOSIOLT on the
command line, after your asm file name. Example: assume you have written a
program in a file called "calc.asm" which contains calls to the DOSIOLT
procedures. To assemble and link the program:

   A:\> asm86 calc dosiolt

If you get an "Undefined Symbol" error, it is because you mistyped, or
forgot, the extern declarations for the DOSIOLT procedures. Make sure these
are correct. If you get an "Unresolved External" error, it is because you
forgot to put "DOSIOLT" as the second file name; i.e. you typed "asm86 calc"
instead of "asm86 calc dosiolt".

The NumLoop program will illustrate the use of two of the DOSIOLT
procedures, and also the use of variables, rather than registers, as places
to store information. The outline of the program is as follows:

   Loop forever
      Prompt for, and read a number into the variable NUMBER
      IF number = 0, then break out of the loop
      print Number, with an appropriate announcement.
   End Loop

Your program will need a prompt string, a response string and a word-sized
variable in the Data Segment:

   prompt1  db 0DH, 0AH, "Enter a number: $"
   response db 0DH, 0AH, "You Entered: $"
   number   dw ?

Number has been declared as a word-sized variable, with no initial value.
The code can now use the name "Number" just like a register name (in most
cases).
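Before turning to the assembly, the outline above can be sketched in Python.
This is only a model of the control flow: the Python inhex16 and outhex16
below are stand-ins written for this sketch from the descriptions given
earlier (upper-case output digits are an assumption), not the real routines
in dosiolt.obj.

```python
# Python model of the NumLoop outline. inhex16/outhex16 are hypothetical
# stand-ins for the DOSIOLT routines, written from their descriptions.

def inhex16(text: str) -> int:
    """Parse hex digits (either case, surrounding blanks allowed).
    Only the low 16 bits are kept, as a word-sized result would be."""
    return int(text.strip(), 16) & 0xFFFF

def outhex16(ax: int) -> str:
    """Always four hex digits for the 16-bit value in AX."""
    return format(ax & 0xFFFF, "04X")

def number_loop(inputs):
    """Echo each number read until a 0 is entered."""
    echoed = []
    for line in inputs:
        number = inhex16(line)      # call inhex16 / mov number, ax
        if number == 0:             # cmp number, 0H / je end_number_loop
            break
        echoed.append("You Entered: " + outhex16(number))
    return echoed

result = number_loop(["2AC5", " ff ", "0", "1234"])
print(result)
```

The last input, "1234", is never reached because the 0 before it ends the
loop, which is exactly what the je instruction does in the real program.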
The code for the program is:

   number_loop:
   ;Print the first prompt
           mov dx, offset prompt1
           mov ah, 09H
           int 21H
   ;Read a number into AX and put it in NUMBER
           call inhex16
           mov number, ax
   ;If number = 0 then exit the loop
           cmp number, 0H
           je end_number_loop
   ;Print the second prompt.
           mov dx, offset response
           mov ah, 09H
           int 21H
   ;Print the number
           mov ax, number
           call outhex16
           jmp number_loop
   end_number_loop:

Note that inhex16 reads a number into AX, and outhex16 prints the number in
AX, yet this code went through all the trouble of storing the number in the
variable, rather than just leaving it in AX throughout the loop. WHY?!?
Because AX was needed in between the reading and the printing of the number:
AH has to hold the function number 09H for the second prompt. Again, this
kind of juggling between registers and variables occurs often in assembly
programming.

Since two DOSIOLT procedures are being used, they must be declared. At the
top you will find the EXTERN part of your program template; add these lines
to the section:

   ;===============================Extern==================================
           extrn inhex16:far
           extrn outhex16:far

Those are all the changes needed. Don't forget to include the DOSIOLT file
on the command line when compiling, which will be

   asm86 numloop dosiolt

I do apologize for the length of this, but I got too excited when I was
messing with these old files and playing with the procedures in the
dosiolt.obj file. If you want to try to use these files, you can email me at
linuxjr@hotmail.com and request the dosiolt.obj to use with numloop; I will
be more than happy to send it.


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                    Using the Gnu AS Assembler
                                                                    by mammon_

GAS is the GNU project port of the Unix AS assembler; it is available as
part of the binutils package which is included with any of the GNU compilers
(for example, GCC).
GAS support is built into the various GNU compilers, and so GAS can be
invoked by invoking the compiler on a .S (asm source) file; however it can
also be run on any source file (for example, .asm files) by using the 'as'
command.

The GAS documentation is available on Linux installations in info (.gz)
format, and is viewed using the command 'info as' or 'info -f as.info'. For
the novice, a crash course in info: Info files are designed in a tree
structure, with each page or section being considered a 'node'.

   h      gets help
   q      quits info
   SPACE  scrolls down the screen
   DEL    scrolls up the screen
   b      jumps to the beginning of the node
   e      jumps to the end of the node
   n      jumps to the next node
   p      jumps to the previous node
   g      jumps to a specified node
   m      jumps to a specified menu item
   s      searches the info file
   l      steps back 1 node

The sections of most interest in the manual will be the Directives
('g Pseudo Ops'), Symbols ('g Symbols'), Constants ('g Constants'), and
Sections ('g Sections') nodes. For more immediate references, the Intel
386-specific topics can be consulted: 'g i386-Syntax', 'g i386-Opcodes',
'g i386-Regs', 'g i386-prefixes', 'g i386-Memory', 'g i386-jumps'.

The AT&T Syntax
---------------
GAS uses the AT&T syntax, which is known to be confusing for those used to
the Intel assembler syntax. It has been said that the AT&T syntax is less
ambiguous than the Intel, and thus it has its own appeal.

Registers
One of the most obvious differences in syntax is that the registers in the
AT&T syntax are prefixed with %. Thus, 'eax ax al ah' would be written
'%eax %ax %al %ah' for GAS.

Opcode Format and Order
Unlike the Intel syntax, which uses the format 'opcode dest, src', AT&T
syntax uses the format 'opcode src, dest'; thus the command 'mov eax, ebx'
in Intel would be 'mov %ebx, %eax' in AT&T.
In addition, the opcodes in AT&T syntax all take suffixes to specify the
size of the operand (note that these suffixes can usually be omitted, as GAS
will guess the operand size from the size of the register being accessed) --
thus one would add 'w' to an opcode to specify a word operand, 'b' to
specify a byte operand, and 'l' to specify a long operand. The Intel 'mov'
opcode would then be specified in AT&T syntax by using 'movb', 'movw', or
'movl' as circumstances warrant. Note that this carries over into far calls;
as the 'FAR' keyword is not present in GAS, one must prefix (not suffix) the
call or jump with 'l': thus a 'far call' becomes 'lcall', 'far jmp' becomes
'ljmp', and 'ret far' becomes 'lret'.

Immediate and Absolute values
Immediate values are prefixed with a $ in the AT&T syntax, while in the
Intel syntax they are unmarked. Thus a 'push 4' statement becomes a
'push $4' in AT&T. Also, an absolute (as opposed to PC-relative) jump or
call operand is prefixed by a *, while in Intel it would be unmarked.

Memory Referencing
This is the part that is most likely to cause trouble for those used to the
Intel syntax. Intel uses the following syntax for memory references:

   SECTION:[BASE + INDEX*SCALE + DISP]

where BASE is the register used as a base in the reference, INDEX is a
register used to calculate an offset, SCALE is the multiplier used to
calculate the offset from the INDEX register, and DISP is the displacement
from the BASE or INDEX register. Some examples from the GAS manual:

   [ebp - 4]        [BASE + DISP]           (Note: DISP is -4)
   [foo + eax*4]    [DISP + INDEX*SCALE]
   [foo]            [DISP]                  (Value pointed to by 'foo')
   gs:foo           SECTION:DISP            (Contents of variable 'foo')

AT&T syntax uses the following syntax for memory references:

   SECTION:DISP(BASE, INDEX, SCALE)

As with the Intel syntax, all of these are optional (and it appears that
BASE and INDEX are rarely used together).
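Both notations name the same arithmetic: the address is DISP + BASE +
INDEX*SCALE. The Python sketch below is an illustration only, with function
names invented for it; it computes that sum for register values passed in as
plain integers, and renders the AT&T spelling of an operand from its parts.
It does not try to cover the manual's special DISP(,SCALE) case.

```python
# Illustrative model of x86 memory operands; effective_address and
# att_operand are hypothetical helper names, not part of GAS.

def effective_address(disp=0, base=0, index=0, scale=1):
    """DISP + BASE + INDEX*SCALE, wrapped to 32 bits."""
    return (disp + base + index * scale) & 0xFFFFFFFF

def att_operand(disp="", base="", index="", scale=1):
    """Render DISP(BASE,INDEX,SCALE), leaving out the unused parts.
    A skipped BASE still needs its comma when INDEX is present."""
    if not base and not index:
        return disp                       # plain 'foo'
    inner = "%" + base if base else ""
    if index:
        inner += ",%" + index + "," + str(scale)
    return disp + "(" + inner + ")"

# [ebp - 4] with ebp = 0x1000, and [foo + eax*4] with foo = 0x2000, eax = 3
print(hex(effective_address(disp=-4, base=0x1000)))
print(hex(effective_address(disp=0x2000, index=3, scale=4)))
print(att_operand("-4", base="ebp"))
print(att_operand("foo", index="eax", scale=4))
```

The register values and the address given to 'foo' are arbitrary numbers
chosen for the example.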
The GAS manual provides the following examples equivalent to the above Intel
examples:

   -4(%ebp)         DISP(BASE)
   foo(,%eax,4)     DISP(,INDEX,SCALE)
   foo(,1)          DISP(,SCALE)     (Note: the single comma is intentional)
   %gs:foo          SECTION:DISP

Note that you must provide commas within the parentheses whenever you skip
an element (e.g., if you do not use BASE). To illustrate, here are some
examples of memory references mixed in with asm opcodes (from
http://www.castle.net/~avly/djasm.html):

   __AT&T______________________    __Intel_________________________
   movl 4(%ebp), %eax              mov eax, [ebp+4]
   addl (%eax,%eax,4), %ecx        add ecx, [eax + eax*4]
   movb $4, %fs:(%eax)             mov byte ptr fs:[eax], 4
   movl _array(,%eax,4), %eax      mov eax, [4*eax + array]
   movw _array(%ebx,%eax,4), %cx   mov cx, [ebx + 4*eax + array]

Labels & Symbols
Labels in GAS are the same as in other assemblers: the name of the label
followed by a colon. All symbol names must begin with a letter, a period, or
an underscore. Local symbols are defined using the digits 0-9 followed by a
colon, and are referred to using that digit followed by a b (for a backward
reference) or f (for a forward reference); note that this allows only 10
local symbol names. A symbol can be assigned a value using the equals sign
(e.g. 'TRUE = 1') or by using the .set or .equ directives.

Directives
----------
GAS allows most of the standard assembler directives; what follows are the
most commonly used.

.align
   Pad the section to a specified alignment (e.g. 4 bytes); this directive
   takes as an argument the alignment size, as well as an optional argument
   specifying the byte used to fill the pad areas (default is 00).

.ascii, .asciz, .string
   Each of these directives takes one or more strings separated by commas;
   in the .ascii directive, the strings are not terminated, in the .asciz
   and .string directives the strings are zero-terminated.
.byte, .double, .int, .word
   Each of these directives takes as an argument an expression (for example,
   value1 + value2) and sets the specified number of bytes (byte, int, word,
   etc) at the current location to the result of the expression.

.data, .section, .text
   The .section directive allows segments or sections of the target program
   to be defined for the linker. The .section directive takes a section
   name, as well as section flags (b = bss, w = writable, d = data,
   r = read-only, x = executable for COFF files; a = allocatable,
   w = writable, x = executable, @progbits = data, @nobits = no data for ELF
   files). The .data and .text directives are pre-defined .section
   directives for data and code sections.

.equ, .set
   Each of these sets the first argument (a symbol) to the result of the
   second argument (an expression); for example
           .equ TRUE, 1
   sets the symbol TRUE to the value 1.

.extern
   The traditional EXTERN directive is available but ignored; GAS treats all
   undefined symbols as externs.

.global, .globl
   These directives define global (exported) symbols; each takes as an
   argument the symbol to be made global.

.if / .endif
   GAS provides the usual IF...ENDIF directives for conditional assembly;
   the .if directive is followed by an expression, and all code between the
   .if and the .endif directive is assembled only if that expression
   evaluates to non-zero.

.include
   This directive includes a file at the current location; it takes as an
   argument the name of the file in quotes, for example
           .include "stdio.inc"

Assembling a Program
--------------------
A GAS program can be assembled by invoking GCC with the O2 (optimize:
level 2) option. Note that all GAS programs built this way must have a .text
section and a global "main" label.
Here is an example of a 'hello world'-style program in GAS:

   /* gashello.S ====================================================== */
   .text
   message:
           .ascii "Helloooo, nurse!\0"
   .globl main
   main:
           pushl $message
           call puts
           popl %eax
           ret
   /* EOF ============================================================= */

(The header and footer are written as /* */ comments because GAS on the x86
treats ';' as a statement separator rather than as a comment character.)
This can be compiled with the command

   gcc -O2 gashello.S -o ghello

or with

   as gashello.S -o gashello.o
   ld -o gashello gashello.o -lc -s -defsym _start=main

Note that it is much easier to use GCC than to use AS, as you will have to
explicitly specify the libraries to link to (hence the -lc parameter) when
you call LD.

The Int80 "pid.asm" program from last month's Linux article would be written
for GAS as follows:

   /* pid.S =========================================================== */
   .global main
   .text
   szText1:  .asciz "Getting Current Process ID..."
   szDone:   .asciz "Done!"
   szError:  .asciz "Error in int 80!"
   szOutput: .string "%d\n"
   main:
           pushl $szText1
           call puts
           popl %ecx
           mov $20, %eax
           int $128
           cmp $0, %eax
           je Error
           pushl %eax
           pushl $szOutput
           call printf
           popl %ecx
           popl %ecx
           pushl $szDone
           call puts
           jmp Exit
   Error:
           pushl $szError
           call puts
   Exit:
           popl %ecx
           ret
   /* EOF ============================================================= */

This can be compiled in the same manner as the previous example; note,
though, the use of decimal numbers when calling interrupts (the author found
that the 0x?? syntax for specifying a hexadecimal integer caused the opcode
to not be recognized by the assembler; GAS normally accepts 'int $0x80' as
well).


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                               A Guide to NASM for TASM Coders
                                                                        by Gij

Generalities
------------
The basic function of any assembler is to turn asm into the equivalent
binary code file; that's true for TASM, NASM, and any other assembler. The
differences arise in the special features each assembler offers you.
For example, the MODEL directive exists in TASM, making it easier for the
coder to reference data variables in other segments. NASM does not have an
equivalent directive, so you have to keep track of the segment registers
yourself, and put segment overrides where they are needed. This does not
mean that NASM doesn't have good SEGMENT or GROUP support; in fact it has
both, though they are not quite the same as in TASM.

It's a different way of coding, and it may seem to require more work, but
after you get used to it it's easier, because you know exactly what's going
on in your code. NASM actually gives you the closest possible idea of what
your asm source code will become once it's assembled.

TASM is chock-full of directives; looking at a small reference for TASM 4.0,
there are at least a few dozen directives TASM uses, and you have to know
quite a few of them by heart. NASM on the other hand has very few
directives. Actually, you can write an asm file that will assemble just fine
without using a single directive, although I doubt it will be useful in most
cases.

NASM is also less lenient about syntax, which leaves less room for software
bugs, but makes it more strict when assembling. I actually think NASM is
easier to learn than TASM since it's much more straightforward. Your NASM
Bible is of course the accompanying docs; you can get them in a separate
package from the same place you got the binaries for NASM.

All in all I think you will find NASM to be just as capable as TASM if not
more so. Although it's missing some features TASM has, you can always mail
the author and ask for a feature, and you just might get lucky when the new
version comes out.

ASM code is usually the same in any assembler (AT&T syntax is an exception)
but there are a few subtleties that TASM coders should look out for. The
docs that accompany NASM have a nice list of them, and I'll mention the most
significant ones here.
DATA offset vs DATA contents
----------------------------
TASM uses either of these forms to move the offset of a variable into a
register:

        mov esi, offset MyVar
   OR
        lea esi, MyVar

LEA is used to load complex offsets like "[esi*4+ebx]" into a register. TASM
supports LEA even when used with a simple offset like "MyVar". NASM on the
other hand does not accept the second form as written (its LEA needs the
operand in square brackets, as in the complex form below), so a simple
offset is loaded into a register like this:

        mov esi, MyVar

This ALWAYS means move the offset of MyVar into esi. On the other hand,
this:

        mov eax, [MyVar]

will always mean move the contents of MyVar into eax. Using LEA to load a
complex offset is valid in both TASM and NASM:

        lea edi,[esi*4+EBX]     ; valid in both assemblers

NASM also supports a SEG keyword:

        mov ax,SEG MyVar

This moves the segment of the variable into ax.

Segment Overrides
-----------------
TASM is more lax in its syntax, so both of these are valid code:

        mov ax,ds:[si]
   AND
        mov ax,[ds:si]

NASM doesn't allow this -- if you use square brackets, all of the
specifiers, including the segment override, must be inside the square
brackets. So this is the only valid option:

        mov ax,[ds:si]

Specifying operand size
-----------------------
TASM coders usually have lexical difficulties with NASM because it lacks the
"ptr" keyword used extensively in TASM. TASM uses this:

        mov al, byte ptr [ds:si]
   OR   mov ax, word ptr [ds:si]
   OR   mov eax, dword ptr [ds:si]

For NASM this simply translates into:

        mov al, byte [ds:si]
   OR   mov ax, word [ds:si]
   OR   mov eax, dword [ds:si]

NASM allows these size keywords in many places, and thus gives you a lot of
control over the generated opcodes in a uniform way. For example, the
following are all valid:

        push dword 123
        jmp [ds: word 1234]     ; these both specify the size of the offset
        jmp [ds: dword 1234]    ; for tricky code when interfacing 32bit and
                                ; 16bit segments

It can get pretty hairy with operand size being spelled out this explicitly,
but the important thing to remember is that you can have all the control you
need, when you want it.
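The difference between a variable's offset and its contents, and what the
byte/word/dword keywords select, can be modelled in a few lines of Python.
This is a sketch under stated assumptions: the 16-byte memory image, the
address 4 given to MyVar and the value stored there are all made up for the
example, and the load helper is not an assembler feature.

```python
import struct

# A made-up 16-byte memory image with MyVar placed at offset 4 (assumption).
memory = bytearray(16)
symbols = {"MyVar": 4}
struct.pack_into("<I", memory, symbols["MyVar"], 0x11223344)

def load(address: int, size: int) -> int:
    """Read `size` bytes (1 = byte, 2 = word, 4 = dword), little-endian."""
    return int.from_bytes(memory[address:address + size], "little")

esi = symbols["MyVar"]              # mov esi, MyVar      -> the offset
eax = load(symbols["MyVar"], 4)     # mov eax, [MyVar]    -> the contents
ax = load(esi, 2)                   # mov ax, word [esi]
al = load(esi, 1)                   # mov al, byte [esi]

print(esi, hex(eax), hex(ax), hex(al))
```

The same address yields three different values depending on the size asked
for, which is why the size has to be stated whenever no register operand
implies it.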
Functions
---------
TASM has special directives for declaring a procedure and ending it. Why? A
procedure is just another code label you CALL instead of JMP -- NASM got it
right. TASM uses:

   ProcName PROC
           xor ax,ax
           ret
   ProcName ENDP

while NASM just uses:

   Procname:
           xor ax,ax
           ret

To declare a procedure PUBLIC, just use the GLOBAL directive:

   GLOBAL Procname
   Procname:
           xor ax,ax
           ret

Local Labels
------------
Those of you that know C also know that a member of a struct can be
referenced as StructInstance.MemberName. This is rather similar to the way
NASM allows you to use local labels. A Local Label is denoted by prefixing a
dot to the label name:

   Label1:
           nop
   .local:
           nop
   Label2:
           nop
   .local:
           nop

This won't give you an error about multiple definitions of a label, but you
can still jmp to a certain label like this:

           jmp Label2.local

...so it's local, and in a way it's also a global label.

ORG Directive
-------------
NASM supports the org directive, so if you are coding a COM file you can
start with:

           org 0x100
   OR      org 100h

(NASM allows both the asm and c methods of specifying hex, so both of the
above are valid.)

Reserving Space
---------------
Once again, here NASM uses a different syntax than that of TASM. In TASM you
would declare 100 bytes of uninitialized space like this:

   Array1: db 100 dup (?)

NASM uses its own keywords to do this; these are RESB, RESW and RESD,
standing for REServeByte, REServeWord, and REServeDword, respectively. To
reserve 100 bytes, you would use the RES? keywords like this:

   Array1: RESB 100
   OR
   Array1: RESW 100/2
   OR
   Array1: RESD 100/4

Declaring initialized space is much like TASM, but arrays are different. In
TASM:

   Array1: db 100 dup (1)

In NASM:

   Array1: TIMES 100 db 1

TIMES is a handy little directive; it instructs NASM to perform an action a
specified number of times. In the example above I perform "db 1" 100 times.
TIMES can be used for virtually anything; for example:

   TIMES 69 nop

will put 69 nops at the current point in the file.
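In terms of the bytes that end up in the output file, TIMES is plain
repetition. Here is a small Python illustration of the two examples above;
the helper name times is invented for the sketch, and 90h is the one-byte
x86 NOP opcode.

```python
# Byte-level picture of the TIMES examples. `times` is a hypothetical
# helper standing in for what the assembler emits.

NOP = b"\x90"                      # one-byte x86 NOP opcode

def times(count: int, chunk: bytes) -> bytes:
    """Repeat the bytes of one instruction or data item `count` times."""
    return chunk * count

array1 = times(100, b"\x01")       # TIMES 100 db 1
padding = times(69, NOP)           # TIMES 69 nop

print(len(array1), len(padding))
```

RESB, RESW and RESD differ in that they only reserve room and promise
nothing about the contents, so they have no byte pattern to show here.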
The $ (current location) symbol is supported by NASM, and can be used to
specify the 'count' operand to TIMES, so this is valid:

   label1:
           mov ax,1
           xor ax,ax
   label2: TIMES $-label1 nop

This expands to TIMES (label2 - label1), and will put as many one-byte nops
after label2 as the byte count between label1 and label2.

Making Structs
--------------
I fought long and hard to get structs going; the docs were a bit vague, and
it took a while to get it, so here it is. Using a struct is divided into 2
parts: declaring the prototype, and making an instance. A simple, 2-member
structure would be defined as follows:

   struc st
     stLong resd 1
     stWord resw 1
   endstruc

This declares a prototype struct named st, with 2 members: stLong, which is
a DWORD, and stWord, which is a WORD. It uses the reserve directives because
it's a prototype, not a real struct. You can use istruc to make a real
instance that you can reference as data in your code:

   mystruc:
   istruc st
     at stLong, dd 1
     at stWord, dw 1
   iend

*Note: it's important to put the label on a different line.

This creates a struct named mystruc of type st; the "at" keyword is used to
assign initial values to the members of the struc (i.e., at the reserved
bytes of memory). The notation for referencing members is not like in C.
This is because of the way structures are implemented; in NASM, each member
is assigned an offset relative to the beginning of the struct:

   mystruc:
   istruc st
     at stLong, dd 1     ; offset 0
     at stWord, dw 1     ; offset 4
   iend

The notation for referencing a member is therefore:

           mov eax, [mystruc+stLong]

This is because mystruc is a constant base, and the member is a relative
offset to it. It's similar to referencing a data array.

One thing I should mention: if you declare struct prototypes as above, the
member names/labels will be global, so you will get collisions if you use
the same member name in your code or in another struct prototype.
To avoid this, precede the member names with a dot '.', and then reference
them in relation to the prototype's name in the instance declaration. For
example:

   struc st
     .stLong resd 1
     .stWord resw 1
   endstruc

   mystruc:
   istruc st
     at st.stLong, dd 1
     at st.stWord, dw 1
   iend

And this is how you reference the members in code:

           mov eax,[mystruc+st.stWord]

This may seem confusing; you should understand that "mystruc" is the base of
a particular instance, and "st.stWord" is an offset relative to the start of
the struct, so in pseudo-code it translates into:

           mov eax,[offset mystruc + (offset stWord - offset start_of_proto)]
   or
           mov eax,[offset mystruc + 4]

...which gives you the correct offset for the stWord member of the "mystruc"
struct instance.

Using Macros
------------
This is a large part of the NASM docs, and a bit too much to get into in
depth here. I'll try and cover the major issues. There are 2 types of
macros, one-line and multi-line; all macro keywords are preceded by a '%'
character. An example of a single-line macro:

   %define mul(a,b) (a*b)

...which would be referenced in the source code as follows:

           mov eax,mul(2,3)

This will be converted into "mov eax,(2*3)", which assembles as:

           mov eax,6

You can invoke other macros from within a macro:

   %define fancymul(a,b) ( a * triple_mul(4) )
   %define triple_mul(a) (a*3)
           mov eax,fancymul(2,3)

This becomes:

           mov eax, ( 2 * (4*3) )

These are not very useful examples, but I'm sure you can see the potential.
Multi-line macros are much the same as single-line macros, but the syntax is
a bit different, with the macro body placed between these two lines:

   %macro name number_of_args
   %endmacro

So, for example, if you wanted to make a small asm effort-saver you could
write the following macro:

   %macro prologue 1
           push ebp
           mov ebp,esp
           sub esp,%1
   %endmacro

...and then you can use it in your code like this:

   DemoFunc:
           prologue 4*2

This would set up a stack frame and reserve room for 2 DWORD local
variables. You'll notice that args supplied to the macro can be referenced
as %1....%n, similar to DOS and Unix shell/batch programming. This is just a
quick taste; there's more to be learned about NASM macros: the docs are your
friends.

Includes
--------
Including files is easy. If you want to include .inc's into your asm file
you can use:

   %include "win32.inc"

If you wish to include binary files, you must use a different keyword:

   INCBIN "data.bin"

Conditional Assembly
--------------------
NASM also has support for conditional assembly:

   %define INCLUDE_WIN32_INCS
   %ifdef INCLUDE_WIN32_INCS
   %include "win32.inc"
   %include "toolhelp.inc"
   %include "messages.inc"
   %endif

The body of the %ifdef will be processed only if a macro/define named
INCLUDE_WIN32_INCS is defined. This way you can control the inclusion of
files either by defining the name on the command line
("nasmw -dINCLUDE_WIN32_INCS") in place of the %define line, or by
commenting out the %define line.

Externs, Globals and Commons
-----------------------------
When coding a multi-source-file project, writing a dll, or calling API
functions you need to declare various symbols/data/functions as a certain
type to make them available to the assembler and to you. There are 3 types
of symbols in NASM: EXTERN, GLOBAL and COMMON. Their invocation is all the
same:

   EXTERN symbol_name      ; use this to define API calls for use
   GLOBAL symbol_name
   COMMON symbol_name

They all must appear before the actual symbol is defined/referenced.
If you have experience in asm/c, their use should be clear -- EXTERN
declares an external reference for the linker to resolve (an "import"),
GLOBAL declares a symbol to be globally/publicly available (an "export"),
and COMMON declares a variable to be of Common data type (i.e., all
instances of a COMMON variable are merged into a single instance during
linking). NASM 0.97 also has IMPORT/EXPORT extensions to the .obj format,
for writing DLL's; read the docs for more info.

Specifying Segment Type
-----------------------
You can declare segments much the same as you would in TASM:

   segment .data use32 CLASS=data
   or
   segment .text use32 CLASS=code
   or
   segment Gij use16 CLASS=code

This is a good way to set segments straight for linking. Note that NASM does
not require certain segments to be present: you have full control over the
segmentation of the program.

Output Formats
--------------
NASM supports a plethora of output formats; depending on what you are trying
to accomplish, you should read the docs for special extensions to each type.
The output format is chosen using "nasm -f type" on the command line, where
type can be bin, obj, win32 and others. Each linker likes different formats
-- tlink likes obj for example, while LCC-WIN32 likes the win32 format...
investigate on your own to find the best output format for your linker.

*tip: when assembling into the "obj" type, make sure to use the special
"..start:" symbol to specify the entry point for the file.

In Closing
----------
That's all for now. This is intended to be a 'quick-start' guide for TASM
coders who want --or need-- to move into NASM; it is not a substitute for
the NASM documentation. If you need to reach me, my e-mail is gij