By Andrew Davie (adapted by Duane Alan Hahn)
We've had a brief introduction to DASM, and in particular mnemonics (6502 instructions, written in human-readable format) and symbols (other words in our program which are converted by DASM into a numeric form in the binary).
Now we're going to have a brief look at how DASM uses the symbols (and in particular the value for symbols it calculates and stores in its internal symbol table) to build up the binary ROM image.
Each symbol the assembler finds in our source code must be defined (given an actual value) in at least one place in the code. A value is given to a symbol when it appears in our code starting in the very first column of a line. Symbols typically cannot be redefined (given another value).
In an earlier session we examined how the code 'sta WSYNC' appeared in our binary file as $85 $02 (remember, we examined the listing file to see what bytes appeared in our binary). At that point, I indicated that the assembler had determined that the value of the symbol 'WSYNC' was 2 (corresponding to the TIA register's memory address) through its definition in the standard vcs.h file.
But how does the assembler actually determine the value of a symbol?
The answer is that the symbol must be defined somewhere in the source code (as opposed to just being referenced). Definition of a symbol can come in several forms. The most straightforward is to just assign a value. . .
WSYNC = 2
or. . .
WSYNC EQU 2
The above examples are equivalent—DASM supports syntax (style) which has become fairly standard over the years. Some people (me!) like to use the = symbol, and some like to use EQU. Note that the symbol in question must start in the very first column, when it is being defined. In both cases, the value 2 is being assigned to the symbol WSYNC. Wherever DASM encounters the symbol WSYNC in the code, it knows to use the value 2.
That's fairly straightforward stuff. But symbols can be defined in terms of other symbols! Also, DASM has a quite capable ability to understand expressions, so the following is quite valid. . .
AFTER_WSYNC = WSYNC + 1
In this case, the symbol 'AFTER_WSYNC' would have the value 3. Even if the WSYNC label was defined after the above code, the assembler would successfully be able to resolve the AFTER_WSYNC value, as it does multiple passes through the code until symbols are all resolved.
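To make the multiple-pass idea concrete, here is a toy model in Python. It is only a sketch of the principle, not how DASM is actually implemented: the `definitions` list, the `resolve` function and the pass limit are all made up for illustration. AFTER_WSYNC is deliberately listed before WSYNC, so the first pass cannot resolve it and a second pass is needed.

```python
# Toy model of multi-pass symbol resolution (an illustration, NOT DASM's real code).
# Each definition is (name, function that computes the value from the symbol table).
definitions = [
    ("AFTER_WSYNC", lambda sym: sym["WSYNC"] + 1),  # uses WSYNC before it is defined
    ("WSYNC", lambda sym: 2),
]

def resolve(definitions, max_passes=10):
    """Re-scan the definitions until every symbol has a value."""
    symbols = {}
    for pass_number in range(1, max_passes + 1):
        unresolved = 0
        for name, expression in definitions:
            try:
                symbols[name] = expression(symbols)
            except KeyError:
                # The expression needs a symbol we have not seen yet; try next pass.
                unresolved += 1
        if unresolved == 0:
            return symbols, pass_number
    raise ValueError("some symbols could not be resolved")

symbols, passes = resolve(definitions)
print(symbols["WSYNC"], symbols["AFTER_WSYNC"], passes)
```

On the first pass WSYNC gets its value but AFTER_WSYNC does not; on the second pass both are known, so the loop stops.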
Symbols can also be given values automatically by the assembler. Consider our sample kernel where we see the following code near the start (here we're looking at the listing file, so we can see the address information DASM outputs). . .
     10  0000 ????              SEG
     11  f000                   ORG $F000
     12  f000
     13  f000           Reset
     14  f000
     15  f000
     16  f000
     17  f000
     18  f000
     19  f000
     20  f000           StartOfFrame
     21  f000
     22  f000                   ; Start of vertical blank processing
     23  f000
     24  f000  a9 00            lda #0
     25  f002  85 01            sta VBLANK
'Reset' and 'StartOfFrame' are two symbols which are defined at this point, because they both start in the first column of the lines they are on. The assembler assigns the current ROM address to these symbols as they occur. That is, if we look at these 'labels' (=symbols) in the symbol table, we see. . .
StartOfFrame             f000              (R )
Reset                    f000              (R )
They both have a value of $F000. This form of symbol (which starts at the beginning of a line, but is not explicitly assigned a value) is called a label, and refers to a location in the code (or more particularly an address). How and why did DASM assign the value $F000 to these two labels, in this case?
As the assembler converts your source code to a binary format, it keeps an internal counter telling it where in the address space the next byte is to be placed. This address increments by the appropriate amount for each instruction or piece of data it encounters. For example, if we had a 'nop' (a 1-byte instruction), then the address counter that DASM maintains would increment by 1 (the length of the nop instruction). Whenever a label is encountered, the label is given the value of the current internal address counter at the point in the binary image at which the label occurs. The label itself does not go into the binary, but the value of the label refers to the address in the binary corresponding to the position of the label in the source code.
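The address counter can be modelled in a few lines of Python. This is a hedged sketch, not DASM itself: the `SIZES` table, the `assign_labels` function and the `AfterVblank` label are hypothetical, and only the handful of instruction forms used here are covered. The point is simply that an instruction advances the counter by its length, while a label records the counter and emits nothing.

```python
# Lengths in bytes of the few instruction forms used below (illustrative table).
SIZES = {"nop": 1, "lda #": 2, "sta zp": 2, "jmp abs": 3}

def assign_labels(origin, lines):
    """lines holds ("label", name) or ("op", key) entries.
    Returns the label table and the final value of the address counter."""
    counter = origin
    labels = {}
    for kind, value in lines:
        if kind == "label":
            labels[value] = counter   # a label takes the current address, emits no bytes
        else:
            counter += SIZES[value]   # an instruction advances the counter by its length
    return labels, counter

program = [
    ("label", "Reset"),
    ("label", "StartOfFrame"),
    ("op", "lda #"),            # lda #0
    ("op", "sta zp"),           # sta VBLANK
    ("label", "AfterVblank"),   # hypothetical label, not in the sample kernel
    ("op", "nop"),
]

labels, end = assign_labels(0xF000, program)
for name, address in labels.items():
    print(f"{name:14s} {address:04x}")
```

Reset and StartOfFrame share the same address because no bytes are generated between them, which matches what the symbol table shows.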
In the above code snippet, we can see the address in column 2 of the output, and it starts at 0 (with ???? after it, indicating it doesn't actually KNOW the internal counter/address at this point), and (here's the bit I really want you to understand) it is set to $F000 when we get to the 'ORG $F000' line. 'ORG' stands for origin, and this is the way we (the programmers) indicate to the assembler the starting address of the next section of code in the binary ROM. Just to complicate things slightly, it is not the actual offset from the start of the ROM (for a ROM might, for example, be only 4K but contain code assembled to live at $F000-$FFFF, as in a 4K cartridge). So it's not an offset, it's a conceptual address.
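The difference between a conceptual address and an offset into the ROM file can be sketched in Python. This assumes the 4K cartridge case mentioned above, with the image assembled to live at $F000-$FFFF; the names `ROM_ORIGIN`, `ROM_SIZE` and `rom_offset` are made up for the example.

```python
ROM_ORIGIN = 0xF000   # assumed address of the first byte of the 4K image
ROM_SIZE = 0x1000     # 4K

def rom_offset(address):
    """Convert a conceptual address into an offset from the start of the ROM file."""
    if not ROM_ORIGIN <= address < ROM_ORIGIN + ROM_SIZE:
        raise ValueError(f"${address:04X} is outside the ROM")
    return address - ROM_ORIGIN

for address in (0xF000, 0xF3EA, 0xFFFC):
    print(f"address ${address:04X} -> file offset ${rom_offset(address):04X}")
```

So the code that the assembler places 'at $F000' is really the very first byte of the file.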
These labels are very useful to programmers to give a name to a point in code, so that point may be referred to by the label, instead of us having to know the address. If we look at the end of our sample kernel, we see. . .
70 f3ea 4c 00 f0 jmp StartOfFrame
The 'jmp' is the mnemonic for the jump instruction, which transfers flow of control to the address given in the two byte operand. In other words, it's a GOTO statement. Look carefully at the binary numbers inserted into the ROM (again, the columns are left to right, line number, address, byte(s), source code). We see $4C, 0, $f0. The opcode for JMP is $4C—whenever the 6502 fetches this instruction, it forms a 16-bit address from the next two bytes (0,$F0) and code continues from that address. Note that the 'StartOfFrame' symbol/label has a value $F000 in our symbol table.
It's time to understand how 16-bit numbers are formed from two 8-bit numbers, and how 0, $F0 translates to $F000. The 6502, as noted, can address 2^16 bytes of memory. This requires 16 bits. The 6502 itself is only capable of manipulating 8-bit numbers, so 16-bit numbers are stored as pairs of bytes. Consider any 16-bit address in hexadecimal; $F000 is convenient enough. The binary value for that is %1111000000000000. Divide it into two 8-bit sections (equivalent to 2 bytes) and you get %11110000 and %00000000, equivalent to $F0 and 0. Note that any two hex digits make up a byte, as hex digits require 4 bits each (0-15 or %0000-%1111). So we can just split any four-digit hex address in half to give us two 8-bit bytes.

As noted, the 6502 manipulates 16-bit addresses through the use of two bytes. These bytes are almost always stored in little-endian format (that is, the least significant byte first, followed by the high byte). So $F000 hex is stored as 0, $F0 (the low byte of $F000 followed by the high byte).
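If you'd like to check the arithmetic yourself, here is a small Python sketch of the split and the join. The function names `split_le` and `join_le` are just illustrative.

```python
def split_le(address):
    """Split a 16-bit address into (low byte, high byte), the order used in memory."""
    return address & 0xFF, (address >> 8) & 0xFF

def join_le(low, high):
    """Rebuild the 16-bit address from a low/high byte pair."""
    return (high << 8) | low

low, high = split_le(0xF000)
print(f"low=${low:02X} high=${high:02X}")    # the two bytes in storage order
print(f"${join_le(low, high):04X}")          # and back to the address
```

The same functions work for any address, for example $F3EA splits into $EA followed by $F3.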
Now the binary of our jmp instruction should make sense. Opcode ($4C), 16-bit address in low/high format ($F000). When this instruction executes, the program jumps to and continues executing from address $F000 in ROM. And we can see how DASM has used its symbol table—and in particular the value it calculated from the internal address counter when the StartOfFrame label was defined—to 'fill in' the correct low/hi value into the binary file itself where the label was actually referred to.
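Putting the pieces together, the following Python sketch shows how the three bytes of the jmp could be produced from a symbol table. It is a model of the idea rather than DASM's actual code; `encode_jmp` and the `symbols` dictionary are hypothetical names, while the opcode $4C and the value $F000 come from the listing above.

```python
JMP_ABSOLUTE = 0x4C   # opcode for jmp with a 16-bit absolute address

def encode_jmp(symbols, label):
    """Look the label up in the symbol table and emit opcode, low byte, high byte."""
    target = symbols[label]
    return bytes([JMP_ABSOLUTE, target & 0xFF, (target >> 8) & 0xFF])

symbols = {"StartOfFrame": 0xF000}
encoded = encode_jmp(symbols, "StartOfFrame")
print(" ".join(f"{byte:02x}" for byte in encoded))
```

The printed bytes can be compared with the byte column of the jmp line in the listing file.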
This is typical of symbol usage. DASM uses its internal symbol table to give it a value for any symbol it needs. Those values are used to create the correct numbers for the ROM/binary image.
Let's go back to our magical discovery that the 'org' instruction is just a command to the assembler (it does not appear in the binary) to let the assembler know the value of the internal address counter at that point in the code. It is quite legal to have more than one ORG command in our source. In fact, our sample kernel uses this when it defines the interrupt vectors. . .
     70  f3ea  4c 00 f0         jmp StartOfFrame
     71  f3ed
     72  f3ed
     73  fffa                   ORG $FFFA
     74  fffa
     75  fffa  00 f0            .word.w Reset ; NMI
     76  fffc  00 f0            .word.w Reset ; RESET
     77  fffe  00 f0            .word.w Reset ; IRQ
Here we can see that after the jmp instruction, the internal address counter is at $F3ED, and we have another ORG which sets the address to $FFFA (the start of the standard 6502 interrupt vector data). Astute readers will notice the use of the label 'Reset' in three lines, with the binary value $F000 (if the numbers are to be interpreted as a low/high byte pair) appearing in the ROM image at addresses $FFFA, $FFFC and $FFFE. We briefly discussed how the 6502 looks at the address $FFFC to give it the address at which it should start running code. Here we see that this address points to the label 'Reset'. Magic.
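Here is a Python sketch of what ends up in the last six bytes of the image, and of how the reset address can be read back from $FFFC. It assumes a 4K image that lives at $F000-$FFFF; `put_word` and `read_word` are helper names invented for this example.

```python
ROM_ORIGIN = 0xF000          # assumed address of the first byte of a 4K image
rom = bytearray(0x1000)      # 4K of zeros standing in for the assembled ROM

def put_word(rom, address, value):
    """Store a 16-bit value at a conceptual address, low byte first."""
    offset = address - ROM_ORIGIN
    rom[offset] = value & 0xFF
    rom[offset + 1] = (value >> 8) & 0xFF

def read_word(rom, address):
    """Read a 16-bit little-endian value back from a conceptual address."""
    offset = address - ROM_ORIGIN
    return rom[offset] | (rom[offset + 1] << 8)

reset = 0xF000               # value of the 'Reset' label in the symbol table
for vector in (0xFFFA, 0xFFFC, 0xFFFE):   # NMI, RESET, IRQ
    put_word(rom, vector, reset)

print(" ".join(f"{byte:02x}" for byte in rom[-6:]))
print(f"start address read from $FFFC: ${read_word(rom, 0xFFFC):04X}")
```

Reading the word at $FFFC gives back the value of 'Reset', which is where the 6502 begins running code.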
It's quite legal to use one symbol as the value for an ORG command. Here's a short snippet of code which should clarify this. . .
START = $F800       ; start of code - change this if you want

            ORG START
HelloWorld
In the above example, the label HelloWorld would have a value of $F800. If the value of START were to change, so would the value of HelloWorld.
We've seen how the ORG command is used to tell DASM where to place bits of code (in terms of the address of code in our ROM). This command can also be used to define our variables in RAM. We haven't had a play with RAM/variables yet, and it will be a few sessions before we tackle that—but if you want a sneak peek, have a look at vcs.h and see how it defines its variables from an origin defined as 'ORG TIA_BASE_ADDRESS'. That code is way more complex than our current level of understanding, but it gives some idea of the versatility of the assembler.
We're almost done with the basic commands inserted into our source code to assist DASM's building of the binary image. Now you should understand how symbols are assigned values (either by explicit assignment of a value, or implicitly from an address/location), and how those values, through the assembler's internal symbol table, are used to put the correct numbers into the ROM binary image. We also understand that DASM converts mnemonics (6502 commands in human-readable form) directly into opcodes. There's not much more to actual assembly, so we shall soon move on to actual 6502 code, and playing with the TIA itself.
Session 10: Orgasm