By Andrew Davie (adapted by Duane Alan Hahn, a.k.a. Random Terrain)
Original Session
We've had a brief introduction to DASM, and in particular mnemonics (6502 instructions, written in human-readable format) and symbols (other words in our program which are converted by DASM into a numeric form in the binary).
Now we're going to have a brief look at how DASM uses the symbols (and in particular the value for symbols it calculates and stores in its internal symbol table) to build up the binary ROM image.
Each symbol the assembler finds in our source code must be defined (given an actual value) in at least one place in the code. A value is given to a symbol when it appears in our code starting in the very first column of a line. Symbols typically cannot be redefined (given another value).
In an earlier session we examined how the code 'sta WSYNC' appeared in our binary file as $85 $02 (remember, we examined the listing file to see what bytes appeared in our binary). At that point, I indicated that the assembler had determined the value of the symbol 'WSYNC' was 2 (corresponding to the TIA register's memory address) through its definition in the standard vcs.h file.
But how does the assembler actually determine the value of a symbol?
The answer is that the symbol must be defined somewhere in the source code (as opposed to just being referenced). Definition of a symbol can come in several forms. The most straightforward is to just assign a value…
WSYNC = 2
or…
WSYNC EQU 2
The above examples are equivalent—DASM supports syntax (style) which has become fairly standard over the years. Some people (me!) like to use the = symbol, and some like to use EQU. Note that the symbol in question must start in the very first column, when it is being defined. In both cases, the value 2 is being assigned to the symbol WSYNC. Wherever DASM encounters the symbol WSYNC in the code, it knows to use the value 2.
That's fairly straightforward stuff. But symbols can be defined in terms of other symbols! Also, DASM has a quite capable ability to understand expressions, so the following is quite valid…
AFTER_WSYNC = WSYNC + 1
In this case, the symbol 'AFTER_WSYNC' would have the value 3. Even if the WSYNC label was defined after the above code, the assembler would successfully be able to resolve the AFTER_WSYNC value, as it does multiple passes through the code until symbols are all resolved.
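Here's a minimal sketch of that forward reference in DASM source. In a real program WSYNC would come from vcs.h; it is defined by hand here, and the 'lda' line is added purely to show the resolved value being used.

```asm
        processor 6502

AFTER_WSYNC = WSYNC + 1     ; WSYNC has not been seen yet on the first pass
WSYNC = 2                   ; the definition comes later in the source

        ORG $F000
        lda #AFTER_WSYNC    ; once both symbols resolve, this is lda #3 (a9 03)
```

The order of the two definitions doesn't matter to the final binary; DASM simply needs an extra pass to get there.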
Symbols can also be given values automatically by the assembler. Consider our sample kernel where we see the following code near the start (here we're looking at the listing file, so we can see the address information DASM outputs)…
10 0000 ???? SEG
11 f000 ORG $F000
12 f000
13 f000 Reset
14 f000
15 f000
16 f000
17 f000
18 f000
19 f000
20 f000 StartOfFrame
21 f000
22 f000 ; Start of vertical blank processing
23 f000
24 f000 a9 00 lda #0
25 f002 85 01 sta VBLANK
'Reset' and 'StartOfFrame' are two symbols which are defined at this point, because they both start in the first column of the lines they are on. The assembler assigns the current ROM address to these symbols, as they occur. That is, if we look at these 'labels' (=symbols) in the symbol table, we see…
StartOfFrame f000 (R )
Reset f000 (R )
They both have a value of $F000. This form of symbol (which starts at the beginning of a line, but is not explicitly assigned a value) is called a label, and refers to a location in the code (or more particularly an address). How and why did DASM assign the value $F000 to these two labels, in this case?
As the assembler converts your source code to a binary format, it keeps an internal counter telling it where in the address space the next byte is to be placed. This address increments by the appropriate amount for each instruction or piece of data it encounters. For example, if we had a 'nop' (a 1-byte instruction), then the address counter that DASM maintains would increment by 1 (the length of the nop instruction). Whenever a label is encountered, the label is given the value of the current internal address counter at the point in the binary image at which the label occurs. The label itself does not go into the binary, but the value of the label refers to the address in the binary corresponding to the position of the label in the source code.
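To make the address counter concrete, here's a small sketch. The label names First, Second and Third are made up for illustration; the comments show the value each label receives and how far the counter moves after each instruction.

```asm
        processor 6502
        ORG $F000       ; address counter is now $F000

First                   ; First = $F000 (the label itself emits no bytes)
        nop             ; 1 byte  (ea)       -> counter becomes $F001
Second                  ; Second = $F001
        lda #0          ; 2 bytes (a9 00)    -> counter becomes $F003
Third                   ; Third = $F003
        jmp First       ; 3 bytes (4c 00 f0) -> counter becomes $F006
```

Assembling this with a listing file and comparing the address column against the comments is a good way to check your understanding.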
In the above code snippet, we can see the address in column 2 of the output, and it starts at 0 (with ???? after it, indicating it doesn't actually KNOW the internal counter/address at this point), and (here's the bit I really want you to understand) it is set to $F000 when we get to the 'org $F000' line. 'Org' stands for origin, and this is the way we (the programmers) indicate to the assembler the starting address of the next section of code in the binary ROM. Just to complicate things slightly, it is not the actual offset from the start of the ROM (for a ROM might, for example, be only 4K but contain code assembled to live at $F000-$FFFF, as in a 4K cartridge). So it's not an offset, it's a conceptual address.
These labels are very useful to programmers to give a name to a point in code, so that point may be referred to by the label, instead of us having to know the address. If we look at the end of our sample kernel, we see…
70 f3ea 4c 00 f0 jmp StartOfFrame
The 'jmp' is the mnemonic for the jump instruction, which transfers flow of control to the address given in the two byte operand. In other words, it's a GOTO statement. Look carefully at the binary numbers inserted into the ROM (again, the columns are left to right, line number, address, byte(s), source code). We see $4C, 0, $f0. The opcode for JMP is $4C—whenever the 6502 fetches this instruction, it forms a 16-bit address from the next two bytes (0,$F0) and code continues from that address. Note that the 'StartOfFrame' symbol/label has a value $F000 in our symbol table.
It's time to understand how 16-bit numbers are formed from two 8-bit numbers, and how 0, $F0 translates to $F000. The 6502, as noted, can address 2^16 bytes of memory. This requires 16 bits. The 6502 itself is only capable of manipulating 8-bit numbers. So 16-bit numbers are stored as pairs of bytes. Consider any 16-bit address in hexadecimal; $F000 is convenient enough. The binary value for that is %1111000000000000. Divide it into two 8-bit sections (equivalent to 2 bytes) and you get %11110000 and %00000000, equivalent to $F0 and 0. Note, any two hex digits make up a byte, as hex digits require 4 bits each (0-15 or %0000-%1111). So we could just split any four-digit hex address in half to give us two 8-bit bytes. As noted, the 6502 manipulates 16-bit addresses through the use of two bytes. These bytes are stored in little-endian format (that is, the least significant byte first, followed by the high byte). So $F000 hex is stored as 0, $F0 (the low byte of $F000 followed by the high byte).
Now the binary of our jmp instruction should make sense. Opcode ($4C), 16-bit address in low/high format ($F000). When this instruction executes, the program jumps to and continues executing from address $F000 in ROM. And we can see how DASM has used its symbol table (and in particular the value it calculated from the internal address counter when the StartOfFrame label was defined) to 'fill in' the correct low/high value into the binary file itself where the label was actually referred to.
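If you'd like to see the low/high ordering for yourself, the following sketch stores the same address twice: once as a 16-bit word, and once as two explicit bytes using DASM's '<' (low byte) and '>' (high byte) operators. The label name Target is hypothetical.

```asm
        processor 6502
        ORG $F000
Target                          ; Target = $F000

        .word Target            ; 16-bit value, stored low byte first: 00 f0
        .byte <Target, >Target  ; low byte then high byte, spelled out: 00 f0
```

Both lines should show the same pair of bytes in the listing file, which is exactly the pair that follows the $4C opcode in our jmp instruction.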
This is typical of symbol usage. DASM uses its internal symbol table to give it a value for any symbol it needs. Those values are used to create the correct numbers for the ROM/binary image.
Let's go back to our magical discovery that the 'org' instruction is just a command to the assembler (it does not appear in the binary) to let the assembler know the value of the internal address counter at that point in the code. It is quite legal to have more than one ORG command in our source. In fact, our sample kernel uses this when it defines the interrupt vectors…
70 f3ea 4c 00 f0 jmp StartOfFrame
71 f3ed
72 f3ed
73 fffa ORG $FFFA
74 fffa
75 fffa 00 f0 .word.w Reset; NMI
76 fffc 00 f0 .word.w Reset; RESET
77 fffe 00 f0 .word.w Reset; IRQ
Here we can see that after the jmp instruction, the internal address counter is at $F3ED, and we have another ORG which sets the address to $FFFA (the start of the standard 6502 interrupt vector data). Astute readers will notice the use of the label 'Reset' in three lines, with the binary value $F000 (if the numbers are to be interpreted as a low/high byte pair) appearing in the ROM image at addresses $FFFA, $FFFC and $FFFE. We briefly discussed how the 6502 looks at the address $FFFC to give it the address at which it should start running code. Here we see that this address points to the label 'Reset'. Magic.
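Stripped of the listing columns, the source behind an arrangement like this can be sketched as follows. This is not the sample kernel itself; the 'jmp Reset' loop is just a placeholder standing in for the real code.

```asm
        processor 6502

        SEG
        ORG $F000       ; first ORG: code starts at $F000
Reset
        jmp Reset       ; placeholder loop in place of the real kernel

        ORG $FFFA       ; second ORG: move the address counter to the vectors
        .word Reset     ; NMI
        .word Reset     ; RESET
        .word Reset     ; IRQ
```

Each .word line places the value of Reset ($F000) into the image as a low/high byte pair, just as we saw with the jmp operand.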
It's quite legal to use one symbol as the value for an ORG command. Here's a short snippet of code which should clarify this…
START = $F800 ; start of code - change this if you want
    ORG START
HelloWorld
In the above example, the label HelloWorld would have a value of $F800. If the value of START were to change, so would the value of HelloWorld.
We've seen how the ORG command is used to tell DASM where to place bits of code (in terms of the address of code in our ROM). This command can also be used to define our variables in RAM. We haven't had a play with RAM/variables yet, and it will be a few sessions before we tackle that—but if you want a sneak peek, have a look at vcs.h and see how it defines its variables from an origin defined as 'ORG TIA_BASE_ADDRESS'. That code is way more complex than our current level of understanding, but it gives some idea of the versatility of the assembler.
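As a rough sneak peek only, here's a sketch of the same idea applied to RAM, which on the 2600 lives at $80-$FF. The segment names and variable names are made up for illustration, and SEG.U and ds are DASM features we haven't covered yet: SEG.U marks an uninitialised segment (nothing from it goes into the ROM), and ds reserves a number of bytes.

```asm
        processor 6502

        SEG.U variables     ; uninitialised segment: emits nothing into the ROM
        ORG $80             ; start of 2600 RAM

FrameCount  ds 1            ; FrameCount = $80
PlayerX     ds 1            ; PlayerX    = $81
Pointer     ds 2            ; Pointer    = $82 (room for a low/high pair)

        SEG code
        ORG $F000
Reset
        inc FrameCount      ; zero-page INC of address $80: e6 80
        jmp Reset
```

The labels get their values from the address counter exactly as code labels do; the only difference is that the counter is pointing into RAM rather than ROM.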
We're almost done with the basic commands inserted into our source code to assist DASM's building of the binary image. Now you should understand how symbols are assigned values (either by explicit assignment of a value, or by implicit address/location value), and how those values, through the assembler's internal symbol table, are used to put the correct number into the ROM binary image. We also understand that DASM converts mnemonics (6502 commands in human-readable form) directly into opcodes. There's not much more to actual assembly, so we shall soon move on to actual 6502 code, and playing with the TIA itself.