By Andrew Davie (adapted by Duane Alan Hahn)
The RESPx registers for each of the sprites are strobe registers which effectively set the x position of each sprite to the point on the scanline the TIA is displaying when those registers are written to. Put more simply, as soon as you write to RESP0, sprite 0 begins drawing and it will keep drawing in that position on every scanline. Same for RESP1.
This session we're going to have a bit of a play with horizontal positioning code, and perhaps come to understand why even the simplest things on the '2600 are still an enjoyable challenge even to experienced programmers.
As previously noted, it is not possible to just tell the '2600 the x position at which you want your sprites to display. The x positioning of the sprites is a consequence of an internal (non-accessible) timer which triggers sprite display at the same point every scanline. You can reset the timer by writing to RESP0 for sprite 0 or RESP1 for sprite 1. And based on where on the scanline you reset the timer, you effectively reposition the sprite to that position.
The challenge for us this session is to develop code which can position a sprite to any one of the 160 pixels on the scanline!
Given any pixel position from 0 to 159, how would we go about 'moving' the sprite to that horizontal position? Well, as we now know, we can't do that directly. What we can do is wait until the TIA reaches the correct pixel position and then hit a RESPx register. Once we've done that, the sprite will start drawing immediately. So if we delay until, say, TIA pixel 80 and then hit RESP0, sprite 0 would begin display at that point. Likewise, for any pixel position on the scanline, if we delay to that pixel and then hit RESP0, sprite 0 will display at the pixel where we did that.
So how do we delay to a particular pixel? It's not as easy as it sounds! What we have to do, it turns out, is keep track of the exact execution time (cycle count) of the instructions being executed by the 6502 and hit that RESPx register only at the right time. But it gets ugly, because as we know, although there are 228 TIA color clocks on each scanline (160 of those being visible pixels), these correspond to only 76 cycles (228/3) of 6502 processing time. Consequently there are only 160/3 = 53 1/3 cycles of 6502 time in the visible part of the scanline. Since each 6502 cycle corresponds to 3 TIA clocks, it would seem that the best precision with which we could hit RESPx is within 3 pixels. But it gets uglier still, and we'll soon see why.
The SLEEP macro has been useful to us up to now, to delay a set number of 6502 cycles. Consider the following code. . .

    sta WSYNC       ; wait till start of line
    SLEEP 40        ; 40 cycle delay
    sta RESP0       ; reset sprite 0 position

Surely that's a simple and neat way to position the sprite to TIA color-clock 120? The 120 comes from multiplying the 6502 cycle number (40) by 3 TIA color clocks per 6502 cycle (ignoring, for simplicity, the 3 cycles taken by the 'sta RESP0' itself). The answer to the question is "yes and no". Sure, it's a neat way to hardwire a specific delay to a specific position. But say you wanted to be able to adjust the position to an arbitrary spot. We could no longer use this sort of code. Remember, SLEEP is just a macro. What it does is insert code to achieve the number of cycles delay you request. The above might look something more like this. . .

    sta WSYNC
    nop             ; 2 cycles
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    nop             ; +2
    sta RESP0

We don't really know what the SLEEP macro inserts, and we don't really care. It's documented to cause a delay of n cycles, if you pass it n. That's all we need to know about it. If we wanted to change n to n+1 we could do it at assembly time, but we couldn't use this sort of code for realtime changes of the delay. What we want is a bit of code which will wait a variable amount of time.
And here's where the fun really starts! There are, of course, many many ways to do this. And part of the fun of horizontal positioning code is that it's just begging for nifty and elegant solutions to doing just that. What we're going to do now is just develop a fairly simple, possibly inefficient, but workable solution.
The essence of our solution will be to use a loop to count down the delay, and when the loop terminates immediately write the RESPx register. So the longer the delay, the more our loop iterates. In principle, it's a fine idea. In practice we soon see the severe limitations. We should be familiar with simple looping constructs—we have already used looping to count the scanlines in our kernels, for example. Here's a simple delay loop which will iterate exactly the number of times specified in the X register. . .

    ; assume X holds a delay loop count
SimpleLoop
    dex
    bne SimpleLoop
    sta RESP0       ; now reset sprite position

That's as simple a loop as we can get. Each iteration through the loop the value in the X register is decremented by one, and the loop will continue until the Z flag is set (which happens when the value of the last operation performed by the processor returned a zero result—in this case, the last operation would be the 'dex' instruction). So as you can see, at just two instructions in size this is a pretty 'tight' loop. There's not much you can trim out of it and still have a loop! So what's the problem with using a loop like this in our horizontal positioning code? Let's have another look at this, but with cycle times added. . .

SimpleLoop
    dex             ; 2
    bne SimpleLoop  ; 3 (2)

It has been fairly standard notation for a few years now to indicate cycle times in the fashion shown above. The number in the comment (after each semicolon) represents the number of 6502 cycles required to execute the instruction on that line. In this case, the 'dex' instruction takes 2 cycles. The 'bne' instruction takes 3 cycles if the branch is taken and 2 cycles if not. Unfortunately, life isn't always that simple. If the branch from the bne instruction to the actual branch location crosses a page (a 256-byte boundary), then the processor takes another cycle! So we're faced with the situation where, as we add and remove code in other parts of our program, some of our loops take longer or shorter amounts of time to execute. No kidding! So when we come to doing tightly timed loops where timing is critical, we must also remember to somehow guarantee that this sort of shifting doesn't happen! That's not our problem today, though. Let's assume that our branches are always within the same page.
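For the day it does become our problem, here is one possible way to have the assembler flag a timed loop that straddles a page. It is only a sketch: it assumes DASM's IF, ECHO and ERR directives and its unary > (high byte) operator, and the labels TimedLoop and TimedLoopEnd and the message text are made up for this example.

```asm
; Sketch only: stop assembly if the timed loop's branch crosses a page.
; TimedLoop and TimedLoopEnd are hypothetical labels for illustration.
TimedLoop
        dex
        bne TimedLoop
TimedLoopEnd                    ; address of the instruction after the branch
        if >TimedLoop != >TimedLoopEnd
            echo "TimedLoop crosses a page boundary"
            err
        endif
```

The check compares the high byte of the branch target with the high byte of the address following the branch, which is the comparison that decides whether the extra cycle is charged.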
So what's wrong with the above? Let's go back to our correspondence between 6502 cycles and TIA color clocks. We know that each 6502 cycle is 3 TIA color clocks. So a single iteration of the above loop would take 5 cycles of 6502 time—or a massive 15 TIA color clocks. No matter what number of iterations of our loop we do, we can only hit the RESPx register with a finesse of 15 TIA color clocks! Is this a disaster? No, it's not. In fact, the TIA is specifically designed to cater for this situation. Before we delve into how, though, let's analyze this loop a bit more. . .
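To make the count concrete, here is the same loop annotated for one particular starting value. The value 3 and the label CountLoop are picked purely for illustration, and the fragment assumes the TIA register names from vcs.h, as in earlier sessions, and a branch that stays within one page.

```asm
; Cycle count for the delay loop when X = 3 (illustrative value only).
; Assumes the branch does not cross a page boundary.
        ldx #3          ; 2 cycles
CountLoop
        dex             ; 2 cycles on every pass
        bne CountLoop   ; 3 cycles when taken (passes 1 and 2), 2 on pass 3
        sta RESP0       ; 3 cycles; begins 2 + 5 + 5 + 4 = 16 cycles after the ldx started
```

Every pass but the last costs 5 cycles and the last costs 4, so the loop itself takes 5X - 1 cycles. Changing X by one always moves the RESP0 write by a full 5 cycles, which is the 15 color clock step described above.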
Since each iteration of the loop chews 15 TIA color clocks, we must iterate (X/15) times, where X is the pixel number where we want our sprite to be positioned. Put another way, we need to know how many 15-pixel chunks to skip in our delay looping before we're at the correct position to hit RESPx and start sprite display. So when we come into this code with a desired horizontal position, we'll have to divide that value by 15 to give us a loop count. What's the divide instruction? There isn't one, of course!
So how do we divide by 15?
Another of those extremely enjoyable challenges of '2600 programming. Dividing by a power of 2 is easy. The processor provides shifting instructions which shift all the bits in a byte to the left or to the right. Consider in decimal, if you shifted all digits of a number to the left by one place, and added a 0 at the end of the number, you'd have multiplied by 10. Similarly in binary, if you shift a number left once, and put a 0 on the end, you've multiplied by 2. Dividing by two is thus shifting to the right one digit position, and adding a 0 at the 'top' of the number. Typically, multiplication in particular and sometimes division are achieved by clever combination of shifting and adding numbers.
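As a quick illustration of the shifting idea, the sketch below divides a position by 16 using four right shifts. Note that this is division by 16, a power of 2, and not the division by 15 we are after; the variable name Xposition is assumed to be a zero-page byte holding a value from 0 to 159.

```asm
; Sketch only: divide by 16 with shifts (not the divide by 15 we need).
; Xposition is assumed to be a RAM variable holding 0-159.
        lda Xposition   ; A = position
        lsr             ; A = position / 2  (a 0 enters at the top bit)
        lsr             ; A = position / 4
        lsr             ; A = position / 8
        lsr             ; A = position / 16, remainder discarded
        tax             ; result in X, ready to use as a count
```

Each 'lsr' takes 2 cycles, so the whole division costs 8 cycles plus the load and transfer.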
But we don't need to do that here. We know that there are only 160 possible positions for the sprite. Why not have a 160 byte table, with each entry giving the loop counter for the delay loop for each position? Something like this. . .

Divide15
.POS    SET 0
        REPEAT 160
        .byte .POS / 15
.POS    SET .POS + 1
        REPEND

DON'T do things by hand when the assembler can do it for you! What I've done here is write a little 'program' to control the assembler generation of a table of data. It has a repeat loop of 160 iterations, each iteration incrementing a counter by one and putting that counter value / 15 in the ROM (with the .byte pseudo-op). This code is equivalent to writing. . .

Divide15
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ; 15 entries
    .byte 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ; 15 entries
    ; etc. . . lots more. . .

Me, I'd prefer the first example—easier to maintain and modify.
In any case, the idea of having a table is to give us a quick and easy way to divide by 15. To use it, we place our number in an index register, then load the divide by 15 result from the table, using the register to give us the offset into the table. Easier to show than explain. . .

    ldx Xposition
    lda Divide15,x  ; Xposition / 15
    tax
SimpleLoop
    dex
    bne SimpleLoop
    sta RESP0       ; start drawing the sprite

It's good, and it's bad. Bad because it can't cope with 'loop 0 times'. In fact, with 0 in the X register it will loop 256 times, because the 'dex' wraps 0 around to 255 before the 'bne' tests the result. So let's add one to all the entries in the table, which will 'fix' this problem. Just change the '.byte .POS / 15' to '.byte (.POS / 15) + 1'. But I think we're digressing, and what I really wanted to introduce was the concept of looping to delay for a certain (variable) time, and then hitting RESPx at the end of the loop. You can see the problems introduced by this method, though, where we had to find a way to divide by 15, and where we only had 15 color clock resolution in our positioning. There are other (and arguably better) ways to do horizontal positioning, but let's not make the better the enemy of the good. What we're really after right now is a working solution.
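With the change described above applied, the table generator would look something like this. It is the same REPEAT block as before with only the .byte expression altered.

```asm
; Divide-by-15 table with 1 added to every entry, so the delay loop
; never starts with a count of 0.
Divide15
.POS    SET 0
        REPEAT 160
        .byte (.POS / 15) + 1
.POS    SET .POS + 1
        REPEND
```

The entries now run from 1 (positions 0 to 14) up to 11 (positions 150 to 159), so the loop always runs at least once.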
So in theory, our positioning code so far consists of dividing the x position by 15, looping (skipping 15 color clocks each loop) and then hitting the RESP0 register to start drawing the sprite. Is this all there is to it? Yes, in a nutshell. But the devil is in the detail. Let's integrate what we have so far into a kernel which constantly increments the desired X position for the sprite, then attempts to set the x position for the sprite each frame (see the source code and sample binary).
Now this is very interesting. Clearly our sprite is moving across the screen as our desired position is incrementing. But it's moving in very big chunks. We have a bit of optimizing to do before we have a sprite positioning system capable of pixel-precise horizontal positioning. But it's a start, and we understand it (I hope!).
There are some observations to make about this code and binary. I've introduced a little more 6502, which we can examine now. . .

    inc SpriteXPosition ; increment the desired position by 1 pixel
    ldx SpriteXPosition
    cpx #160            ; has it reached 160?
    bcc LT160           ; this is equivalent to branch if less than
    ldx #0              ; otherwise reload with 0
    stx SpriteXPosition
LT160
    jsr PositionSprite  ; call the subroutine to position the sprite

This is the bit of code which adjusts the desired position, loads it into the X register and calls a 'subroutine' to do the actual positioning. This is our first introduction to the 'bcc' instruction, and to the 'jsr' and 'rts' (in the subroutine itself) instructions. We have previously encountered the Z flag and the use of flags in the processor's status register to determine if branches are taken or not. The delay loop uses exactly this. The Z flag isn't the only flag set or cleared when operations are performed by the processor. Sometimes the 'carry flag' is also set or cleared. Specifically, this happens when arithmetic operations such as addition and subtraction are performed, and also when comparisons are done (a comparison is essentially a subtraction whose result is discarded rather than stored to the register). In this case, we've compared the X register with the value 160 (cpx #160). This will clear the carry flag if the X register is LESS than 160, or set the carry flag if the X register is GREATER than or EQUAL to 160. I've always used the carry flag like this for unsigned comparisons. In the code above, we're saying 'if the X register is >= 160, then reset it to 0'.
All branch instructions cost 3 cycles if taken, 2 if not taken, and an additional cycle if the branch taken crosses a page boundary. Branches can only be made to code within -128 or +127 bytes from the branch. For longer 'jumps' one can use the 'jmp' instruction, which is unconditional.
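To show the carry and Z flags working together after a comparison, here is a small three-way unsigned compare. It is a sketch only: Value, IsLess, IsEqual and Done are hypothetical names and the constant 10 is arbitrary.

```asm
; Sketch only: three-way unsigned comparison of A against 10.
; Value, IsLess, IsEqual and Done are hypothetical names.
        lda Value
        cmp #10         ; computes A - 10, sets the flags, discards the result
        bcc IsLess      ; carry clear: A < 10
        beq IsEqual     ; carry set and Z set: A == 10
        ; falls through to here when A > 10 (carry set, Z clear)
        jmp Done
IsLess
        ; handle A < 10 here
        jmp Done
IsEqual
        ; handle A == 10 here
Done
```

Testing 'bcc' first and 'beq' second leaves the greater-than case as the fall-through, so no third branch is needed.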
For long conditional branches, use this sort of code. . .

    cpx #160
    bcs GT160       ; NOT less than 160 (bcs is a GREATER or EQUAL comparison)
    jmp TooFarForLT ; IS less than 160
GT160
    ; lots of code
TooFarForLT
    ; etc

But I digress! The 'jsr' instruction mnemonic stands for "Jump to SubRoutine". A subroutine is a small section of code somewhere in your program which can be 'called' to do a task, and then have program execution continue from where the call was made. Subroutines are useful to encapsulate often-used code so that it doesn't need to be repeated multiple times in your ROM. When the 6502 'calls' a subroutine, it keeps track of where it is calling FROM, so that when the subroutine returns, it knows where to continue code execution. This 'return address' is placed on the 6502's 'stack', which we will learn about very soon now. The stack is really just a bit of our precious RAM where the 6502 stores these addresses, and sometimes other values. The 6502 uses as much of our RAM for its stack as it needs, and each subroutine call we make requires 2 bytes (the return address) which are freed (no longer used) when the subroutine returns. If we 'nest' our subroutines, by calling one subroutine from within another, then each nested level requires an additional 2 bytes of stack space, and our stack 'grows' and starts taking increasing amounts of our RAM! So subroutines, though convenient, can also be costly. They also take a fair number of cycles for the 6502 to do all that stack manipulation. In fact, it takes 6 cycles for the subroutine call (the 'jsr') and another 6 for the subroutine return (the 'rts'). So it's not often inside a kernel that we will see subroutine usage!
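For reference, a positioning subroutine built from the fragments shown earlier might look something like the sketch below. This is an assumption about its general shape, pieced together from this session's snippets, and not necessarily the code in the sample kernel. It assumes X holds the desired position on entry and that the Divide15 table holds (position / 15) + 1; the label PositionLoop is made up for this example.

```asm
; Sketch only: assumes X holds the desired position (0-159) on entry
; and that Divide15 holds (position / 15) + 1 for each position.
PositionSprite
        sta WSYNC           ; start counting from the beginning of a scanline
        lda Divide15,x      ; look up the loop count for this position
        tax
PositionLoop
        dex                 ; 2 cycles
        bne PositionLoop    ; 3 cycles while looping, 2 on exit
        sta RESP0           ; reset sprite 0 position
        rts                 ; 6 cycles to return; the jsr cost 6 more
```

Because the 'sta WSYNC' sits inside the subroutine, the 6 cycles spent on the 'jsr' fall before the timed section and do not affect where RESP0 is hit.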
As noted, the 6502 maintains its stack in our RAM area. It has a register called the 'stack pointer' which gives it the address of the next available byte in RAM for it to use. As the 6502 fills up the stack, it decrements this pointer (thus, the stack 'grows' downwards in RAM). As the 6502 releases values from the stack, it increments this pointer. Generally we don't play with the stack pointer, but in case you're wondering, it can be set to any value only by transferring that value from the X register via the 'txs' instruction. If you've been following closely, you have noticed I added a bit to the initialization section!

    ldx #$FF
    txs             ; initialize stack pointer

Without that initialization, the stack pointer could point to anywhere in RAM (or even to TIA registers) and when we called a subroutine, the 6502 would attempt to store its return address to wherever the stack pointer was pointing. Probably with disastrous consequences!
Here's the sample kernel:
Positioning sprites is a complex task. This session we've started to explore the problem, and have some working code which does manage to roughly position the sprite at any given horizontal position we ask. Next session we're going to dig into much more robust horizontal positioning code, and learn how the TIA provides us that fine control we need to get the horizontal positioning code precise enough to allow TIA-pixel-precise positioning. Once we've achieved that, we can pretty much forget about how this works forever more, and use the horizontal positioning code as a black box. Or perhaps a woodgrain box might be more appropriate :)
See you next time!
Session 22: Sprites, Horizontal Positioning (Part 1)