Atari 2600 Programming Tutorial
Session 24: Some Nice Code
By Andrew Davie (adapted by Duane Alan Hahn)
In Session 22, we learned that to horizontally position a sprite, we need to trigger the RESPx register at the appropriate position in the scanline, at which point the sprite will display immediately. To move to an arbitrary horizontal position, we need to trigger RESPx just before the TIA displays the appropriate color clock. Our solution has been to use the desired X-position of the sprite as the basis for a delay loop that starts at the beginning of a scanline, delays until roughly the correct position, sets the HMPx fine-tune horizontal position register and then 'hits' RESPx to position the sprite immediately.
Since the minimal time for a single loop iteration is 5 cycles (a register decrement and a taken branch), and each CPU cycle corresponds to 3 TIA color clocks, it follows that our delay-loop approach can only place RESPx writes to within 15 TIA color clocks. This is fine, though, as the hardware's ability to fine-position sprites by -8 to +7 pixels covers the remaining gap, so the sprite can still be placed at exactly the correct position.
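As an idealized check of this arithmetic (a sketch only: it ignores the fixed delay between the RESPx write and where the sprite actually appears, so absolute positions are illustrative), the Python below treats each 5-cycle loop step as a 15-pixel band, takes the middle of the band as the coarse position, and confirms that the leftover offset for each of the 160 visible pixels fits inside the -8 to +7 fine-adjust range. The name `split_position` is hypothetical.

```python
CYCLES_PER_ITERATION = 5   # e.g. a 2-cycle decrement plus a 3-cycle taken branch
CLOCKS_PER_CYCLE = 3       # the TIA runs three color clocks per CPU cycle
BAND = CYCLES_PER_ITERATION * CLOCKS_PER_CYCLE   # 15 color clocks per loop step
FINE_MIN, FINE_MAX = -8, 7 # range of the HMPx fine adjustment


def split_position(x):
    """Idealized split of pixel x into (coarse_step, fine_offset).

    The coarse position for step k is taken as the centre of band k
    (15*k + 7); the fine offset is whatever is left over.
    """
    k = x // BAND
    coarse = k * BAND + BAND // 2
    return k, x - coarse


if __name__ == "__main__":
    offsets = [split_position(x)[1] for x in range(160)]
    print(BAND, min(offsets), max(offsets),
          all(FINE_MIN <= o <= FINE_MAX for o in offsets))
    # prints: 15 -7 7 True
```

Every offset lands in -7..+7, so one loop step per 15 pixels plus the fine adjustment is enough to reach any column.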
The approach taken previously has been to effectively divide the position by 15 (either through a table lookup, or 'clever' code that simulates a divide by 15 using a quick divide by 16 plus an adjustment) and use that value as the iteration counter in a delay loop. This approach works, and has been fairly standard for a number of years; it is the approach presented in our earlier tutorial.
A posting to the [stella] list describing an independent discovery of a 'new' method improves greatly on this technique. In actual fact, the technique was already known and documented on the list . . . but for various reasons these things don't always become well known. The 'new' technique of horizontal positioning rolls the divide-by-15 and the delay loop into a single entity.
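        sec             ; 2     set carry: no borrow on the first subtraction
.Div15  sbc #15         ; 2     subtract 15 from the X position in A
        bcs .Div15      ; 3(2)  loop while no borrow occurred (carry still set)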
Now that may not look like much, but it's absolutely brilliant! Every iteration through the loop subtracts 15 from the accumulator. When a subtraction borrows (the accumulator has gone 'past' 0), the 6502 clears the carry flag, the bcs falls through, and our loop ends. Each iteration takes exactly 5 cycles (plus 2 cycles for the initial 'sec', and one cycle less for the final branch not taken). The real beauty of the code is that we also get, 'for free', the correct -8 to +7 adjustment for fine-tuning the position: the value left in the accumulator, converted through a small lookup table, can be written straight to the HMP0 register! The relevant post can be found in the [stella] list archives.
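To see both the loop's timing and the value it leaves behind, here is a small cycle-counting model in Python of the sec / sbc #15 / bcs sequence. It assumes the standard 6502 timings (2 cycles each for sec and sbc immediate, 3 for a taken branch, 2 for one not taken) and that the branch does not cross a page boundary, which would add another cycle. On the 6502, SBC leaves the carry set when no borrow occurs and clears it on a borrow; that is what ends the loop. The function name `div15_loop` is hypothetical.

```python
def div15_loop(x):
    """Model the 6502 sequence  sec / .Div15 sbc #15 / bcs .Div15.

    Returns (a, cycles): the final 8-bit accumulator and the cycles spent,
    counting sec (2), each sbc #15 (2), each taken bcs (3) and the final
    untaken bcs (2). Assumes the branch never crosses a page boundary.
    """
    a = x & 0xFF
    carry = 1          # sec: no borrow on the first subtraction
    cycles = 2
    while True:
        # sbc #15: A = A - 15 - (1 - C); carry is set afterwards if no borrow
        result = a - 15 - (1 - carry)
        carry = 1 if result >= 0 else 0
        a = result & 0xFF
        cycles += 2
        if carry:      # bcs taken: go round again
            cycles += 3
        else:          # bcs not taken: fall out of the loop
            cycles += 2
            return a, cycles


if __name__ == "__main__":
    for x in (0, 15, 20, 159):
        a, cycles = div15_loop(x)
        print(x, a, a - 256, cycles)
    # prints:
    # 0 241 -15 6
    # 15 241 -15 11
    # 20 246 -10 11
    # 159 250 -6 56
```

For every starting value the exit time grows by exactly 5 cycles per 15 pixels, and the accumulator ends up holding the remainder minus 15 (a value from -15 to -1), which is the 'free' fine-adjust information.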
For this brilliant bit of coding, our thanks go to R. Mundschau. . .
Conceptual Negative Index
One interesting aspect of this code is that it accesses the table with a (conceptual) negative index. When the loop exits, the accumulator holds the position minus one multiple of 15 too many, i.e. a value from -15 to -1 inclusive, and this becomes the index (in Y) into fineAdjustTable. Negative numbers are represented in two's complement form, so -1 is %11111111, which is *exactly* the same bit pattern as 255. So how can we use negative numbers as indexes? We can't! All indexing treats the index as an unsigned number, so if our index was -1, we would actually read 255 bytes past the beginning of our table. The neat trick in the full routine is to set the conceptual start of the table (fineAdjustTable) 241 bytes BEFORE the start of the actual data (fineAdjustBegin); 241 is %11110001, the byte value of -15. So when we attempt to access element -15 of the table, we ACTUALLY end up at the very first byte of the fineAdjustBegin data. Likewise, when accessing element -1, we ACTUALLY access the last element of the table. It's all very neat!
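The index arithmetic can be checked with a few lines of Python. The sketch places the real data (fineAdjustBegin) at a hypothetical address $F000, defines the conceptual base fineAdjustTable 241 bytes earlier, and computes the address that lda fineAdjustTable,y would read for each Y. The constant names and the $F000 address are illustrative assumptions, not taken from the original listing.

```python
FINE_ADJUST_BEGIN = 0xF000                   # hypothetical address of the 15-byte data
TABLE_LENGTH = 15
FINE_ADJUST_TABLE = FINE_ADJUST_BEGIN - 241  # 241 == %11110001, i.e. -15 as a byte


def effective_address(y):
    """Address read by  lda fineAdjustTable,y  (base + Y, wrapped to 16 bits)."""
    return (FINE_ADJUST_TABLE + y) & 0xFFFF


def as_signed(byte):
    """Interpret an 8-bit value as a two's complement number."""
    return byte - 256 if byte >= 128 else byte


if __name__ == "__main__":
    for y in (241, 248, 255):
        addr = effective_address(y)
        print(as_signed(y), hex(addr), addr - FINE_ADJUST_BEGIN)
    # prints:
    # -15 0xf000 0
    # -8 0xf007 7
    # -1 0xf00e 14
```

Each conceptual index from -15 to -1 lands on one of the 15 real bytes, first to last.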
Finally, since we need to account for every cycle in this code very carefully (the horizontal position depends on exactly when we write to RESP0), we also need to consider the extra cycle the 6502 adds when the indexed access fineAdjustTable,y crosses a page boundary. By placing the table data exactly at the start of a page, the code guarantees that every access crosses a boundary (the conceptual base address always lies in the previous page) and so incurs the extra cycle 'penalty', making the timing consistent for all cases.
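LDA absolute,Y takes 4 cycles, plus 1 when base + Y lands in a different page from the base. The Python sketch below counts that cost for every possible Y (241 to 255) with the table data at a hypothetical page-aligned $F000, and again at a hypothetical $F0F8, where the 15 bytes straddle a page boundary. In the second case the cost depends on the remainder, which would shift the RESP0 write by one cycle (3 color clocks) for some positions. The function names are hypothetical.

```python
def lda_abs_y_cycles(base, y):
    """Cycles for 6502 LDA absolute,Y: 4, plus 1 if base + Y crosses a page."""
    target = (base + y) & 0xFFFF
    crossed = (base & 0xFF00) != (target & 0xFF00)
    return 5 if crossed else 4


def cycle_costs(table_start):
    """Set of access costs over all remainders -15..-1 (Y = 241..255)."""
    base = table_start - 241   # conceptual start, 241 bytes before the data
    return {lda_abs_y_cycles(base, y) for y in range(241, 256)}


if __name__ == "__main__":
    print(sorted(cycle_costs(0xF000)))   # page-aligned data: [5]
    print(sorted(cycle_costs(0xF0F8)))   # data straddles a page: [4, 5]
```

The page-aligned placement yields a single cost of 5 cycles for every remainder, which is the consistency the routine relies on.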
I don't take any credit for this, I just admire it. I consider this a BRILLIANT bit of coding, so hats off to R. Mundschau and thanks for sharing!
8-Byte System Clear
Another 'BRILLIANT' bit of code, but this time from yours truly, is the 8-byte system clear. We touched on this earlier in Session 12, but I thought I'd give a quick rundown on exactly how that code works. . .
We assume that when this code starts, the system is in a totally unknown state. Firstly, X and A are set to 0, and we enter the loop.
The loop begins: the X register is decremented (to 255) and this value is placed in the stack pointer (now $FF). The accumulator (0) is then pushed onto the stack, so memory/hardware location $FF is set to 0 (on the 2600 the stack page $0100-$01FF mirrors $0000-$00FF, so the push to $01FF lands on RAM location $FF), and the stack pointer decrements to $FE. Since txs and pha don't affect the flags, the branch is based on the result of decrementing the X register: if it's non-zero, we repeat the loop. So 0 is written to 256 consecutive memory locations, from $FF down to 0 inclusive, and the loop terminates after 256 iterations. On the final pass through, X is decremented to 0, and this is placed in the stack pointer. We then push the accumulator (0) onto the stack (which effectively writes it to memory (TIA) location 0), and as a consequence the stack pointer decrements (and wraps!) back to $FF.
At the conclusion of the above, X = 0, A = 0, SP = $FF, a near-perfect init!
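The description above can be checked with a small Python model. It assumes the Session 12 code is the sequence implied here: ldx #0 and txa, then a loop of dex / txs / pha / bne. Counting bytes, ldx #0 and bne take 2 each and the other four instructions 1 each, for 8 in total. The model treats page $00-$FF as a plain 256-byte array (real TIA registers are write-only, so it simply records what was written) and maps pushes to page 1 onto page 0, as the 2600's address decoding does. The function name `run_clear` is hypothetical.

```python
import random


def run_clear(memory, x, a, sp):
    """Model  ldx #0 / txa / Clear: dex / txs / pha / bne Clear  on a 2600.

    memory is the 256-byte page $00-$FF (TIA registers at $00-$7F, RAM at
    $80-$FF); pushes to $01xx land on $00xx because page 1 mirrors page 0.
    x, a and sp are the arbitrary starting register values.
    Returns (x, a, sp, iterations).
    """
    x = 0                        # ldx #0
    a = x                        # txa
    iterations = 0
    while True:
        x = (x - 1) & 0xFF       # dex: Z flag comes from this result
        zero = (x == 0)
        sp = x                   # txs: flags unaffected
        memory[sp] = a           # pha: writes $0100+SP, mirrored to $00+SP
        sp = (sp - 1) & 0xFF     #      then SP decrements, wrapping at 0
        iterations += 1
        if zero:                 # bne Clear falls through once dex gives 0
            return x, a, sp, iterations


if __name__ == "__main__":
    rng = random.Random(2600)
    memory = [rng.randrange(256) for _ in range(256)]   # unknown power-on state
    x, a, sp, n = run_clear(memory, rng.randrange(256),
                            rng.randrange(256), rng.randrange(256))
    print(x, a, hex(sp), n, all(b == 0 for b in memory))
    # prints: 0 0 0xff 256 True
```

Whatever the starting contents, every location ends up 0 and the registers finish at X = 0, A = 0, SP = $FF.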
That could be the best 8-bytes ever written.
[This was the last session that Andrew Davie had time to work on. Maybe he'll find the time to finish these sessions in the future.]