Data structures

So far, we've been treating data as a bunch of one-byte values. There really isn't a lot you can do just with bytes. This section talks about how to deal with larger and smaller elements.

Arrays

An array is a bunch of data elements in a row. An array of bytes is very easy to handle with the 6502 chip, because the indexed addressing modes do the address arithmetic for you. Just load the index into the X or Y register and do an absolute indexed load or store. In general, these are going to be zero-indexed (that is, a 32-byte array is indexed from 0 to 31). This code would initialize a byte array with 32 entries to 0:

   lda #$00      ; Value to store
   tax           ; Start the index at 0
loop:
   sta array,x   ; Clear element X
   inx
   cpx #$20      ; All 32 entries done?
   bne loop

(If you count down to save instructions, remember to adjust the base address so that the loop still writes the same memory locations. With X running from 32 down to 1, for example, the store becomes sta array-1,x.)
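To make the two loop shapes concrete, here is a small Python model (not 6502 code) in which memory is a 64K bytearray and X is treated as an 8-bit register. The function names and the base address $0300 used when trying it out are illustrative assumptions, and n is assumed to be between 1 and 255. The count-down version stores to base - 1 + x while X runs from n down to 1, so it touches exactly the same bytes as the count-up loop, and it needs no compare because dex sets the zero flag on its own.

```python
def clear_count_up(mem, base, n):
    """Model of the count-up loop: X runs 0..n-1, STA base,X each pass."""
    x = 0
    while True:
        mem[base + x] = 0          # sta array,x
        x = (x + 1) & 0xFF         # inx (8-bit register)
        if x == n:                 # cpx #n / bne loop
            break


def clear_count_down(mem, base, n):
    """Model of the count-down loop: X runs n..1, STA (base-1),X each pass."""
    x = n
    while True:
        mem[base - 1 + x] = 0      # sta array-1,x
        x = (x - 1) & 0xFF         # dex (sets Z when X reaches 0)
        if x == 0:                 # bne loop falls through on zero
            break
```

Both functions clear the same 32 bytes when called with n = 32; the second simply trades the cpx for an adjusted base address.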

This approach to arrays has some limits. Primary among them is that we can't have arrays of more than 256 elements; the index wouldn't fit in an 8-bit index register. To address larger arrays, we need to use the indirect indexed addressing mode. We use 16-bit addition to add the offset to the base address, store the result in a two-byte pointer in the zero page (this addressing mode only takes its pointer from zero page), then set the Y register to 0 and load the value with lda (ptr),y.
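The 16-bit addition and the indirect load can be modelled in Python as follows. This is a sketch, not 6502 code: add16 mimics a clc/adc/adc sequence (low bytes first, then high bytes plus the carry), and lda_indirect_y mimics lda (zp),y by reading a little-endian pointer from the zero page. The helper names, the array base $3000 and the zero-page location $FB are hypothetical choices for illustration.

```python
def add16(base, offset):
    """16-bit add done the 6502 way: low bytes, then high bytes plus carry.
    The result wraps at 64K, as it would in a two-byte pointer."""
    lo = (base & 0xFF) + (offset & 0xFF)          # clc / lda / adc (low)
    carry = lo >> 8
    hi = (base >> 8) + (offset >> 8) + carry      # lda / adc (high)
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def lda_indirect_y(mem, zp, y):
    """Model of LDA (zp),Y: fetch the pointer stored low byte first at zp
    and zp+1 (zero-page addresses wrap within page zero), add Y, load."""
    ptr = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
    return mem[(ptr + y) & 0xFFFF]
```

For element 300 of a byte array at $3000, add16(0x3000, 300) gives the address $312C; storing that in the zero page and loading with Y = 0 fetches the element.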

Well, actually, we can do better than that. Suppose we want to clear out 8K of RAM, from $2000 to $3FFF. We can use the Y register to hold the low byte of our offset, and only update the high byte of the pointer when Y wraps around. That produces the following loop:

   lda #$00  ; Set pointer value to base ($2000)
   sta ptr
   lda #$20
   sta ptr+1
   lda #$00  ; Storing a zero
   ldx #$20  ; Page counter: 32 pages of 256 bytes = 8,192 bytes
   ldy #$00  ; Offset within the current page
loop:
   sta (ptr),y
   iny
   bne loop  ; If Y hasn't wrapped around, go back
   inc ptr+1 ; Otherwise advance the pointer to the next page
   dex       ; One fewer page to go
   bne loop  ; and continue if we aren't done
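As a sanity check on the loop's bounds, here is a Python model of it (a sketch, not 6502 code; the function name and the zero-page location $FB are hypothetical). Y walks through each 256-byte page, and the pointer's high byte is bumped once per page, so with 32 pages starting at $2000 it clears exactly $2000 through $3FFF and leaves the pointer at $4000. The start address is assumed to be page-aligned and the page count between 1 and 255.

```python
def clear_pages(mem, ptr_zp, start, pages):
    """Model of the STA (ptr),Y page loop above."""
    mem[ptr_zp] = start & 0xFF              # sta ptr
    mem[ptr_zp + 1] = start >> 8            # sta ptr+1
    a, x, y = 0x00, pages, 0x00             # lda #$00 / ldx #$20 / ldy #$00
    while True:
        ptr = mem[ptr_zp] | (mem[ptr_zp + 1] << 8)
        mem[(ptr + y) & 0xFFFF] = a         # sta (ptr),y
        y = (y + 1) & 0xFF                  # iny
        if y != 0:                          # bne loop (Y hasn't wrapped)
            continue
        mem[ptr_zp + 1] = (mem[ptr_zp + 1] + 1) & 0xFF   # inc ptr+1
        x = (x - 1) & 0xFF                  # dex
        if x == 0:                          # bne loop falls through
            break
```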

This code could be optimized further. The loop prelude in particular loads several values redundantly (the accumulator is loaded with zero twice, for instance), and it can be compressed down to:

   lda #$00
   tay
   ldx #$20
   sta ptr
   stx ptr+1

That's not directly relevant to arrays, but tricks like this are worth keeping in mind when writing your code. Done well, they can make it much smaller and faster; done carelessly, they can force a lot of bizarre dependencies on your code and make it impossible to modify later.

Records

A record is a collection of values all referred to as one variable. This has no immediate representation in assembler. If you have a global record made up of two bytes and a code pointer, it is exactly equivalent to three separate variables. You can just put one label in front of it, and refer to the first byte as label, the second as label+1, and the code pointer as label+2.

This really applies to all data structures that take up more than one byte. When dealing with the code pointer, a 16-bit value, we refer to the low byte as ptr (or label+2, in the example above) and the high byte as ptr+1 (or label+3), since the 6502 stores 16-bit values low byte first.
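The record layout can be sketched in Python as plain offsets from one label. The address $0340, the field-offset names and the two helpers are hypothetical; the point is only that the code pointer's low byte lands at label+2 and its high byte at label+3.

```python
LABEL = 0x0340   # hypothetical address of the record's label

FIELD_A = 0      # first byte:   label
FIELD_B = 1      # second byte:  label+1
FIELD_PTR = 2    # code pointer: label+2 (low byte), label+3 (high byte)


def store_ptr(mem, addr, value):
    """Store a 16-bit value low byte first, as the 6502 expects."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def load_ptr(mem, addr):
    """Read a 16-bit value stored low byte first."""
    return mem[addr] | (mem[addr + 1] << 8)
```

Storing $C123 with store_ptr(mem, LABEL + FIELD_PTR, 0xC123) puts $23 at label+2 and $C1 at label+3.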

Arrays of records are more interesting. There are two possibilities for these. The way most high-level languages treat it is by keeping the records contiguous. If you have an array of records that each hold two 16-bit integers (four bytes per record), then the records are stored in order, one after another. The first is at location $1000, the next at $1004, the next at $1008, and so on. You can do this with the 6502, but you'll probably have to use the indirect indexed mode if you want to be able to iterate conveniently, since the byte offset passes 255 after only 64 records.
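A short Python sketch of the contiguous layout's address arithmetic, assuming four-byte records of two 16-bit fields as in the example; the function names are hypothetical. It also shows where a plain Y-indexed loop runs out of room: record 63 is the last one whose bytes all sit at offsets of 255 or less.

```python
RECORD_SIZE = 4  # two 16-bit integers per record


def field_address(base, index, field):
    """Address of the low byte of 16-bit field `field` (0 or 1) in record `index`."""
    return base + index * RECORD_SIZE + field * 2


def fits_in_y(index):
    """True if every byte of record `index` is reachable with an 8-bit Y offset."""
    return index * RECORD_SIZE + RECORD_SIZE - 1 <= 0xFF
```

With base $1000, records 0, 1 and 2 start at $1000, $1004 and $1008; fits_in_y(64) is False, which is where the pointer-based approach becomes necessary.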

Another approach, more unusual but more efficient, is to keep each byte of the record in its own array, just like in the arrays example above. To illustrate, here's a little bit of code to go through a contiguous array of 16-bit integers, adding their values to a 16-bit total variable:

   ldx #$10          ; Number of elements in the array
   ldy #$00          ; Byte index from array start
loop:
   clc               ; Start a fresh 16-bit add for each entry
   lda array,y       ; Low byte
   adc total
   sta total
   lda array+1,y     ; High byte (plus carry from the low byte)
   adc total+1
   sta total+1
   iny               ; Advance 2 bytes to the next entry
   iny
   dex               ; Check for loop termination
   bne loop
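Here is a Python model of the loop above (a sketch, not 6502 code; the helper names, the array address and the location of total are hypothetical). The adc helper returns both the 8-bit result and the carry, which is what lets the high-byte add pick up any overflow from the low-byte add. As on the real chip, a carry out of the high byte is simply lost, so the total wraps at 64K.

```python
def adc(a, operand, carry):
    """Model of binary-mode ADC: returns (8-bit result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def sum_interleaved(mem, array, count, total):
    """Add `count` 16-bit entries stored back to back (low byte first)
    at `array` into the two-byte variable at `total`."""
    x, y = count, 0
    while True:
        c = 0                                               # clc
        lo, c = adc(mem[array + y], mem[total], c)          # lda array,y / adc total
        mem[total] = lo                                     # sta total
        hi, c = adc(mem[array + 1 + y], mem[total + 1], c)  # lda array+1,y / adc total+1
        mem[total + 1] = hi                                 # sta total+1
        y = (y + 2) & 0xFF                                  # iny / iny
        x = (x - 1) & 0xFF                                  # dex
        if x == 0:                                          # bne loop
            break
```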

And here's the same loop, keeping the high and low bytes in separate arrays:

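   ldx #$00          ; Entry index, shared by both arrays
loop:
   clc
   lda lowbyte,x     ; Low byte of entry X
   adc total
   sta total
   lda highbyte,x    ; High byte of entry X (plus carry)
   adc total+1
   sta total+1
   inx               ; Next entry: a single increment
   cpx #$10          ; All 16 entries done?
   bne loop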
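The split layout can be modelled the same way. This Python sketch (hypothetical helper names, not 6502 code) turns a list of 16-bit values into a low-byte table and a high-byte table, then sums them with one index selecting entry x in both tables. The total matches what the interleaved layout produces; only the arrangement in memory differs.

```python
def split(values):
    """Split 16-bit values into separate low-byte and high-byte tables."""
    low = bytes(v & 0xFF for v in values)
    high = bytes((v >> 8) & 0xFF for v in values)
    return low, high


def sum_split(low, high):
    """Model of the split-array loop; the result wraps at 64K like total."""
    total = 0
    for x in range(len(low)):                  # ldx #$00 ... inx / cpx / bne
        lo = (total & 0xFF) + low[x]           # clc / lda lowbyte,x / adc total
        carry = lo >> 8
        hi = (total >> 8) + high[x] + carry    # lda highbyte,x / adc total+1
        total = ((hi & 0xFF) << 8) | (lo & 0xFF)
    return total
```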

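Which approach is the right one depends on what you're doing. For large arrays, the first approach is better, as you only need to maintain one base pointer; split arrays of more than 256 entries would need a separate pointer for each byte array. For smaller arrays, the easier indexing makes the second approach more convenient: one index register selects the same entry in every byte array, with no doubling of the index.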

Bitfields

To store values that are smaller than a byte, you can save space by putting multiple values in a byte. To extract a sub-byte value, use the bitmasking commands: