NES Programming Guide v 1.0
The purpose of this document is to make less difficult the programming
of the first demo. There are much better docs with more detailed
information, and I hope after reading this one those will be easier
to understand.
If something seems difficult (especially at the beginning), try to
continue reading, it can be explained later.
Please report parts not clear or with errors.
Contents:
Binary and hexadecimal
6502 instruction set
Getting started
Binary and hexadecimal
Information comes from the existent differences between 2 or more
things or groups. The simplest diffrence is 1 - 0, true - false,
on - off, low current - high current, etc. Computers work with this
simple information.
With transistors can be constructed "gates" which receive 2 values
(each can be 0 or 1) and produce a logical output (0 or 1 too).
These are the most important operations (you'd remember this from
math class):
AND
Only if both are true the result is true.
+----------------------------+
| Input A | Input B | Result |
|---------|---------|--------|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
+----------------------------+
OR
If any is true the result is true.
+----------------------------+
| Input A | Input B | Result |
|---------|---------|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
+----------------------------+
XOR (exclusive OR)
If both are the same the result is false.
+----------------------------+
| Input A | Input B | Result |
|---------|---------|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
+----------------------------+
and the simplest, with only one input:
NEGATION
+----------------+
| Input | Result |
|-------|--------|
| 0 | 1 |
| 1 | 0 |
+----------------+
With these gates one can make circuits which can do simple additions,
substractions and transferences of 0s and 1s. Combining millions of
circuits and gates we have a microprocessor.
So the basic value managed by any computer is 1 or 0. This 1-0 based
numeration system is called 'binary'. As the decimal system uses only 10
'characters' (numerals) to write all numbers from -almost_infinite to
+almost_infinite, in the binary system we can write any number too, using
only 0s and 1s. Look at this example:
Decimal Binary
0 0 (Begin count with 0)
1 1 (next is 1)
2 10 (binary lacks a symbol for '2', so there is a carry
to next position)
3 11
4 100 (another carry)
5 101
6 110
7 111
8 1000
9 1001
10 1010
11 1011
12 1100
Look, the last digit of a decimal number cycles from 0 to 9, then does
a carry to the neighbor position, and returns to 0. Binary is the same,
cycles from 0 to 1, does a carry, and returns to 0 (though they are not
'digits', but 'bits'). This way is coded everything managed by computers:
images, sounds, text, etc. All is 0s-1s. All is bits.
Binary is just the simplest way to write numbers, but remember all
math laws are still valid. Additions, multiplications, substractions, etc.
are the same. If you don't believe it, check this:
101 (5, look above) 1+0, we write 1 (as in decimal)
+ 110 (6) 1+1, we carry 1 and write 0 (as would be
------ 9+1 in decimal)
1011 (11)
11 (3)
X 10 (2)
------ As in decimal,
00 0 multiplied by anything gives 0
11 and 1 gives the same number
------
110 (6)
But a single value of 1 or 0 is of little use, we need a group of
bits to be able to write greater numbers. The default unit used is the
'byte', a group of 8 bits: 11111111. The minimum data unit which can be
stored in memory is a byte (would be a waste to address individual bits).
Since a byte has 8 bits, the values it can hold range from 0 to 255.
The other common unit is the 'word', made up of 2 bytes (16 bits) and a
range of 0 to 65535. When you work with a machine which can address words
directly (like the SNES), remember in that case the 2 bytes are reversed
in memory. In this way, if you do a 1 byte operation at the address, you
will get the lower byte, not the upper one (which would lead to errors).
To write negative numbers in paper, we just precede them with a -.
but inside a computer another form must be used. To convert a number to
its negative, set the clear bits and clear the set ones, and add 1. Using
this method, we can add negative and positive numbers and get the right
answer:
00001010 (10)
11110101 (reversed)
11110110 (added 1, this is -10)
Look this addition (or substraction?):
11110110 (-10)
+ 00000100 ( 4)
-------------
11111010 ( -6)
To check it is -6, reverse the bits and add 1
11111010 ----> 00000101 -----> 00000110 (6)
To avoid confusion between 11110110 (-10) and 11110110 (246), the last
bit is reserved for negative numbers, which changes the range of a byte to
{-127 to +127}, and a word to {-32267 to +32267}. Since the maximum positive
number was cut by half, usually we'll use just natural numbers, without
negatives. After all, there are not negative scores in games.
As you see, manage of large rows of 0s and 1s can be quite confusing,
and the solution to it is the use of hexadecimal. In this system the symbols
used have values from 0 to 15 (decimal is 0 to 9), using the first letter
of the alphabet for 10,11,12,13,14,15 (A,B,C,D,E,F). The main advantage of
hexadecimal is that is allows you to know the position of bits in a byte:
4C (78)
01001100 (4, C=12=4+8) Sometimes it is useful to know this.
Also, with hex, addresses and sizes in a computer are more intelligible.
For example, size of tiles in NES and GB is 10 (16 in decimal), the number
of sprites in the NES is 100 (256), colors in the SNES are 8000 (32267).
We'll denote hex numbers with h (804Ch) and binary with b (10011100b).
If no letter is present, we assume decimal, but if you want to specify,
add d (1999d). Often hexadecimal is denoted with $ ($20A2) and binary
with % (%10101111), but I think the use of letters is more clear and easy.
6502 Instruction Set
The NES uses a 6502 as CPU. It has 5 8-bit registers and 1 16-bit register
(do not confuse with the registers in memory for joypads and
graphics).
The 16-bit register is the Program Counter (hereafter PC). It has the
memory address of the next instruction to be executed. In other words, it
points to the next instruction. Since it is only 16 bits long (2 bytes),
its maximum addressing range is 64kb, so the longest program the 6502
can run must be below this size.
The Stack register (hereafter S) holds the address of (aka points to) the
top of the stack. This is a memory zone from 100h to 1FFh where data can be
quickly stored and retrieved. Every time data is "pushed" in the stack,
it is stored in the address to where S is pointing, and S is decreased by 1.
When data is "popped" or "pulled", the data is read and S increased. So,
the first data in enter the stack is the last to come out.
The Processor Status or Flag Register (P) is used to indicate the status of
the last instruction executed. Every bit of this register is a flag that
marks a certain condition. When the bit is 1, we say that flag is set; when
it is 0, the flag is clear. Instructions affect only certain flags. To know
which in detail, consult a 6502 instruction set summary, or a book.
Numbered from bit position 7 to 0, they are:
Negative(N): It is set when the bit 7 of the result of the last
instruction is set (is 1). In integer operations, negative numbers
have this bit set and positive clear, hence the name.
Overflow(V): It is set when a carry occurs from bit 6 to 7. In other
words, when 01xxxxxxb becomes 1xxxxxxxb. For example, when add 1 to
01111111b (127), the result is 10000000b (-127 in integers, 128 in
natural numbers), so this flag is useful for detect this error.
Nevertheless, in games the negative numbers are seldom used, so you
don't need to worry too much about this.
Break(B): Set when the BRK instruction has been executed.
Decimal(D): Set when the microprocessor is in Decimal Mode. This is a
mode where the bytes are cutted in 2 parts, one for a decimal digit,
making arithmethic with decimal points more exact, but the maximum
value for a byte then becomes 100 instead of 255, so it is never used
in games.
Interrupt(I): This flag allows interrupts when is clear. An interrupt
is a incoming signal from a peripheral that usually indicates to the
processor to read data from the peripheral. Set this flag to ignore
that signal. This is not valid for Non Maskable Interupts (NMI), they
can't be blocked this way.
Zero(Z): It is set when the result of last operation is zero.
Carry(C): It is set when a carry occurs. In other words, when an add
operation surpasses 255. For example (now these are natural numbers,
as usual): add 1 to 255 (11111111b); the result is 0. This flag can
then be used to do additions in 2 bytes, making the range of values
wider.
Another use of this flag is in rotating and shifting operations. The
bit that is driven out of the byte is stored here.
The accumulator(A) is used for arithmethical and logical operations.
The index register(X) is used for indexed addressing. It means that its value
will be added to the other address given. This is useful for blocks
of data: you do a loop which increases (or decreases) X to have
access to all data, instead of using an instruction for every byte.
Index 2(Y): Used for some limited indexed addressing.
This three registers (A, X and Y) are the "work registers" that can be used
to store data. The others are changed automatically (in most cases).
Now the instructions:
ADC Add with carry. Adds A plus another number, plus the Carry flag, and
stores the result in A. The other number (source) can be a constant
(ADC #45), an address (ADC 100h) or an indexed address (ADC 100h,X).
Since C is added too, you must clear it before with CLC.
AND Logical AND. Does bitwise AND with the accumulator and a source (memory
or inmediate). Logical AND means that result will be true only if the
two values are true. In other words, the resultant bit will be 1 only
if both bits are 1; otherwise it will be 0. For example:
lda #10101010b
and #00001111b
stores 00001010b in A. As you can see, a "mask" can be used to
force zeros and let remain what you want.This is useful when using
a single byte for several purposes, so when you need only certain
bits you can clear the others.
ASL Arithmethic Shift Left. Shifts bits to left in A or memory. All bits will
be moved one position to the left. The rightmost position will be
filled with a zero, and the leftmost will be stored on the Carry Flag.
Example:
lda #10011001b
asl A
gives: 00110010b, and C will be set. One of its uses is quick
multiplication by 2 (as shifting decimal numbers to left multiplies
by 10).
BCC Branch if Carry Clear. Checks the Carry Flag and, if it is set, branches
to a certain point; otherwise continues down. Branches are used to
make decisions: if certain flag is set (or clear), the program will
continue in another point; if not, the program continues as if
nothing happened. The point where the program (more exactly, the
Program Counter) will jump is a number within a range of -127 to +127
bytes of the actual position. Usually you will use a label to mark
the place where it will jump. An example is some lines below.
BCS Branch if Carry Set. The same, but when C is set.
BEQ Branch if Equal. Branches when the Zero Flag is set. In other words, when
the operand is zero. It's called "Equal" because, when comparing, Z
is set if both terms are equal. Example:
lda bullet_column ;(monster is hit when position of
cmp monster_column ; bullet is the same as him)
beq monster_dies
(here code for monster moves)
monster_dies
(die sequence: it disappears and the score increases)
Here, monster_column and bullet_column are variables, stored
somewhere in memory. But instead of using the address number, we can
tell first the assembler we will use labels for them. monster_dies is
a label for a position where the program will continue.
In fact, the most common use of BEQ is checking if something is zero.
For example:
lda hero_lives
beq gameover
(here code for continue playing)
gameover
(continue? insert coins)
It's useful with AND to check status of individual bits.
This is an example where we suppose the status of button B is in
bit 1, in a variable called "joypad1":
lda joypad1
and #00000010b ;this blows out everything, except B button
beq B_not_pressed
(code for fire ball)
B_not_pressed (more code)
BIT Bit test. Is like AND, but without storing the result in A.
BMI Branch if Minus. Branches when N is set. This means, when the bit 7 is
set. Is called "minus" because comparisons (which are actually
substractions) give a negative number when the second term is greater
than the first. Negative numbers have the bit 7 set.
It is useful too to check quickly bit 7.
BNE Branch if Not Equal. Branches when Z is clear (the operand is not 0).
The contrary of BEQ.
BPL Branch if Plus. Branches when the Negative Flag is clear, that is, when
the bit 7 of the operand is clear. One of its most common uses is
to wait for Vblank:
wait_for_vblank
lda 2002h ;bit 7 at 2002h is automatically set to 1
;when Vblank occurs
bpl wait_for_vblank
(code for graphics here)
BRK Break. This makes an interruption. Somewhere says that this instruction
returns one byte after the correct one. Anyway it is rarely used.
BVC Branch if Overflow Clear. Not very useful.
BVS Branch if Overflow Set. Same.
CLC Clear Carry flag. C becomes 0. You must use it before ADC in most cases.
Used too before rotations to prevent errors.
CLD Clear Decimal mode. The processor is supposed to begin in Binary mode,
and you will never use Decimal mode, so you can forget this one.
Nevertheless I have seen is common to use it at the beginning.
CLI Clear Interrupt disable. Allows interrupts (NMI not included).
Seldom used.
CLV Clear Overflow flag. Seldom used.
CMP Compare. Compares A with source, for later use of BEQ, BNE, BMI or BPL.
The comparison is a substraction that does not give result, but
affects the flags: if both terms are equal, the result is 0; if the
first is greater, the result is positive; if the second is greater,
the result is negative.
CPX Compare X. Compares X with source.
CPY Compare Y. Same.
DEC Decrement source. A memory address is decreased 1.
DEX Decrement X. Commonly used in loops:
ldx 5 ;loop 5 times
loop5 (code)
dex
bne loop5
One useful loop is addressing a block of memory (where a table with
data can be stored), or clearing it:
ldx 50h ;the block is 50h bytes
lda #0 ;all bytes will be 0
memclear sta block_address,X
dex
bne memclear
DEY Decrement Y. Same.
EOR Exclusive OR. Bitwise exclusive OR with A and source.
This gives 1 only is the two bits are different; if not,
the result is 0. Example:
lda #11001100b
eor #11110000b
stores 00111100b in A.
INC Increment source. A memory address is increased 1.
INX Increment X. Especially useful for loops with graphics. Since the
address at PPU is automatically incremented, we can't use DEX,
or the data we put there will be inverted:
lda #20h ;address of name table
sta 2006h
lda #0Ch
sta 2006h
ldx #0
loopgfx lda mem_block,X
sta 2007h
inx
cpx #20h ;we are coying 20h bytes
bne loopgfx
As you can see, the use of INX involves a CPX instruction, so
loops use usually DEX because it's faster.
INY Increment Y. Same.
JMP Jump. The program jumps to a label and continues from there.
JSR Jump to Subroutine. Is a jump too, but the PC is saved in the stack
before, so when the code of the subroutine ends, with RTS you return
where the subroutine was "called". This makes possible to call the
same subroutine from many points in the program. But often the
subroutine will need parameters for its actions. They can be passed
in the registers, memory, or manipulating the stack.
Look this example, which uses the registers (the easiest way):
ldx #5 ;X will be the enemies number (5 enemies)
loop_enemies
jsr enemy_actions
dex
bne loop_enemies
(rest of program)
enemy_actions
(here code to approach hero, throw knifes, etc)
rts
With this you can control 5 enemies with the same code.
LDA Load Accumulator. Stores in A the value of a constant or a memory address.
Can be used this way:
lda #7 This stores in A the value 7
lda 7 This stores in A the value in the memory address 7
lda 7,X This stores in A the value in the memory address plus
X. This means that if X is 0, A will have the value in 7; if X is 5,
A will have the value of the memory address 12 (0Ch).
LDX Load X. The same, but obviously indexed addresing can't be used.
LDY Load Y. Same.
LSR Logical Shift Right. Shifts to right A or a memory address. The leftmost
bit is filled with 0 and the rightmost goes to C. Example:
lda #01100110b
lsr A
stores 00110011b in A, and the Carry Flag will be clear.
One of its uses is for quick divisions by 2.
NOP No Operation. This does nothing. Can be used for delaying loops
(sometimes we need less speed), but in such cases is preferable
to use a slower operation, as this is one of the fastests.
ORA Logical Inclusive OR. Bitwise OR with A and a source. The result will be
false only if both bits are 0; otherwise it will be 1. Example:
lda #11001100b
ora #10101010b
stores 11101110b in A. This can be used to force bits to 1
(as AND is used to force 0). For example, suppose we have the
variable hero_status to indicate if he is jumping with the bit 5,
and if he is dead with the bit 2:
(code to decide when he'll die)
lda hero_status
ora #00000100b ;set die bit
sta hero_status
Now the jump:
(code to know if button A is pressed)
lda hero_status
ora #00100000b ;we force bit 5 to 1, without
sta hero_status ;touching the other bits
(code to jump)
Now, suppose he has finished his jump:
(code to know when finish jumping)
lda hero_status
and #11011111b ;force bit 5 to 0, without
sta hero_status ;touching other bits
As you can see, with ORA, 0 lefts bits unchanged and 1 forces 1; with
AND, 1 lefts bits unchanged and 0 forces 0.
PHA Push accumulator. Stores A in the stack.
PHP Push the Processor Status Register. Stores P (aka flags) in the stack.
PLA Pull Accumulator. Stores in A the value in the top of the stack.
PLP Pull Processor Status Register. Stores in the flags the value in the
top of the stack.
ROL Rotate Left. Rotates the bits in A or source one position to the left.
The rightmost bit is filled with the value of the Carry Flag, and
the left most bit is sent to the Carry Flag. Example:
clc ;C will be 0
lda #10010111b
rol A
(A is now 00101110b, and C is set to 1)
rol A
(A is now 01011101b, and C is 0)
rol A
(A is now 10111010b and C is 0)
rol A
(A is now 01110100b and C is 1)
This is useful to fill a byte when a flow of one bit cames from
somehwhere. Especifically, from the joypads.
RTI Return from Interrupt. After the interrupt code is executed, this
returns to where the program was left restoring the PC and the
flags that were pushed on the stack when the interrupt began.
RTS Return from Subroutine. This returns from a subroutine to the next
instruction to where it was called, pulling PC from the stack.
SBC Substract with Carry. Stores in A the result of substracting source
from A. In a book there was written that the value of C is
irrelevant here, but anyway is preferable clear it before.
SEC Set Carry flag. Sets C to 1.
SED Set Decimal mode. You won't need this.
SEI Set Interrupt disable. When the Interrup t disable flag is set, the
processor ignores interrupt signals. NMI are not included, they
will continue working. As far as I know, the only interrupt in
the NES is one that executes every Vblank, but this is a Non
Maskable Interrupt (NMI), which can't be affected by this flag.
It is enabled or disabled with a register in 2000h.
Nevertheless seems is common to use this at the beginning.
STA Store Accumulator. Stores A in memory. Indexed addressing can be used.
STX Store X. Stores X in memory.
STY Store Y. Same.
TAX Transfer A to X. Stores A in X. Store instructions don't allows transfer
of data between registers, so the existence of the Transfer ones.
TAY Transfer A to Y.
TSX Transfer S to X. Stores in X the value of the Stack pointer, to know
where it is.
TXA Transfer X to A.
TXS Transfer X to S. Stores in S the value of X. This must be used always
at the beginning of a program to use the stack, because it doesn't
begin to work automatically. If you forget this, you won't be able
to use subroutines nor interrupts. Do this way:
ldx #0FFh
txs
This instruction can be used too to manipulate the stack, but that
is quite complex.
TYX Transfer Y to A.
Getting Started
In this section there are pieces of code you can use to make your own
demo rom. Just download TASM from
http://www.galeon.com/waseiyakusha/indexe.html
and paste the code. Remember this is only to see something working, but
you must add what will be the "essence" of your demo (or game?).
First, we assign a label to some memory addresses. This will make the
code more readable (I use a "V" to distinguish them to other labels):
Vcontrol1 =0
Vcolumn =1
Vline =2
Then begin the code at C000h:
.org 0C000h
begin ldx #0FFh ;initialize stack
txs
vblank lda 2002h ;wait for Vblank
bpl vblank
lda #00000000b ;control PPU 1 (consult other docs for details)
sta 2000h
lda #10011110b ;control PPU 2
sta 2001h
lda #0 ;no scroll
sta 2005h
sta 2005h
;
lda #3Fh ;set the background palette
sta 2006h ;(if you skip this, all will be gray -color 0-)
lda #0h
sta 2006h
lda #0Fh ;colors (black)
sta 2007h ;use Chris Covell's RGB to see values
lda #1 ;(blue)
sta 2007h
lda #16h ;(red)
sta 2007h
lda #28h ;(yellow)
sta 2007h
;
lda #3Fh ;sprites palette
sta 2006h
lda #10h
sta 2006h
lda #0Fh ;colors (this is the transparent color, use the
sta 2007h ;same as background, black in this case)
lda #30h ;(white)
sta 2007h
lda #37h ;(skin)
sta 2007h
lda #7 ;(brown)
sta 2007h
;---------------- background
lda #20h ;this fills the name table with tile numbers
sta 2006h ;it's only copying the table at the end
sta 2006h
ldx #0
loop1 lda 0E000h,X
sta 2007h
inx
cpx #0E0h
bne loop1
ldx #0
loop2 lda 0E0E0h,X
sta 2007h
inx
cpx #0E0h
bne loop2
ldx #0
loop3 lda 0E1C0h,X
sta 2007h
inx
cpx #0E0h
bne loop3
ldx #0
loop4 lda 0E2A0h,X
sta 2007h
inx
cpx #0E0h
bne loop4
;-----------------
lda #10 ;set initial values for line and column
sta Vcolumn ;of the sprite
sta Vline
;------------
mainloop
jsr readcontrol1
jsr delay ;we want some delay to be able to see it
jsr delay
jsr delay
jsr delay
;---(add your code here)
lda Vcontrol1 ;if no button is pressed, just go ahead
beq nobutton
and #00000001b ;if it is right, increase column number
beq noright ; to move to right
inc Vcolumn
jmp nobutton
noright lda Vcontrol1
and #00000010b ;if it's left, to the left
beq noleft
dec Vcolumn
jmp nobutton
noleft ;(continue the same, for up, down, etc)
;Of course, the following code works too:
; lda Vcontrol1
; cmp #(buttonnumber)
; bne nothisbutton
;There is only a little diference between this and the other one.
nobutton jsr sprites
jmp mainloop
;----------------------------------------------------------------------------
; SUBROUTINES
;----------------------------------------------------------------------------
;------------------------------------------ read control 1
readcontrol1 lda #1 ;reset control 1
sta 4016h
lda #0
sta 4016h
sta Vcontrol1
ldx #8 ;read control 1
control1 lda 4016h ;this loop reads bit by bit
and #1 ;and rotates to fill the byte
ora Vcontrol1
asl A
sta Vcontrol1
dex
bne control1
bcc noabutton ;then checks if, as a result of rotation, the
lsr A ;first bit read (button A) was driven to C
ora #10000000b
bne endreading
noabutton lsr A
endreading sta Vcontrol1
rts
;(Vcontrol1 is filled as follows: (A,B,Select,Start,Up,Down,Left,Right)
;------------------------------------------ delay a while
delay pha ;first save in stack A and X
txa
pha
ldx #0FFh
delayloop lda 8000h,X ;then loop doing a slow operation
dex
bne delayloop
pla ;and restore A and X
tax
pla
rts
;------------------------------------------ Sprites
sprites lda 2002h ;wait for vblank
bpl sprites
lda #0
sta 2003h
;-------
lda Vline
sta 2004h
lda #1 ;use tile #1
sta 2004h
lda #0 ;attributes in their simplest form
sta 2004h
lda Vcolumn
sta 2004h
rts
;----------------------------------------------- background
.org 0E000h ;of course, you can use another place in memory
.db 0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0
.db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0
.db 0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1
.db 1,1,1,1,0,0,0,5,5,5,5,5,0,0,4,0
.db 0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0
.db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0
; and continue other 25 rows, 32 values each (but cut them in 2 lines,
;or the assembler will not work). These are the
;tile numbers of the background.
;--------------------------------------------------------------------
.org 0FF00h ;interruption NMI (useless, but just in case)
rti
.org 0FFFAh ;interruption vectors
.db 0,0FFh,0,0C0h,0,0C0h
.end
Assemble it (tasm -65 -b -f00 filename.asm), adn rename the resulting .obj
file to .prg.
Then, make a chr rom with this simple code (in a new file), and draw it
with a tile editor (name it filename.chr):
.org 0
.db 0FFh
.org 2000h
.end
Make a header too (filename.hdr):
.db 4eh,45h,53h,1ah,1,1,0,0,0,0,0,0,0,0,0,0
.end
Join them with tniNES (tnines -j filename), and you're done!
Thanks to:
YOSHi (needless to say).
Tony Young With his page and demo I began to understand YOSHi's document
(http://members.aol.com/TYoung79/nesprog.html)
Jonathan Bowen for his summary of the 6502 instruction set
Questions/suggestions: bokudono@netscape.net
Waseiyakusha (http://www.galeon.com/waseiyakusha/indexe.html)