NES Programming Guide v 1.0

The purpose of this document is to make programming your first demo less difficult. There are much better docs with more detailed information, and I hope that after reading this one they will be easier to understand. If something seems difficult (especially at the beginning), keep reading; it may be explained later. Please report any parts that are unclear or contain errors.

Contents:
  Binary and hexadecimal
  6502 instruction set
  Getting started


Binary and hexadecimal

Information comes from the differences between 2 or more things or groups. The simplest difference is 1 - 0, true - false, on - off, low current - high current, etc. Computers work with this simple information. Transistors can be combined into "gates", which receive 2 values (each can be 0 or 1) and produce a logical output (0 or 1 too). These are the most important operations (you may remember them from math class):

AND
Only if both are true is the result true.
 +----------------------------+
 | Input A | Input B | Result |
 |---------|---------|--------|
 |    0    |    0    |   0    |
 |    0    |    1    |   0    |
 |    1    |    0    |   0    |
 |    1    |    1    |   1    |
 +----------------------------+

OR
If either is true the result is true.
 +----------------------------+
 | Input A | Input B | Result |
 |---------|---------|--------|
 |    0    |    0    |   0    |
 |    0    |    1    |   1    |
 |    1    |    0    |   1    |
 |    1    |    1    |   1    |
 +----------------------------+

XOR (exclusive OR)
If both are the same the result is false.
 +----------------------------+
 | Input A | Input B | Result |
 |---------|---------|--------|
 |    0    |    0    |   0    |
 |    0    |    1    |   1    |
 |    1    |    0    |   1    |
 |    1    |    1    |   0    |
 +----------------------------+

and the simplest, with only one input:

NEGATION
 +----------------+
 | Input | Result |
 |-------|--------|
 |   0   |   1    |
 |   1   |   0    |
 +----------------+

With these gates one can build circuits that do simple additions, subtractions and transfers of 0s and 1s. Combining millions of circuits and gates we get a microprocessor. So the basic value managed by any computer is 1 or 0.
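
As a small illustration of how gates can add, here is a sketch of a "half adder" for two 1-bit inputs: the sum bit is A XOR B and the carry bit is A AND B. It is written with the 6502 instructions described later in this guide (EOR is the 6502 name for XOR). The variable names and their addresses are made up for this example.

```asm
;Half adder sketch. in_a, in_b, sum and carry are hypothetical
;variables; each input is assumed to hold only 0 or 1.
in_a    =10h
in_b    =11h
sum     =12h
carry   =13h

        lda in_a        ;A = input A
        eor in_b        ;XOR: 1 only when the two inputs differ
        sta sum         ;sum bit
        lda in_a
        and in_b        ;AND: 1 only when both inputs are 1
        sta carry       ;carry bit
```

With both inputs at 1, sum ends up 0 and carry ends up 1, which read together is 10 in binary, that is, 2.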
This 1-0 based number system is called 'binary'. Just as the decimal system uses only 10 'characters' (numerals) to write all numbers from -almost_infinite to +almost_infinite, in the binary system we can write any number too, using only 0s and 1s. Look at this example:

  Decimal   Binary
     0         0     (begin counting with 0)
     1         1     (next is 1)
     2        10     (binary lacks a symbol for '2', so there is a carry to the next position)
     3        11
     4       100     (another carry)
     5       101
     6       110
     7       111
     8      1000
     9      1001
    10      1010
    11      1011
    12      1100

Look: the last digit of a decimal number cycles from 0 to 9, then carries to the neighboring position and returns to 0. Binary is the same: it cycles from 0 to 1, carries, and returns to 0 (though these are not called 'digits', but 'bits'). Everything managed by computers is coded this way: images, sounds, text, etc. All is 0s and 1s. All is bits.

Binary is just the simplest way to write numbers, and all the laws of math are still valid. Additions, multiplications, subtractions, etc. work the same. If you don't believe it, check this:

     101   (5, look above)    1+0, we write 1 (as in decimal)
   + 110   (6)                1+1, we carry 1 and write 0 (as would be
   ------                     9+1 in decimal)
    1011   (11)

      11   (3)
   X  10   (2)
   ------      As in decimal,
      00       0 multiplied by anything gives 0
     11        and 1 gives the same number
   ------
     110   (6)

But a single value of 1 or 0 is of little use; we need a group of bits to be able to write larger numbers. The default unit is the 'byte', a group of 8 bits: 11111111. The byte is the minimum data unit that can be stored in memory (it would be a waste to address individual bits). Since a byte has 8 bits, the values it can hold range from 0 to 255. The other common unit is the 'word', made up of 2 bytes (16 bits), with a range of 0 to 65535. When you work with a machine that can address words directly (like the SNES), remember that the 2 bytes are stored reversed in memory: the lower byte comes first, at the lower address, and the upper byte follows. So if you do a 1-byte operation at that address, you will get the lower byte, not the upper one (forgetting this leads to errors).
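
The byte order of a word can be sketched with the 6502 instructions described later in this guide. The 6502 itself only moves single bytes, so this example stores the two halves of the word 1234h by hand, lower byte first; the variable name and its address are hypothetical.

```asm
;Store the word 1234h in a 2-byte variable, lower byte first.
;score is a hypothetical variable occupying addresses 10h and 11h.
score   =10h

        lda #34h        ;lower byte of 1234h
        sta score       ;goes to the lower address (10h)
        lda #12h        ;upper byte of 1234h
        sta score+1     ;goes to the next address (11h)
        lda score       ;a 1-byte read at the word's address gives 34h
```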
To write negative numbers on paper, we just precede them with a -, but inside a computer another form must be used. To convert a number to its negative, set the clear bits and clear the set ones (that is, reverse every bit), and add 1. Using this method, we can add negative and positive numbers and get the right answer:

   00001010   (10)
   11110101   (reversed)
   11110110   (added 1, this is -10)

Look at this addition (or subtraction?):

     11110110   (-10)
   + 00000100   (  4)
   -------------
     11111010   ( -6)

To check that it is -6, reverse the bits and add 1:

   11111010   (-6)
   00000101   (reversed)
   00000110   (added 1, this is 6)

To avoid confusion between 11110110 (-10) and 11110110 (246), the highest bit (bit 7) is reserved as the sign of the number: when it is set, the number is negative. This changes the range of a byte to {-128 to +127}, and of a word to {-32768 to +32767}. Since the maximum positive number is cut in half, we'll usually use just natural numbers, without negatives. After all, there are no negative scores in games.

As you can see, managing long rows of 0s and 1s can be quite confusing, and the solution is hexadecimal. In this system the symbols have values from 0 to 15 (in decimal, 0 to 9), using the first six letters of the alphabet for 10, 11, 12, 13, 14, 15 (A, B, C, D, E, F). The main advantage of hexadecimal is that it lets you see the position of the bits in a byte, because each hex digit corresponds to exactly 4 bits:

   4C   (76)   01001100   (4 = 0100, C = 12 = 8+4 = 1100)

Sometimes it is useful to know this. Also, with hex, addresses and sizes in a computer are more intelligible. For example, the size of a tile in the NES and GB is 10 (16 in decimal) bytes, the size of the sprite memory in the NES is 100 (256) bytes, and the number of colors in the SNES is 8000 (32768).

We'll denote hex numbers with h (804Ch) and binary with b (10011100b). If no letter is present, we assume decimal, but if you want to be explicit, add d (1999d). Hexadecimal is often denoted with $ ($20A2) and binary with % (%10101111), but I think the use of letters is clearer and easier.


6502 Instruction Set

The NES uses a 6502 as its CPU.
It has 5 8-bit registers and 1 16-bit register (do not confuse them with the registers in memory for joypads and graphics).

The 16-bit register is the Program Counter (hereafter PC). It holds the memory address of the next instruction to be executed. In other words, it points to the next instruction. Since it is only 16 bits long (2 bytes), its maximum addressing range is 64 KB, so everything the 6502 can address at one time, program included, must fit in this space.

The Stack register (hereafter S) holds the address of (aka points to) the top of the stack. The stack is a memory zone from 100h to 1FFh where data can be quickly stored and retrieved. Every time data is "pushed" onto the stack, it is stored at the address S is pointing to, and S is decreased by 1. When data is "popped" or "pulled", S is increased and the data is read. So the first data to enter the stack is the last to come out.

The Processor Status or Flag Register (P) indicates the status of the last instruction executed. Every bit of this register is a flag that marks a certain condition. When the bit is 1, we say the flag is set; when it is 0, the flag is clear. Each instruction affects only certain flags. To know which ones in detail, consult a 6502 instruction set summary, or a book. Listed from bit position 7 down to 0 (bit 5 is not used), they are:

Negative (N): It is set when bit 7 of the result of the last instruction is set (is 1). In integer operations, negative numbers have this bit set and positive ones have it clear, hence the name.

Overflow (V): It is set when the result of a signed addition or subtraction does not fit in the range -128 to +127, for example when a carry from bit 6 to bit 7 turns 01xxxxxxb into 1xxxxxxxb. For example, when adding 1 to 01111111b (127), the result is 10000000b (-128 in integers, 128 in natural numbers), so this flag is useful to detect this error. Nevertheless, negative numbers are seldom used in games, so you don't need to worry too much about this.

Break (B): Set when the BRK instruction has been executed.

Decimal (D): Set when the microprocessor is in Decimal Mode.
This is a mode where each byte is split into 2 parts of 4 bits, each holding one decimal digit. It makes arithmetic with decimal numbers easier, but the maximum value of a byte then becomes 99 instead of 255, so it is never used in games (and the 6502 inside the NES has this mode disabled anyway).

Interrupt (I): This flag allows interrupts when it is clear. An interrupt is an incoming signal from a peripheral, usually telling the processor to read data from that peripheral. Set this flag to ignore that signal. This does not apply to Non Maskable Interrupts (NMI); they can't be blocked this way.

Zero (Z): It is set when the result of the last operation is zero.

Carry (C): It is set when a carry occurs. In other words, when an addition surpasses 255. For example (these are natural numbers, as usual): add 1 to 255 (11111111b); the result is 0 and C is set. This flag can then be used to do additions with 2 bytes, making the range of values wider. Another use of this flag is in rotate and shift operations: the bit that is driven out of the byte is stored here.

The accumulator (A) is used for arithmetic and logical operations.

The index register (X) is used for indexed addressing. This means that its value is added to the address given in the instruction. This is useful for blocks of data: you write a loop which increases (or decreases) X to reach all the data, instead of using one instruction for every byte.

Index 2 (Y): Used for some limited indexed addressing.

These three registers (A, X and Y) are the "work registers" that can be used to store data. The others are changed automatically (in most cases).

Now the instructions:

ADC   Add with carry. Adds A plus another number, plus the Carry flag, and stores the result in A. The other number (source) can be a constant (ADC #45), an address (ADC 100h) or an indexed address (ADC 100h,X). Since C is added too, you must clear it first with CLC.

AND   Logical AND. Does a bitwise AND between the accumulator and a source (memory or immediate). Logical AND means that the result will be true only if the two values are true.
In other words, the resulting bit will be 1 only if both bits are 1; otherwise it will be 0. For example:

        lda #10101010b
        and #00001111b

stores 00001010b in A. As you can see, a "mask" can be used to force zeros and keep only what you want. This is useful when a single byte is used for several purposes: when you need only certain bits you can clear the others.

ASL   Arithmetic Shift Left. Shifts the bits of A or memory to the left. All bits are moved one position to the left. The rightmost position is filled with a zero, and the leftmost bit is stored in the Carry Flag. Example:

        lda #10011001b
        asl A

gives 00110010b, and C will be set. One of its uses is quick multiplication by 2 (as shifting a decimal number to the left multiplies it by 10).

BCC   Branch if Carry Clear. Checks the Carry Flag and, if it is clear, branches to a certain point; otherwise execution continues with the next instruction. Branches are used to make decisions: if a certain flag is set (or clear), the program continues at another point; if not, the program continues as if nothing happened. The point where the program (more exactly, the Program Counter) will jump is given as a distance within a range of -128 to +127 bytes, counted from the instruction that follows the branch. Usually you will use a label to mark the place to jump to. There is an example some lines below.

BCS   Branch if Carry Set. The same, but it branches when C is set.

BEQ   Branch if Equal. Branches when the Zero Flag is set. In other words, when the result of the last operation is zero. It's called "Equal" because, when comparing, Z is set if both terms are equal. Example:

        lda bullet_column   ;(monster is hit when position of
        cmp monster_column  ; bullet is the same as him)
        beq monster_dies
        (here code for monster moves)
monster_dies
        (die sequence: it disappears and the score increases)

Here, monster_column and bullet_column are variables stored somewhere in memory. Instead of using the address numbers, we can first tell the assembler which labels we will use for them. monster_dies is a label for the position where the program will continue.
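
One way to declare such labels, as a sketch: each variable name is given an address with =, and the code then uses the names. The addresses and the monster_alive variable are made up for this example; any free memory addresses would do.

```asm
;Hypothetical variable addresses, declared before the code.
bullet_column  =20h
monster_column =21h
monster_alive  =22h

        lda bullet_column
        cmp monster_column  ;Z is set when both columns are equal
        beq monster_dies
        jmp check_done      ;no hit: skip the die sequence
monster_dies
        lda #0
        sta monster_alive   ;mark the monster as dead
check_done
```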
In fact, the most common use of BEQ is checking whether something is zero. For example:

        lda hero_lives
        beq gameover
        (here code to continue playing)
gameover
        (continue? insert coins)

It's also useful with AND to check the status of individual bits. In this example we suppose the status of button B is in bit 1 of a variable called "joypad1":

        lda joypad1
        and #00000010b      ;this blows out everything, except B button
        beq B_not_pressed
        (code for fire ball)
B_not_pressed
        (more code)

BIT   Bit test. It is like AND, but without storing the result in A; only the flags change. It also copies bits 7 and 6 of the memory value into the N and V flags.

BMI   Branch if Minus. Branches when N is set. That is, when bit 7 of the result is set. It is called "minus" because comparisons (which are actually subtractions) give a negative number when the second term is greater than the first. Negative numbers have bit 7 set. It is also useful for quickly checking bit 7.

BNE   Branch if Not Equal. Branches when Z is clear (the result is not 0). The opposite of BEQ.

BPL   Branch if Plus. Branches when the Negative Flag is clear, that is, when bit 7 of the result is clear. One of its most common uses is to wait for Vblank:

wait_for_vblank
        lda 2002h           ;bit 7 at 2002h is automatically set to 1
                            ;when Vblank occurs
        bpl wait_for_vblank
        (code for graphics here)

BRK   Break. This causes a software interrupt. Some documents say that the return address it saves is one byte past the instruction after BRK. Anyway, it is rarely used.

BVC   Branch if Overflow Clear. Not very useful.

BVS   Branch if Overflow Set. Same.

CLC   Clear Carry flag. C becomes 0. You must use it before ADC in most cases. It is also used before rotations to prevent errors.

CLD   Clear Decimal mode. The processor is supposed to begin in Binary mode, and you will never use Decimal mode, so you can forget this one. Nevertheless, I have seen that it is commonly used at the beginning of programs.

CLI   Clear Interrupt disable. Allows interrupts (NMI is not affected). Seldom used.

CLV   Clear Overflow flag. Seldom used.

CMP   Compare. Compares A with source, for later use of BEQ, BNE, BMI or BPL.
The comparison is a subtraction whose result is not stored, but which affects the flags: if both terms are equal, the result is 0; if the first is greater, the result is positive; if the second is greater, the result is negative. With natural (unsigned) numbers the Carry Flag is the more reliable test: C is set when A is greater than or equal to the source, so BCS and BCC can be used after the comparison too.

CPX   Compare X. Compares X with source.

CPY   Compare Y. Same.

DEC   Decrement source. The value at a memory address is decreased by 1.

DEX   Decrement X. Commonly used in loops:

        ldx #5              ;loop 5 times
loop5   (code)
        dex
        bne loop5

One useful loop is addressing a block of memory (where a table with data can be stored), or clearing it:

        ldx #50h            ;the block is 50h bytes
        lda #0              ;all bytes will be 0
memclear
        sta block_address-1,X ;X runs from 50h down to 1
        dex
        bne memclear

DEY   Decrement Y. Same.

EOR   Exclusive OR. Bitwise exclusive OR between A and source. This gives 1 only if the two bits are different; if not, the result is 0. Example:

        lda #11001100b
        eor #11110000b

stores 00111100b in A.

INC   Increment source. The value at a memory address is increased by 1.

INX   Increment X. Especially useful for loops with graphics. Since the PPU address is automatically incremented, we can't use DEX, or the data we put there will be in reversed order:

        lda #20h            ;address of name table
        sta 2006h
        lda #0Ch
        sta 2006h
        ldx #0
loopgfx lda mem_block,X
        sta 2007h
        inx
        cpx #20h            ;we are copying 20h bytes
        bne loopgfx

As you can see, the use of INX involves a CPX instruction, so loops usually use DEX because it's faster.

INY   Increment Y. Same.

JMP   Jump. The program jumps to a label and continues from there.

JSR   Jump to Subroutine. It is a jump too, but the PC is saved on the stack first, so when the code of the subroutine ends with RTS, you return to where the subroutine was "called". This makes it possible to call the same subroutine from many points in the program. Often the subroutine will need parameters for its actions. They can be passed in the registers, in memory, or by manipulating the stack.
Look at this example, which uses the registers (the easiest way):

        ldx #5              ;X will be the enemy number (5 enemies)
loop_enemies
        jsr enemy_actions
        dex
        bne loop_enemies
        (rest of program)
enemy_actions
        (here code to approach hero, throw knives, etc)
        rts

With this you can control 5 enemies with the same code.

LDA   Load Accumulator. Stores in A the value of a constant or of a memory address. It can be used this way:

        lda #7      This stores in A the value 7
        lda 7       This stores in A the value in the memory address 7
        lda 7,X     This stores in A the value in the memory address 7 plus X.

This means that if X is 0, A will have the value in address 7; if X is 5, A will have the value in memory address 12 (0Ch).

LDX   Load X. The same, but obviously X can't be used as its own index (Y is used as the index instead).

LDY   Load Y. Same (indexed with X).

LSR   Logical Shift Right. Shifts A or a memory address to the right. The leftmost bit is filled with 0 and the rightmost goes to C. Example:

        lda #01100110b
        lsr A

stores 00110011b in A, and the Carry Flag will be clear. One of its uses is quick division by 2.

NOP   No Operation. This does nothing. It can be used in delay loops (sometimes we need less speed), but in such cases it is preferable to use a slower operation, as this is one of the fastest.

ORA   Logical Inclusive OR. Bitwise OR between A and a source. The resulting bit will be 0 only if both bits are 0; otherwise it will be 1. Example:

        lda #11001100b
        ora #10101010b

stores 11101110b in A. This can be used to force bits to 1 (as AND is used to force bits to 0).
For example, suppose we have the variable hero_status, where bit 5 indicates that the hero is jumping and bit 2 that he is dead:

        (code to decide when he'll die)
        lda hero_status
        ora #00000100b      ;set die bit
        sta hero_status

Now the jump:

        (code to know if button A is pressed)
        lda hero_status
        ora #00100000b      ;we force bit 5 to 1, without
        sta hero_status     ;touching the other bits
        (code to jump)

Now, suppose he has finished his jump:

        (code to know when finish jumping)
        lda hero_status
        and #11011111b      ;force bit 5 to 0, without
        sta hero_status     ;touching other bits

As you can see, with ORA, 0 leaves bits unchanged and 1 forces 1; with AND, 1 leaves bits unchanged and 0 forces 0.

PHA   Push Accumulator. Stores A on the stack.

PHP   Push Processor Status Register. Stores P (aka the flags) on the stack.

PLA   Pull Accumulator. Stores in A the value at the top of the stack.

PLP   Pull Processor Status Register. Stores in the flags the value at the top of the stack.

ROL   Rotate Left. Rotates the bits in A or source one position to the left. The rightmost bit is filled with the value of the Carry Flag, and the leftmost bit is sent to the Carry Flag. Example:

        clc                 ;C will be 0
        lda #10010111b
        rol A               ;(A is now 00101110b, and C is set to 1)
        rol A               ;(A is now 01011101b, and C is 0)
        rol A               ;(A is now 10111010b and C is 0)
        rol A               ;(A is now 01110100b and C is 1)

This is useful to fill a byte when a flow of single bits comes from somewhere. Specifically, from the joypads.

RTI   Return from Interrupt. After the interrupt code is executed, this returns to where the program was left, restoring the PC and the flags that were pushed on the stack when the interrupt began.

RTS   Return from Subroutine. This returns from a subroutine to the instruction following the one that called it, pulling the PC from the stack.

SBC   Subtract with Carry. Stores in A the result of subtracting source from A. One book says that the value of C is irrelevant here, but that is not so: the Carry Flag works as an inverted borrow, so you must set it with SEC before subtracting. If C is clear, the result will be one less than expected.

SEC   Set Carry flag.
Sets C to 1.

SED   Set Decimal mode. You won't need this.

SEI   Set Interrupt disable. When the Interrupt disable flag is set, the processor ignores interrupt signals. NMIs are not included; they will continue working. As far as I know, the only interrupt in the NES is one that executes every Vblank, but this is a Non Maskable Interrupt (NMI), which can't be affected by this flag (the sound hardware and some cartridges can generate ordinary interrupts too, and those are affected). The NMI is enabled or disabled with a register at 2000h. Nevertheless, it seems common to use SEI at the beginning of programs.

STA   Store Accumulator. Stores A in memory. Indexed addressing can be used.

STX   Store X. Stores X in memory.

STY   Store Y. Same.

TAX   Transfer A to X. Stores A in X. Store instructions don't allow transfers of data between registers, hence the existence of the Transfer ones.

TAY   Transfer A to Y.

TSX   Transfer S to X. Stores in X the value of the Stack pointer, to know where it is.

TXA   Transfer X to A.

TXS   Transfer X to S. Stores in S the value of X. This must always be used at the beginning of a program in order to use the stack, because S does not start with a known value. If you forget this, you can't rely on subroutines or interrupts. Do it this way:

        ldx #0FFh
        txs

This instruction can also be used to manipulate the stack, but that is quite complex.

TYA   Transfer Y to A.


Getting Started

In this section there are pieces of code you can use to make your own demo rom. Just download TASM from http://www.galeon.com/waseiyakusha/indexe.html and paste the code. Remember this is only to see something working; you must add what will be the "essence" of your demo (or game?).

First, we assign a label to some memory addresses.
This will make the code more readable (I use a "V" to distinguish them from other labels):

Vcontrol1 =0
Vcolumn   =1
Vline     =2

Then begin the code at C000h:

        .org 0C000h
begin   ldx #0FFh           ;initialize stack
        txs
vblank  lda 2002h           ;wait for Vblank
        bpl vblank
        lda #00000000b      ;control PPU 1 (consult other docs for details)
        sta 2000h
        lda #10011110b      ;control PPU 2
        sta 2001h
        lda #0              ;no scroll
        sta 2005h
        sta 2005h
;
        lda #3Fh            ;set the background palette
        sta 2006h           ;(if you skip this, all will be gray -color 0-)
        lda #0
        sta 2006h
        lda #0Fh            ;colors (black)
        sta 2007h           ;use Chris Covell's RGB to see values
        lda #1              ;(blue)
        sta 2007h
        lda #16h            ;(red)
        sta 2007h
        lda #28h            ;(yellow)
        sta 2007h
;
        lda #3Fh            ;sprites palette
        sta 2006h
        lda #10h
        sta 2006h
        lda #0Fh            ;colors (this is the transparent color, use the
        sta 2007h           ;same as background, black in this case)
        lda #30h            ;(white)
        sta 2007h
        lda #37h            ;(skin)
        sta 2007h
        lda #7              ;(brown)
        sta 2007h
;---------------- background
        lda #20h            ;this fills the name table with tile numbers
        sta 2006h           ;it's only copying the table at the end
        sta 2006h           ;(the PPU address is now 2020h, the second row)
        ldx #0
loop1   lda 0E000h,X        ;4 loops of 0E0h bytes each = 28 rows of 32 tiles
        sta 2007h
        inx
        cpx #0E0h
        bne loop1
        ldx #0
loop2   lda 0E0E0h,X
        sta 2007h
        inx
        cpx #0E0h
        bne loop2
        ldx #0
loop3   lda 0E1C0h,X
        sta 2007h
        inx
        cpx #0E0h
        bne loop3
        ldx #0
loop4   lda 0E2A0h,X
        sta 2007h
        inx
        cpx #0E0h
        bne loop4
;-----------------
        lda #10             ;set initial values for line and column
        sta Vcolumn         ;of the sprite
        sta Vline
;------------
mainloop
        jsr readcontrol1
        jsr delay           ;we want some delay to be able to see it
        jsr delay
        jsr delay
        jsr delay
;---(add your code here)
        lda Vcontrol1       ;if no button is pressed, just go ahead
        beq nobutton
        and #00000001b      ;if it is right, increase column number
        beq noright         ; to move to right
        inc Vcolumn
        jmp nobutton
noright lda Vcontrol1
        and #00000010b      ;if it's left, to the left
        beq noleft
        dec Vcolumn
        jmp nobutton
noleft
;(continue the same, for up, down, etc)
;Of course, the following code works too:
;       lda Vcontrol1
;       cmp #(buttonnumber)
;       bne nothisbutton
;There is only a little
;difference between this and the other one.
nobutton
        jsr sprites
        jmp mainloop

;----------------------------------------------------------------------------
;                          SUBROUTINES
;----------------------------------------------------------------------------
;------------------------------------------ read control 1
readcontrol1
        lda #1              ;reset control 1
        sta 4016h
        lda #0
        sta 4016h
        sta Vcontrol1
        ldx #8              ;read control 1
control1
        lda 4016h           ;this loop reads bit by bit
        and #1              ;and rotates to fill the byte
        ora Vcontrol1
        asl A
        sta Vcontrol1
        dex
        bne control1
        bcc noabutton       ;then checks if, as a result of rotation, the
        lsr A               ;first bit read (button A) was driven to C
        ora #10000000b      ;if so, puts it back in bit 7
        bne endreading      ;(always branches, A is not 0 here)
noabutton
        lsr A
endreading
        sta Vcontrol1
        rts
;(Vcontrol1 is filled as follows: (A,B,Select,Start,Up,Down,Left,Right)
;------------------------------------------ delay a while
delay   pha                 ;first save in stack A and X
        txa
        pha
        ldx #0FFh
delayloop
        lda 8000h,X         ;then loop doing a slow operation
        dex
        bne delayloop
        pla                 ;and restore A and X
        tax
        pla
        rts
;------------------------------------------ Sprites
sprites lda 2002h           ;wait for vblank
        bpl sprites
        lda #0
        sta 2003h
;-------
        lda Vline
        sta 2004h
        lda #1              ;use tile #1
        sta 2004h
        lda #0              ;attributes in their simplest form
        sta 2004h
        lda Vcolumn
        sta 2004h
        rts
;----------------------------------------------- background
        .org 0E000h         ;of course, you can use another place in memory
        .db 0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0
        .db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0
        .db 0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1
        .db 1,1,1,1,0,0,0,5,5,5,5,5,0,0,4,0
        .db 0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0
        .db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0
; and continue with the other 25 rows, 32 values each (but cut them in 2 lines,
;or the assembler will not work). These are the
;tile numbers of the background.
;--------------------------------------------------------------------
        .org 0FF00h         ;interruption NMI (useless, but just in case)
        rti
        .org 0FFFAh         ;interruption vectors, lower byte first:
        .db 0,0FFh,0,0C0h,0,0C0h ;NMI=FF00h, reset=C000h, IRQ=C000h
        .end

Assemble it (tasm -65 -b -f00 filename.asm), and rename the resulting .obj file to .prg. Then make a chr rom with this simple code (in a new file), and draw it with a tile editor (name it filename.chr):

        .org 0
        .db 0FFh
        .org 2000h
        .end

Make a header too (filename.hdr):

        .db 4eh,45h,53h,1ah,1,1,0,0,0,0,0,0,0,0,0,0
        .end

Join them with tniNES (tnines -j filename), and you're done!


Thanks to:
  YOSHi (needless to say).
  Tony Young. With his page and demo I began to understand YOSHi's document (http://members.aol.com/TYoung79/nesprog.html)
  Jonathan Bowen, for his summary of the 6502 instruction set

Questions/suggestions: bokudono@netscape.net

Waseiyakusha (http://www.galeon.com/waseiyakusha/indexe.html)