6502 Tips and Techniques

Loop n times
------------

Using X or Y as counter

	ldx	#n
loop	...
	dex
	bne	loop

Using A as counter

	lda	#n
	sec		; loop body must preserve A and carry
loop	...
	adc	#-2	; -2 + carry = -1; carry stays set while A > 0
	bne	loop


Iterate through table
---------------------

Forwards

	ldx	#256 - size
loop	lda	table + size - 256,x
	...
	inx
	bne	loop

Backwards

	ldx	#size
loop	lda	table - 1,x
	...
	dex
	bne	loop
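
To see why the forward loop needs no compare, here is a sketch with a concrete
size. The size of 5 is made up for illustration. X counts up from 251 and
wraps to 0, so the zero flag from inx ends the loop:

	ldx	#256 - 5		; X = 251
loop	lda	table + 5 - 256,x	; table - 251 + X: table+0 on the first pass
	...
	inx				; 252, 253, 254, 255, then wraps to 0
	bne	loop			; falls through after table+4 is read

Because the base address sits below the table, some accesses may cross a page
boundary and cost an extra cycle.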


Branch Always
-------------

Only the NMOS 6502 lacks a branch always instruction; the 65C02 and later
variants have BRA. On the NMOS part, branch on a flag whose state is known:

	clv		; overflow flag is rarely used
	bvc	dest

After a load immediate, the zero flag is always known:

	lda	#0
	beq	dest	; always branches
	
	ldx	#123
	bne	dest	; always branches

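A single-byte instruction can be skipped by hiding it in the operand of another
instruction. Entering at sub1 skips the cli; note that the cmp changes the N, Z,
and C flags:

sub1	sei
	db	$C9	; cmp #immediate; the cli byte becomes its operand
sub2	cli
	...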


Cycle Delays
------------

	nop		; 2 cycles
	lda	0	; 3 cycles (zero-page read)
	pha		; 7 cycles for the pha/pla pair
	pla
	
A single cycle can be selectively inserted based on a condition:

	bne	next	; zero: 2 cycles, non-zero: 3 cycles
next

If the stack isn't being used for anything else, pha and pla give the most
delay per byte of code, at the cost of changing the N and Z flags:

	pha		; 3 cycles
	pla		; 4 cycles
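
For longer delays, a counted loop like the one in the first section can be used
as a delay loop. A rough sketch, assuming X is free and the branch doesn't cross
a page boundary (which would add a cycle per taken branch); n is a placeholder
for a count from 1 to 255:

	ldx	#n	; 2 cycles
delay	dex		; 2 cycles
	bne	delay	; 3 cycles when taken, 2 on the final pass

The total is 2 + 5*n - 1 = 5*n + 1 cycles, so n = 10 gives 51 cycles.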


Jump indirect using stack
-------------------------

An indirect address can be pushed on the stack and then jumped to by
returning. Since the address is on the stack, no temporary locations have to
be assigned for the destination address and the code is re-entrant. A normal
return (rts) adds one to the address it pulls, but rti doesn't:

	lda	#$12	; push high byte first
	pha
	lda	#$34
	pha
	php		; rti pulls status before the address
	rti		; jumps to $1234
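
The same idea should work with rts if the pushed address is one less than the
destination, since rts adds one to the address it pulls. A minimal sketch,
reusing the $1234 destination:

	lda	#$12	; high byte of $1234 - 1
	pha
	lda	#$33	; low byte of $1234 - 1
	pha
	rts		; jumps to $1234

This saves the php, at the cost of adjusting the address.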


BIT #immediate for NMOS 6502
----------------------------

Again, this applies only to the NMOS 6502 which lacks this addressing mode.
Set up 8 constants in zero page, one for each bit, i.e. $01, $02, $04, $08,
$10, $20, $40, $80. Then BIT zero-page can be used to test a particular bit.
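
A sketch of how this might look. The names bits, flags, and bit3_set are
hypothetical, and the bits table is assumed to already be in zero page (set up
by the loader or by startup code):

bits	db	$01,$02,$04,$08,$10,$20,$40,$80	; assumed to be in zero page
	...
	lda	flags
	bit	bits + 3	; tests A AND $08; A is unchanged
	bne	bit3_set	; taken if bit 3 of A is set

Unlike the 65C02's BIT #immediate, which affects only Z, this also copies bits
7 and 6 of the constant into N and V.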


Combining shift register and counter
------------------------------------

An 8-bit shift register for loading 8 bits of data can be combined with a 1-8
iteration counter:

	lda	#$80	; marker bit: 8 iterations ($40 = 7 iter, $20 = 6, etc.)
	sta	temp
loop	lda	port	; input data is in bit 0
	lsr	a	; move input bit into carry
	ror	temp	; shift bit in; carry set once the marker falls out
	bcc	loop


Using S as fast index register
------------------------------

The stack pointer (S) can be used as an extra index register for going
through a small buffer more rapidly than is possible with X and Y. It might be
useful where a buffer of data needs to be quickly read from or written to some
device. The data is simply pushed on the stack, then popped off the stack, so
it comes back in the reverse of the order it was pushed. Both operations are
faster than using an index register, and leave both index registers free for
other use.

This example quickly outputs a buffer of 0-terminated data to a memory-mapped
device outside of zero-page. Each byte takes 11 cycles to read from the buffer
and output:

	lda	#0	; 0 terminator
	pha
	...		; push data on stack
	
	jmp	next
read	sta	port	; write to device
next	pla
	bne	read

This example quickly reads data from a device and stops when it receives 0.
Each byte takes 10 cycles to input and write to the buffer: 

	tsx		; save current stack pointer
	stx	end
	
write	lda	port	; read from device
	pha
	bne	write
	
read	pla
	...		; use data
	tsx
	cpx	end
	bne	read

By putting the buffer at the bottom of page 1, S can be used as both a counter
and index for a write buffer. Each byte takes 12 cycles to input and write to
the buffer:

	tsx		; save stack
	stx	stack
	
	ldx	#size
	txs
	
loop	lda	port
	pha
	tsx
	bne	loop

	...		; use data
	
	ldx	stack	; restore stack
	txs

By putting the buffer at the top of page 1, S can be used as both a counter
and index for a read buffer. The normal stack would need to be placed lower in
page 1 to coexist with this scheme. Each byte takes 13 cycles to read from the
buffer and output:

	ldx	#0	; init stack
	txs
	
	...		; push data on stack
	
loop	pla
	sta	port
	tsx
	bne	loop

