While working on the 6809 simulator, one of the test programs is a FORTH interpreter written for the 6800. The 6809 is not binary code compatible with the 6800 but 6800 assembly language source files can be assembled into 6809 binaries.
FORTH is based on manipulating 16-bit values on a stack. Since the 6809 was designed to efficiently handle such code, I chose to see how well it can do.
The FORTH word I chose at random to study is “equal.” It compares the top two values on the stack and replaces them with a one if equal and zero otherwise.
The original 6800 implementation is:
085D 5F [2] 02159 clrb ; Presume not equal
02160
085E 32 [4] 02161 pula ; Get a number
085F 97 0A [4] 02162 staa Tmp
0861 32 [4] 02163 pula
0862 97 0B [4] 02164 staa Tmp+1
02165
0864 32 [4] 02166 pula ; Get the other
0865 91 0A [3] 02167 cmpa Tmp
0867 32 [4] 02168 pula
0868 26 05 (086F) [4] 02169 bne ___eq_NotEqual
02170
086A 91 0B [3] 02171 cmpa Tmp+1 ; Upper bytes equal, compare lower
086C 26 01 (086F) [4] 02172 bne ___eq_NotEqual
02173
086E 5C [2] 02174 incb
02175
086F 02176 ___eq_NotEqual:
086F 4F [2] 02177 clra ; Clear upper byte
02178
0870 37 [4] 02179 pshb ; Push result
0871 36 [4] 02180 psha
02181
0872 7E 017A [3] 02182 jmp Next
55 machine cycles through the equal path.
A much better 6809 implementation is:
0000 35 06 [7] 00001 puls D ; Get first number
00002
0002 10A3 E4 [7] 00003 cmpd ,S ; Compare with second number
0005 26 04 (000B) [3] 00004 bne ___eq_NotEqual
00005
0007 C6 01 [2] 00006 ldab #1 ; Indicate equal
00007
0009 20 01 (000C) [3] 00008 bra ___eq_Finish
00009
000B 00010 ___eq_NotEqual
000B 5F [2] 00011 clrb
00012
000C 00013 ___eq_Finish
000C 4F [2] 00014 clra
000D A7 E4 [4] 00015 sta ,S ; Store result
00016
000F 7E 0012 [4] 00017 jmp Next ; Pass control to inner interpreter
32 machine cycles through the equal path.
My FORTH system has never been adapted to the 6502 so I will have to do some speculating as to how the equal word might be implemented there.
A simple paraphrase of the 6800 code to the 6502 would be:
0006 68 [4] 00006 pla ; Get low byte of first number
0007 85 04 [3] 00007 sta Tmp
00008
0009 68 [4] 00009 pla ; Get high byte of first number
000A 85 05 [3] 00010 sta Tmp+1
00011
000C 68 [4] 00012 pla ; Compare low bytes
000D C5 04 [3] 00013 cmp Tmp
000F D0 0A (001B) [2/3] 00014 bne ___eq_NotEqual
00015
0011 68 [4] 00016 pla ; Compare high bytes
0012 C5 05 [3] 00017 cmp Tmp+1
0014 D0 06 (001C) [2/3] 00018 bne ___eq_NotEqual2
00019
0016 A2 01 [2] 00020 ldx #1
00021
0018 4C 001E [3] 00022 jmp ___eq_Finish
00023
001B 00024 ___eq_NotEqual
001B 68 [4] 00025 pla ; Discard high byte of second number
00026
001C 00027 ___eq_NotEqual2
001C A2 00 [2] 00028 ldx #0
00029
001E 00030 ___eq_Finish
001E A9 00 [2] 00031 lda #0 ; Push result
0020 48 [3] 00032 pha
0021 8A [2] 00033 txa
0022 48 [3] 00034 pha
00035
0023 4C 0026 [3] 00036 jmp Next
50 machine cycles through the equal path.
The stack of the 6502 is limited to 256 bytes.
FORTH is stack-oriented. 16-bit words are manipulated on the stack.
I do not have enough experience to know whether 128 words are enough, but it seems limiting, especially considering that the small 6502 stack is shared with the return addresses for subroutine calls and interrupts.
An easy way to double the amount of stack space is to allocate two 256 byte blocks of memory, one for the low bytes of words and one for the high, keep pointers to them in the zero page and use the Y register as the stack pointer. This stack is separate from the processor’s machine stack and is needed anyway since FORTH actually mandates two stacks, a data stack and a return stack.
An added benefit of this approach is that both bytes of the top word on the stack are easily accessible without having to mess with the stack pointer.
Code for the equal word on the 6502 might look something like this:
0004 B1 00 [5/6] 00018 lda (StkLo),Y ; Get low byte of first number
0006 AA [2] 00019 tax
00020
0007 B1 02 [5/6] 00021 lda (StkHi),Y ; Get high byte of the first number
00022
0009 C8 [2] 00023 iny ; Jettison the first number
00024
000A D1 02 [5/6] 00025 cmp (StkHi),Y ; Compare high bytes
000C D0 0A (0018) [2/3] 00026 bne ___eq_NotEqual
00027
000E 8A [2] 00028 txa ; Compare low bytes
000F D1 00 [5/6] 00029 cmp (StkLo),Y
0011 D0 05 (0018) [2/3] 00030 bne ___eq_NotEqual
00031
0013 A9 01 [2] 00032 lda #1 ; Indicate equal
00033
0015 4C 001A [3] 00034 jmp ___eq_Finish
00035
0018 00036 ___eq_NotEqual
0018 A9 00 [2] 00037 lda #0 ; Indicate not equal
00038
001A 00039 ___eq_Finish
001A 91 00 [6] 00040 sta (StkLo),Y ; Store result
001C A9 00 [2] 00041 lda #0
001E 91 02 [6] 00042 sta (StkHi),Y
00043
0020 4C 0023 [3] 00044 jmp Next ; Pass control to inner interpreter
52 machine cycles through the equal path. Slightly slower, but worth doing if the larger stack is needed.
That got me curious how much the original 6800 implementation can benefit from “indexing the stack:”
0000 32 [4] 00001 pula ; Get high byte of first number
0001 33 [4] 00002 pulb ; Get low byte of first number
00003
0002 30 [4] 00004 tsx ; Prepare to index the stack
00005
0003 E1 01 [5] 00006 cmpb 1,X ; Compare low bytes
0005 26 08 (000F) [4] 00007 bne ___eq_NotEqual
00008
0007 A1 00 [5] 00009 cmpa ,X ; Compare high bytes
0009 26 04 (000F) [4] 00010 bne ___eq_NotEqual
00011
000B 86 01 [2] 00012 ldaa #1 ; Indicate equal
00013
000D 20 01 (0010) [4] 00014 bra ___eq_Finish
00015
000F 00016 ___eq_NotEqual
000F 4F [2] 00017 clra ; Indicate not equal
00018
0010 00019 ___eq_Finish
0010 A7 01 [6] 00020 staa 1,X
0012 6F 00 [7] 00021 clr ,X
00022
0014 7E 0017 [3] 00023 jmp Next ; Pass control to inner interpreter
52 machine cycles through the equal path. Surprisingly identical to the 6502!