[Project Log] Python on the 6502/C64, 8080, 6800, 6809 and AVR

It was easy to morph a copy of hex() to make bin() since it was designed with that in mind. Doing oct() will not be so easy since octits cross byte boundaries.
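A quick way to see why octal is the odd one out: a hex digit is exactly 4 bits and a binary digit 1 bit, so each byte can be formatted on its own, but 3-bit octits do not line up with 8-bit bytes. A small Python check, where the helper name `per_byte` is hypothetical and only for illustration:

```python
def per_byte(value, nbytes, spec, width):
    """Format each byte of value separately (high byte first) and concatenate."""
    data = value.to_bytes(nbytes, "big")
    return "".join(format(b, spec).zfill(width) for b in data)

value = 0x1FF  # two bytes: 0x01 and 0xFF

# Hex and binary: byte-by-byte matches whole-number formatting,
# because 4 bits and 1 bit both divide 8 evenly.
print(per_byte(value, 2, "x", 2), format(value, "04x"))
print(per_byte(value, 2, "b", 8), format(value, "016b"))

# Octal: byte-by-byte gives 001 and 377, but the number is 777 in octal,
# because 3-bit digits straddle the byte boundary.
print(per_byte(value, 2, "o", 3), format(value, "o"))
```

The first two lines print matching pairs; the last prints `001377 777`.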

The loop in the test program now looks like this:

Base = 10
Number = input('Number? ')
while Number != '':
	BaseIn = input('Base? ')
	if BaseIn != '':
		# Change base
		Base = int(BaseIn, 10)
	Value = int(Number, Base)
	print(Value, hex(Value), bin(Value), Base)
	Number = input('Another? ')

My vote would be left/right shift. In my mind, it is conceptually easy and may give you an early glimpse of what you are getting yourself into with octits crossing byte boundaries.


I will probably implement the shifts while thinking about how to do oct().

Shifting has to affect every byte of an integer. I do not want to shift a potentially long integer once for every three bits oct() formats, especially on the 6502, where a byte has to be in the only accumulator to be shifted, as seen in this particularly ugly piece of code to multiply a number by eight:

 43F5 A5 0C	      [3] 11272		lda	Int3			; Get maximum length of the result
 43F7 A4 0D	      [3] 11273		ldy	Int3+1			; Eight chars for each byte +
 43F9 0A	      [2] 11274		asl	A			;   two for the prefix and one sign
 43FA AA	      [2] 11275		tax
 43FB 98	      [2] 11276		tya
 43FC 2A	      [2] 11277		rol	A
 43FD A8	      [2] 11278		tay
 43FE 8A	      [2] 11279		txa
 43FF 0A	      [2] 11280		asl	A
 4400 AA	      [2] 11281		tax
 4401 98	      [2] 11282		tya
 4402 2A	      [2] 11283		rol	A
 4403 A8	      [2] 11284		tay
 4404 8A	      [2] 11285		txa
 4405 0A	      [2] 11286		asl	A
 4406 AA	      [2] 11287		tax
 4407 98	      [2] 11288		tya
 4408 2A	      [2] 11289		rol	A
 4409 86 0A	      [3] 11290		stx	Int2
 440B 85 0B	      [3] 11291		sta	Int2+1

Hmm, now that I look at it, it is a couple of cycles faster (42 instead of 44, with Int2 and Int3 in zero page) to do this:

	lda	Int3
	sta	Int2
	lda	Int3+1
	sta	Int2+1
	asl	Int2
	rol	Int2+1
	asl	Int2
	rol	Int2+1
	asl	Int2
	rol	Int2+1

I have never given the read/modify/write instructions much respect, but in this case, they are appropriate.
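The read/modify/write sequence shifts a 16-bit little-endian value one bit per asl/rol pair, with the carry flag carrying the bit that falls out of the low byte into the high byte. A Python model of that carry chain; it is a sketch that models only the data movement (not cycle counts), and the function names are hypothetical:

```python
def asl(byte):
    """6502 ASL: shift left; bit 7 goes to carry, bit 0 becomes 0."""
    return (byte << 1) & 0xFF, byte >> 7

def rol(byte, carry):
    """6502 ROL: shift left; old carry enters bit 0, bit 7 goes to carry."""
    return ((byte << 1) | carry) & 0xFF, byte >> 7

def times_eight(mem, addr):
    """Multiply the 16-bit little-endian value at mem[addr] by 8 in place,
    like three asl Int2 / rol Int2+1 pairs (bits shifted out are lost)."""
    for _ in range(3):
        mem[addr], carry = asl(mem[addr])
        mem[addr + 1] = rol(mem[addr + 1], carry)[0]

mem = bytearray((0x0123).to_bytes(2, "little"))
times_eight(mem, 0)
print(hex(int.from_bytes(mem, "little")))  # 0x918
```

For a longer integer, each extra byte just adds one more rol in the chain, which is why shifting every byte once per octit gets expensive.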

As they stand, hex() and bin() format the number “backward”, from the low-order byte to the high-order byte, even though they do not have to. The reason is that one way to do octal is to format three bits at a time, starting at the low end, until I run out of bits.
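The three-bits-at-a-time approach can be sketched in Python on a little-endian byte array, using a small bit buffer so that octits spanning two bytes come out right. This is an illustration under assumptions (unsigned magnitude only, no sign or prefix; the name `oct_by_bits` is hypothetical), not the run-time library code:

```python
def oct_by_bits(data):
    """Format an unsigned little-endian byte string as octal digits,
    three bits at a time starting from the low-order end."""
    digits = []
    acc = 0        # bit buffer holding bits not yet formatted
    nbits = 0      # number of valid bits in acc
    for byte in data:                # low-order byte first
        acc |= byte << nbits
        nbits += 8
        while nbits >= 3:
            digits.append("01234567"[acc & 7])
            acc >>= 3
            nbits -= 3
    if nbits:                        # one or two bits left at the top
        digits.append("01234567"[acc & 7])
    text = "".join(reversed(digits)).lstrip("0")
    return text or "0"

print(oct_by_bits((0x1FF).to_bytes(2, "little")))  # 777
```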

The other way is to take advantage of the fact that octal digits resync to a byte boundary every three bytes (24 bits is exactly eight octits), so the byte count mod 3 gives the number of odd bytes at the upper end of the number. Handle that special case, then do the rest of the number three bytes at a time.


I think I have settled on a way to do the octal formatting.

while ByteCount >= 3:
	Format lowest order three bytes into eight octits
	Advance pointer into the integer by three
	ByteCount = ByteCount - 3
if ByteCount > 0:
	Handle the left over one or two bytes

This will let me avoid dividing by three. I am glad I designed hex() to work from right to left…
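A Python sketch of that plan: each full three-byte group becomes exactly eight octits, the loop subtracts three instead of dividing, and the one or two leftover high-order bytes become three or six octits. Unsigned magnitude only; the names `octits` and `oct_by_groups` are hypothetical:

```python
OCTITS = "01234567"

def octits(chunk, count):
    """Return exactly `count` octal digits of chunk, most significant first."""
    out = []
    for _ in range(count):
        out.append(OCTITS[chunk & 7])
        chunk >>= 3
    return "".join(reversed(out))

def oct_by_groups(data):
    """Format an unsigned little-endian byte string as octal,
    three bytes (24 bits = 8 octits) at a time from the low end."""
    groups = []
    pos = 0                    # pointer into the integer
    byte_count = len(data)
    while byte_count >= 3:     # whole groups: no division by three needed
        groups.append(octits(int.from_bytes(data[pos:pos + 3], "little"), 8))
        pos += 3
        byte_count -= 3
    if byte_count > 0:         # one or two left-over high-order bytes
        chunk = int.from_bytes(data[pos:pos + byte_count], "little")
        groups.append(octits(chunk, 3 * byte_count))  # 8 or 16 bits -> 3 or 6 octits
    text = "".join(reversed(groups)).lstrip("0")
    return text or "0"

print(oct_by_groups((2**24).to_bytes(4, "little")))  # 100000000
```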


Oct() is working now. It still needs some more testing. I also need to look at all three and factor out some of the common code.

The test loop now looks like this:

Base = 10
Number = input('Number? ')
while Number != '':
	BaseIn = input('Base? ')
	if BaseIn != '':
		# Change base
		Base = int(BaseIn, 10)
	Value = int(Number, Base)
	print(Value, hex(Value), oct(Value), bin(Value), Base)
	Number = input('Another? ')

Then on to make the project stop appearing shiftless.


In anticipation of a pending acquisition (see the “Altair 8800 mini-clone up for grabs” thread), I have begun writing the run-time library code for the 8080.

This is a bit of 6502 code:

 039D AD 378C	      [4] 00373		lda	Zero_List		; Get list of variables to zero
 03A0 85 18	      [3] 00374		sta	Ptr0
 03A2 AD 378D	      [4] 00375		lda	Zero_List+1
 03A5 85 19	      [3] 00376		sta	Ptr0+1
 			  00377
 03A7			  00378	InitRTL1
 03A7 05 18	      [3] 00379		ora	Ptr0
 03A9 F0 18 (03C3)  [2/3] 00380		beq	InitRTL2		; No more uninitialized variables
 			  00381
 03AB A9 00	      [2] 00382		lda	#0			; Zero out this variable
 03AD A0 02	      [2] 00383		ldy	#2			; Point to the variable
 03AF 91 18	      [6] 00384		sta	(Ptr0),Y
 03B1 C8	      [2] 00385		iny
 03B2 91 18	      [6] 00386		sta	(Ptr0),Y
 			  00387
 03B4 A0 00	      [2] 00388		ldy	#0
 03B6 B1 18	    [5/6] 00389		lda	(Ptr0),Y
 03B8 AA	      [2] 00390		tax
 03B9 C8	      [2] 00391		iny
 03BA B1 18	    [5/6] 00392		lda	(Ptr0),y
 03BC 85 19	      [3] 00393		sta	Ptr0+1
 03BE 86 18	      [3] 00394		stx	Ptr0
 			  00395
 03C0 4C 03A7	      [3] 00396		jmp	InitRTL1
 			  00397
 03C3			  00398	InitRTL2
 03C3 AD 378E	      [4] 00399		lda	Init_List		; Get list of variables to initialize
 03C6 85 18	      [3] 00400		sta	Ptr0
 03C8 AD 378F	      [4] 00401		lda	Init_List+1
 03CB 85 19	      [3] 00402		sta	Ptr0+1
 			  00403
 03CD			  00404	InitRTL3
 03CD 05 18	      [3] 00405		ora	Ptr0
 03CF F0 1E (03EF)  [2/3] 00406		beq	InitRTL4		; No more initialized variables
 			  00407
 03D1 A0 02	      [2] 00408		ldy	#2			; Load variable value
 03D3 B1 18	    [5/6] 00409		lda	(Ptr0),Y
 03D5 AA	      [2] 00410		tax
 03D6 C8	      [2] 00411		iny
 03D7 B1 18	    [5/6] 00412		lda	(Ptr0),Y
 			  00413
 03D9 C8	      [2] 00414		iny				; Store variable value
 03DA 91 18	      [6] 00415		sta	(Ptr0),Y
 03DC C8	      [2] 00416		iny
 03DD 8A	      [2] 00417		txa
 03DE 91 18	      [6] 00418		sta	(Ptr0),Y
 			  00419
 03E0 A0 00	      [2] 00420		ldy	#0
 03E2 B1 18	    [5/6] 00421		lda	(Ptr0),Y
 03E4 AA	      [2] 00422		tax
 03E5 C8	      [2] 00423		iny
 03E6 B1 18	    [5/6] 00424		lda	(Ptr0),y
 03E8 85 19	      [3] 00425		sta	Ptr0+1
 03EA 86 18	      [3] 00426		stx	Ptr0
 			  00427
 03EC 4C 03CD	      [3] 00428		jmp	InitRTL3
 			  00429
 03EF			  00430	InitRTL4

and the corresponding 8080 code:

 0249 2A 02C1	     [16] 00295		lhld	Zero_List		; Get list of variables to zero
 			  00296
 024C			  00297	InitRTL1
 024C 7C	      [5] 00298		mov	A,H
 024D B5	      [4] 00299		ora	L
 024E CA 025E	     [10] 00300		jz	InitRTL2		; No more uninitialized variables
 			  00301
 0251 3E 00	      [7] 00302		mvi	A,0			; Zero out this variable
 0253 5E	      [7] 00303		mov	E,M			; Get low byte of next address
 0254 23	      [5] 00304		inx	H
 0255 56	      [7] 00305		mov	D,M			; Get high byte of next address
 0256 23	      [5] 00306		inx	H			; Point to the variable
 0257 77	      [7] 00307		mov	M,A
 0258 23	      [5] 00308		inx	H
 0259 77	      [7] 00309		mov	M,A
 			  00310
 025A EB	      [4] 00311		xchg				; Point HL to next
 			  00312
 025B C3 024C	     [10] 00313		jmp	InitRTL1
 			  00314
 025E			  00315	InitRTL2
 025E 2A 02C3	     [16] 00316		lhld	Init_List		; Get list of variables to initialize
 			  00317
 0261			  00318	InitRTL3
 0261 7C	      [5] 00319		mov	A,H
 0262 B5	      [4] 00320		ora	L
 0263 CA 0275	     [10] 00321		jz	InitRTL4		; No more initialized variables
 			  00322
 0266 5E	      [7] 00323		mov	E,M			; Get low byte of next address
 0267 23	      [5] 00324		inx	H
 0268 56	      [7] 00325		mov	D,M			; Get high byte of next address
 0269 23	      [5] 00326		inx	H			; Point to the value
 026A 4E	      [7] 00327		mov	C,M			; Get low byte of the value
 026B 23	      [5] 00328		inx	H
 026C 46	      [7] 00329		mov	B,M			; Get high byte of value
 026D 23	      [5] 00330		inx	H			; Point to the variable
 026E 71	      [7] 00331		mov	M,C			; Restore the value
 026F 23	      [5] 00332		inx	H
 0270 70	      [7] 00333		mov	M,B
 			  00334
 0271 EB	      [4] 00335		xchg				; Point HL to next
 			  00336
 0272 C3 0261	     [10] 00337		jmp	InitRTL3
 			  00338
 0275			  00339	InitRTL4

Note that a 2.5 MHz 8080 is roughly equivalent to a 1 MHz 6502.
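To check that rule of thumb against the listings, here are the per-iteration cycle counts of the zeroing loop (InitRTL1) on each target, summed from the bracketed counts above. Assumptions: the 6502's loop-top `beq` is not taken (2 cycles) and its `lda (Ptr0),Y` does not cross a page (5 cycles); the 1 MHz 6502 and 2.5 MHz 8080 come from the note above, while 1 MHz for the 6800 and 6809 is assumed:

```python
# Cycle counts copied from the listings for one pass through InitRTL1.
loop_cycles = {
    # ora beq lda# ldy# sta iny sta ldy# lda tax iny lda sta stx jmp
    "6502": [3, 2, 2, 2, 6, 2, 6, 2, 5, 2, 2, 5, 3, 3, 3],
    # mov ora jz mvi mov inx mov inx mov inx mov xchg jmp
    "8080": [5, 4, 10, 7, 7, 5, 7, 5, 7, 5, 7, 4, 10],
    # staa staa ldx bne
    "6800": [6, 6, 6, 4],
    # std ldx bne
    "6809": [6, 5, 3],
}

# Clock rates in MHz: 6502 and 8080 from the text, 6800/6809 assumed.
clock_mhz = {"6502": 1.0, "8080": 2.5, "6800": 1.0, "6809": 1.0}

for cpu, cycles in loop_cycles.items():
    total = sum(cycles)
    print(f"{cpu}: {total} cycles, {total / clock_mhz[cpu]:.1f} us per variable")
```

This gives 48, 83, 22 and 14 cycles, so on this one loop the 2.5 MHz 8080 (33.2 us) comes out somewhat ahead of the 1 MHz 6502 (48.0 us). The rule of thumb is an average, and a single loop can differ.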

Code for the 6800:

 020A 4F	      [2] 00299	         clra             ; The zero
 020B FE 025B	      [5] 00300	         ldx    Zero_List ; Get list of variables to zero
 			  00301
 020E 27 08 (0218)    [4] 00302	         beq    InitRTL2  ; No uninitialized variables
 			  00303
 0210			  00304	InitRTL1
 0210 A7 02	      [6] 00305	         staa   2,X       ; Zero out this variable
 0212 A7 03	      [6] 00306	         staa   3,X
 			  00307
 0214 EE 00	      [6] 00308	         ldx    ,X        ; Point to the next variable
 			  00309
 0216 26 F8 (0210)    [4] 00310	         bne    InitRTL1
 			  00311
 0218			  00312	InitRTL2
 0218 FE 025D	      [5] 00313	         ldx    Init_List ; Get list of variables to initialize
 021B 27 0D (022A)    [4] 00314	         beq    InitRTL4  ; No more initialized variables
 			  00315
 021D			  00316	InitRTL3
 021D A6 02	      [5] 00317	         ldaa   2,X       ; Load variable value
 021F A7 04	      [6] 00318	         staa   4,X       ; Store variable value
 0221 A6 03	      [5] 00319	         ldaa   3,X
 0223 A7 05	      [6] 00320	         staa   5,X
 			  00321
 0225 EE 00	      [6] 00322	         ldx    ,X        ; Point to the next variable
 0227 26 F4 (021D)    [4] 00323	         bne    InitRTL3
 			  00324
 0229			  00325	InitRTL4

and the 6809:

 0200 CC 0000		      [3] 00290	         ldd    #0        ; The zero
 0203 BE 0266		      [6] 00291	         ldx    Zero_List ; Get list of variables to zero
 				  00292
 0206 27 06 (020E)	      [3] 00293	         beq    InitRTL2  ; No uninitialized variables
 				  00294
 0208				  00295	InitRTL1
 0208 ED 02		      [6] 00296	         std    2,X       ; Zero out this variable
 				  00297
 020A AE 84		      [5] 00298	         ldx    ,X        ; Point to the next variable
 				  00299
 020C 26 FA (0208)	      [3] 00300	         bne    InitRTL1
 				  00301
 020E				  00302	InitRTL2
 020E BE 0268		      [6] 00303	         ldx    Init_List ; Get list of variables to initialize
 0211 27 21 (0234)	      [3] 00304	         beq    InitRTL4  ; No more initialized variables
 				  00305
 0213				  00306	InitRTL3
 0213 EC 02		      [6] 00307	         ldd    2,X       ; Load variable value
 0215 ED 04		      [6] 00308	         std    4,X       ; Store variable value
 				  00309
 0217 AE 84		      [5] 00310	         ldx    ,X        ; Point to the next variable
 0219 26 F8 (0213)	      [3] 00311	         bne    InitRTL3
 				  00312
 021B				  00313	InitRTL4
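All four versions walk the same two linked lists. Reading the offsets off the listings, a zero-list node appears to hold a next pointer at +0 and the variable at +2, and an init-list node holds a next pointer at +0, the initial value at +2 and the variable at +4, with a null pointer ending each list. A Python model of that walk under those inferred layouts (`init_rtl` and the sample addresses are hypothetical):

```python
def init_rtl(mem, zero_list, init_list, byteorder="little"):
    """Clear every variable on the zero list, then copy each initial value
    on the init list into its variable. byteorder is "little" for the 6502
    and 8080, "big" for the 6800 and 6809."""
    def word(addr):
        return int.from_bytes(mem[addr:addr + 2], byteorder)

    node = zero_list
    while node != 0:                       # null pointer ends the list
        mem[node + 2:node + 4] = bytes(2)  # zero out this variable
        node = word(node)                  # point to the next node

    node = init_list
    while node != 0:
        # Copy the two value bytes unchanged, so their byte order is preserved.
        mem[node + 4:node + 6] = mem[node + 2:node + 4]
        node = word(node)

# Hypothetical memory image: two zero-list nodes and one init-list node.
mem = bytearray(64)
mem[0x10:0x14] = (0x14).to_bytes(2, "little") + b"\xAA\xBB"
mem[0x14:0x18] = (0).to_bytes(2, "little") + b"\xCC\xDD"
mem[0x20:0x26] = ((0).to_bytes(2, "little")
                  + (0x1234).to_bytes(2, "little") + b"\xFF\xFF")
init_rtl(mem, 0x10, 0x20)
print(mem[0x12:0x14].hex(), mem[0x16:0x18].hex(), mem[0x24:0x26].hex())
# 0000 0000 3412
```

The 6800 and 6809 can follow the link with a single `ldx ,X`, which is much of why their loops are so short.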

I just finished factoring hex(), oct() and bin(). There were substantial savings because of the amount of code the three have in common.
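One way such factoring can look, sketched in Python: the sign, the prefix and the digit loop are shared, and only the bits per digit and the prefix differ. This is a hypothetical illustration (all names assumed) using a plain integer shift loop rather than the byte-oriented code discussed above; it mimics Python's own output for negative numbers, a minus sign followed by the magnitude:

```python
DIGITS = "0123456789abcdef"

def format_pow2(value, bits, prefix):
    """Shared formatter for bases 2, 8 and 16 (1, 3 or 4 bits per digit)."""
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    mask = (1 << bits) - 1
    out = []
    while True:
        out.append(DIGITS[magnitude & mask])  # lowest-order digit first
        magnitude >>= bits
        if magnitude == 0:
            break
    return sign + prefix + "".join(reversed(out))

# The three entry points differ only in bits per digit and prefix.
def my_hex(value):
    return format_pow2(value, 4, "0x")

def my_oct(value):
    return format_pow2(value, 3, "0o")

def my_bin(value):
    return format_pow2(value, 1, "0b")

print(my_hex(-255), my_oct(8), my_bin(5))  # -0xff 0o10 0b101
```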

I’ll implement left and right shift, then consider doing another release.

As for retargeting, I am pushing forward on the run-time library for the 8080, 6800 and 6809. The AVR is going slowly because I am so rusty on that instruction set.

It has become obvious that it is much more efficient to retarget one subroutine at a time on all targets than to do one target at a time until it is done. So I will likely add 32-bit and 16-bit x86 and the 68000. And probably ARM; this is as good a time as any to learn ARM assembly language programming, since Raspberry Pis are cheap and popular. Someone asked about the Z-80, so maybe that as well, even though I do not currently have any Z-80 tools. And maybe the 1802 in Walter’s honor.


May I add a request for the Texas Instruments TMS9900 processor? It has no stack and only three registers; its 16 ‘virtual’ registers live in 16 consecutive 16-bit memory locations, allowing for fast context switches.

The most popular hardware platform would be the TI 99/4A Home Computer. There is a huge following at atariage (http://atariage.com/forums/forum/119-ti-994a-development/), where folks are still developing new hardware and software for this platform. There are aftermarket BASIC compilers and several Forth implementations.

Be advised that due to the architectural quirks of the TI 99/4A, there are only 256 bytes of CPU RAM resident on the standalone console. You would likely need to develop with the assumption that the 32K memory expansion is present.

I would volunteer to be a test candidate for your compiler efforts, although there are several avid folks at atariage who would likely salivate at the opportunity to try Python on this platform.

Excellent TI 99/4A technical info website by Dr. Thierry Nouspikel, who started his TI 99/4A work as a student. Very rich source of info.
http://www.unige.ch/medecine/nouspikel/ti99/titechpages.htm

Complete TMS9900/TI 99/4A cross-development tool suite (written in Python)
https://endlos99.github.io/xdt99/

Some TMS9900 retro fun


I’ll consider that, though I have no experience with the 9900 instruction set beyond knowing what the workspace pointer is.

On the TMS9900, other than the Workspace Pointer, there are only two other registers, Status and Program Counter.

A hardware or software interrupt has a two-word vector, a new workspace pointer and program counter. The old workspace pointer, program counter, and status are stored in R13, R14, R15 of the new workspace (possibly not in that order).

The instruction set is quite rich and mostly orthogonal.

This is a two-operand machine: the second operand is also the destination.

Addressing modes are:

  • Register immediate: load or operate with a 16-bit source literal
  • Register indirect: the register holds the address of the operand
  • Register indirect with post-increment: the register holds the address of the operand and is incremented after the operation
  • Register indirect with immediate address offset: the immediate value is added to the register contents, and the result is the address of the operand (R0 is a special case)
  • Absolute address: the immediate address of the operand is specified

All of the previous addressing modes (except register immediate) can be used for both source and destination, in any combination.

Non-vectored subroutine calls (BL) keep the same workspace but store the return address in R11 of the current workspace. The user must manage temporary storage of the status or other registers.

Any register (or registers) may be used as a virtual stack pointer. Register indirect with post-autoincrement can implement either push or pop (depending upon which direction the stack grows). Unfortunately, there is no atomic register indirect with pre-decrement. There are instructions to increment by two (INCT) and decrement by two (DECT), which, combined with the register indirect addressing mode, can move a stack pointer in either direction. Register indirect with immediate address offset could be used to access the contents of a virtual stack frame.
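A small Python model of such a stack growing toward lower addresses: push is DECT on the stack-pointer register followed by a store through it (two instructions, since there is no pre-decrement mode), and pop is a register indirect auto-increment load. Using R10 as the stack pointer, the downward direction and the top address are assumptions for illustration; memory is a dict of 16-bit words at even addresses:

```python
class Tms9900Stack:
    """Software stack in TMS9900 style, growing toward lower addresses."""

    def __init__(self, top):
        self.mem = {}      # word address -> 16-bit value
        self.r10 = top     # stack pointer register (assumed R10)

    def push(self, value):
        # DECT R10       ; R10 = R10 - 2
        # MOV  R1,*R10   ; store through R10
        self.r10 = (self.r10 - 2) & 0xFFFF
        self.mem[self.r10] = value & 0xFFFF

    def pop(self):
        # MOV *R10+,R1   ; load through R10, then R10 = R10 + 2
        value = self.mem[self.r10]
        self.r10 = (self.r10 + 2) & 0xFFFF
        return value

    def peek(self, offset):
        # MOV @offset(R10),R1  ; indexed access into the stack frame
        return self.mem[(self.r10 + offset) & 0xFFFF]

s = Tms9900Stack(0xF000)  # hypothetical top of stack
s.push(1)
s.push(2)
print(s.peek(2), s.pop(), s.pop())  # 1 2 1
```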

A rich set of relative jumps (+128 words/-127 words displacement) is provided. A long (16-bit) direct branch is provided, which when combined with relative jumps, allows conditional branching to the full 64 KByte address space.
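The +128/-127 word range follows from a signed 8-bit word displacement added to the address of the instruction after the one-word jump. A Python sketch of that arithmetic (the function name is hypothetical):

```python
def jump_target(jump_addr, disp):
    """Target of a TMS9900 relative jump at jump_addr.
    disp is the raw 8-bit displacement byte, a signed count of words."""
    if disp >= 0x80:
        disp -= 0x100                        # sign-extend the displacement
    return (jump_addr + 2 + 2 * disp) & 0xFFFF

# Farthest reach in each direction, in words from the jump itself.
print((jump_target(0x1000, 0x7F) - 0x1000) // 2)   # 128
print((jump_target(0x1000, 0x80) - 0x1000) // 2)   # -127
```

A displacement of 0xFF (-1) lands back on the jump itself, which gives a jump-to-self loop.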

I learned to program assembly language on this processor, which was a blessing and a curse. Since there was no stack, that was a new thing to learn when moving to any other processor. OTOH, I was really spoiled by the rich instruction set and the availability of 16 16-bit registers, as opposed to the 8-bitters (Z80, 6502, 6800, 8051,…). It would have been relatively easy to move to assembly programming of a 68000 or even an MSP430.


The TI 99/4A does something similar to SWEET16, but carries it to an extreme. There is a byte-code interpreter for a language called GPL (Graphics Programming Language) in the console ROMs.

The TI BASIC interpreter is also in the ROM, and it is almost wholly written in GPL. This is why TI BASIC is soooo slow: it’s double interpreted. Not only that, but GPL code isn’t stored in normal ROM, but instead in special PMOS (yep, that’s right, PMOS!) ROMs called GROMs (Graphics ROMs). These were basically cheap calculator-style memories in 16-pin packages. The GROMs had an 8-bit data bus and a single address line, which determined whether the access was an address access (to set or read the ROM address a byte at a time) or a data access (to read from the location pointed to by the internal 16-bit address). As long as data was read from sequential GROM addresses, the internal counter would autoincrement after each read. But if data was needed from another, non-sequential location, two bytes of address had to be written before the new data could be fetched.

As if the above wasn’t enough, the 16K RAM was on the ‘backside’ of the video controller and was accessed in a manner similar to the GROMs. And the tokenized BASIC code and data were stored in the 16K video RAM. So there were two levels of interpretation for lines of BASIC: first the BASIC token stream was interpreted, and then each instruction that made up the BASIC interpreter was itself an interpreted GPL instruction.
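The GROM access protocol described above comes down to an internal address counter that is loaded one byte at a time and advances after every data read. A simplified Python model; the class and function names are hypothetical, writing the high address byte first is an assumption, and real GROM details such as read-ahead prefetch are ignored:

```python
class Grom:
    """Simplified GROM: one address line selects address vs. data access."""

    def __init__(self, contents):
        self.rom = contents
        self.addr = 0          # internal 16-bit address counter

    def write_address(self, byte):
        # Each write shifts the previous byte up, so two writes
        # (high byte, then low byte) load a full 16-bit address.
        self.addr = ((self.addr << 8) | (byte & 0xFF)) & 0xFFFF

    def read_data(self):
        value = self.rom[self.addr]
        self.addr = (self.addr + 1) & 0xFFFF   # auto-increment
        return value

def fetch(grom, addr, count):
    """A non-sequential access costs two address writes before any data."""
    grom.write_address(addr >> 8)
    grom.write_address(addr & 0xFF)
    return bytes(grom.read_data() for _ in range(count))

g = Grom(bytes(range(256)))
print(fetch(g, 0x0010, 4).hex(), hex(g.read_data()))  # 10111213 0x14
```

The last read needs no new address because it continues sequentially; jumping anywhere else would cost two more address writes first.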

According to one of the architects of the 99/4A system, when it was designed the plan was to design a GPL processor chip that would directly execute GPL code. Had this happened the 99/4A would have screamed. But those plans were scrapped very early on.


And to make that even worse, you had to cross a 16-to-8-bit bus arbiter chip to get from the CPU to the video controller.


The story was that the bottleneck was named after someone who worked at the Lubbock plant and taught evening classes at Tech. Supposedly, he designed it. The name escapes me. I’ll have to ask my brother.

It was really a shame that the world’s first commercial 16-bit processor architecture in the 99/4 had that speed bottleneck designed into it. The guy’s name wasn’t Paul Cleveland, was it?


There were only two devices directly on the 16-bit bus - the system ROM and 128 words (256 bytes) of scratchpad RAM. This was the only directly addressable RAM available without the memory expansion board.

Thus, cartridge software modules, which were almost all required to run on an unexpanded console, only had 128 bytes of directly addressable RAM. This was really a challenge when speedy RAM buffers were needed. In addition to the limitation that the 16K VRAM (video RAM) was indirectly addressed, there were limits on how often the CPU could access the VRAM.

Reading a 16-bit value from the 16-bit bus required 2 clock cycles with no wait states. Reading the same 16-bit value on the 8-bit bus required 6 clock cycles, with 4 wait states inserted to allow time for two pseudo 8-bit data accesses. When working on PARSEC as a TI intern, one of the techniques we used to increase the software horizontal scrolling speed was to move a small code loop into the 16-bit RAM so it would run faster.

A quirk of the 9900 was that while the data bus was 16 bits, there was no byte select for byte accesses. Consequently, even when a byte instruction was executing, both pseudo 8-bit bus cycles had to be completed.

To make this even worse, any memory write was always preceded by a memory read to the same address. I was told that this was done to save microcode space in the 9900 by having reads and writes execute the same microcode, with the only difference that the data which was read during a destination write was ignored.
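Together, these two quirks mean a byte write costs the same as a word write: without byte selects, changing one byte means reading the whole word, merging the new byte and writing the whole word back. A Python model that counts 8-bit transfers on the multiplexed bus (two per word access, per the description above); the class name and counting scheme are a simplification for illustration. The 9900 is big-endian, so the even address holds the most significant byte:

```python
class MultiplexedBus:
    """16-bit words reached over an 8-bit bus with no byte selects."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self.transfers = 0             # 8-bit bus transfers so far

    def read_word(self, addr):
        self.transfers += 2            # two pseudo 8-bit accesses
        return self.words[addr >> 1]

    def _store(self, addr, value):
        self.transfers += 2
        self.words[addr >> 1] = value & 0xFFFF

    def write_word(self, addr, value):
        self.read_word(addr)           # read-before-write; data ignored
        self._store(addr, value)

    def write_byte(self, addr, value):
        word = self.read_word(addr)    # same read, but now the data is used
        if addr & 1:                   # odd address: least significant byte
            word = (word & 0xFF00) | (value & 0xFF)
        else:                          # even address: most significant byte
            word = (word & 0x00FF) | ((value & 0xFF) << 8)
        self._store(addr, word)

bus = MultiplexedBus(8)
bus.write_word(0, 0x1234)
bus.write_byte(1, 0xAB)
print(hex(bus.words[0]), bus.transfers)  # 0x12ab 8
```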


I wasn’t aware of any name for the 16-to-8-bit bus converter or of it being associated with anyone in particular. My supervisor during my second intern stint (it was called a ‘co-op’ position back then) was Granville Ott, who was involved in the early architecture planning for the 99/4, and he did teach some classes at Tech. But no one called it the ‘Ott’ bus.

According to Granville and others, the original plan was not to have a TMS9900 (with a 16-bit data bus) in the console. Instead, the CPU was supposed to be a TMS9985, which had an 8-bit bus. The reason was to keep peripheral costs reasonable, as a 16-bit bus would require twice as many peripheral ROM chips and corresponding bus buffers. [ASIDE: the TMS9985 was to the TMS9900 as the 8088 was to the 8086: architecturally the same but with an 8-bit data bus]

When it became apparent that the 9985 would not be ready in time for the launch of the 99/4, a decision was made to drop in a 9900 with additional logic so that the cartridge interface, peripheral interface, and peripherals would not have to be changed. Otherwise the whole system would have had to be redesigned, substantially delaying entry into the market. This would have a profound effect later when TI got into a price war with Commodore. Although keeping the 8-bit interface kept cartridge and peripheral costs as low as possible, the console cost much more because of the more expensive 9900 plus the additional bus-converter logic. Commodore was using the cheapest CPU available, the 6502, which would still have had a cost advantage over even the 9985. You can guess who won the price war.

There was an interesting story being told about why the 9985 was not ready. Supposedly the 9985 designer was a brilliant guy but had a very unconventional personality and work flow. His manager tried to make him conform and the designer up and quit on him. None of the other designers were able to take his work and finish it, as the way he had decomposed the problem was apparently too unconventional also.

The TMS9985 was never produced. However, the TMS9995 was produced, and it was similar to what the 9985 would have been. Even though it also had an 8-bit bus, it had a faster clock, was pipelined, and all of its instructions took less time. As a result, it was considerably faster than the 9900 despite the 8-bit bus. There were also 256 bytes of on-chip RAM, which essentially meant that all registers could be on-chip. The 9995 was used in the 99/4A follow-on console, the 99/8 (code-named Armadillo). To make the 99/8 compatible with 99/4A games that used software delays (the software guys had no other choice), there was a 99/4A emulation mode that introduced wait states in the external memory cycles to better match the slower 99/4A.


I’m pretty sure it was not him.

The name Ott is vaguely familiar; Granville is not. It would have been known derisively as the “Ott Bottleneck” or something like that.

Did you know a Steve Bell? He was a computer science student who worked part-time at TI testing the system in 82 or 83.

I am still looking into the TI 9900 and 99/4A. Some disturbing things are:

  • Incrementing and decrementing affects the carry flag unlike the behavior of most other processors. This wreaks havoc with multi-precision arithmetic code.
  • The TI assembler is brain-damaged with regard to the precedence of arithmetic operators. Multiplication and division are the same level as addition and subtraction. At least I can parenthesize 1+(2*3) but I should not have to.
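With all four operators at one precedence level and evaluated left to right, 1+2*3 comes out as (1+2)*3 = 9 instead of 7. A tiny Python evaluator for that flat rule; it is a sketch of the behavior described (non-negative integers only, no parentheses or unary minus), not the TI assembler's actual code:

```python
import re

def eval_left_to_right(expr):
    """Evaluate + - * / strictly left to right with equal precedence."""
    tokens = re.findall(r"\d+|[-+*/]", expr.replace(" ", ""))
    result = int(tokens[0])
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        value = int(operand)
        if op == "+":
            result += value
        elif op == "-":
            result -= value
        elif op == "*":
            result *= value
        else:
            result //= value   # floor division; fine for non-negative values
    return result

print(eval_left_to_right("1+2*3"))  # 9
print(1 + 2 * 3)                    # 7 with conventional precedence
```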


Would this one work?