[Project Log] Python on the 6502/C64, 8080, 6800, 6809 and AVR

Bill · November 19, 2019, 5:28pm

Another question for 9900 assembly language experts…

The flags in the status register of the 9900 appear to be wildly different from other processors.

The L> flag supposedly indicates when an unsigned quantity is greater than zero.

The A> flag supposedly indicates when a signed quantity is greater than zero.

From page 3-29 of

http://www.bitsavers.org/pdf/ti/990/tx990/0943441-9701_990_Assembly_Language_Programming_Guide_Oct78.pdf

The JL (jump if lower) instruction branches when L> is clear and EQ is clear. That makes sense.

The JHE (jump if higher or equal) instruction branches when L> is set or EQ is set. That makes sense.

The JH (jump if higher) instruction branches when L> is set and EQ is clear. The JLE (jump if lower or equal) instruction branches when L> is clear or EQ is set. These two seem to imply that the L> flag is actually set for L>=

Anybody know about this one?

Bill · November 20, 2019, 12:06am

An interesting read…

Bill · November 24, 2019, 5:05pm

The pseudo-random number generator for the 9900 is finally working properly.

The algorithm is for 8-bit operations and implementation was tricky on the 9900 which does not provide a full set of 8-bit instructions. Stray bits kept sneaking in through the “unused” halves of 16-bit registers.

Bill · November 26, 2019, 2:06am

The documentation seems to imply that L> and EQ are mutually exclusive: that no value is both greater than 0 and equal to 0.

However, there are a couple of instructions, compare ones corresponding (COC) and compare zeros corresponding (CZC) which affect only the EQ flag. There may be a programming trick lurking in there somewhere…

Bill · November 27, 2019, 1:19am

Appendix A of this manual goes into excruciating detail what goes into setting the several conditional code flags.

http://bitsavers.informatik.uni-stuttgart.de/pdf/ti/990/945250-9701_990_Computer_Family_Systems_Handbook_3ed_May76.pdf

Appendix B does the same for calculating the cycle times for instructions. This is for the 990/4, a TI industrial microcomputer based on the 9900 CPU. It is not directly applicable to the 99/4A with its various cycle eaters.

So about how many instructions does a 99/4A execute in a second? Something on the order of 100,000 to 200,000 depending on the instructions?

Bill · November 28, 2019, 1:45am

I am somewhat surprised nobody commented about this.

The discussion has evolved to what is the quickest way to arrive at the answer. The current speed leader is the hybrid compiled/interpreted environment of FORTH.

The programming problem does several time-consuming things:

Multiply large numbers
Convert large numbers to an ASCII string
Find one string within another

These capabilities are all built into the Python language.

In the following solution, Python code serves a glue between various bits of code in the run-time library. The loop only executes 175 times.

Because needed features are built-in, it is fair game to write those in assembly language as part of the run-time library. Implementation in other languages requires some to much of it in compiled high-level code.

I believe that a Python compiler has a good chance to be the fastest implementation.

Kentamanos · November 28, 2019, 2:33am

I wonder if due to those 2 requirements if some sort of BCD library would work well here.

Back in my pre-teen BASIC days, I used to do a lot of “string math” stuff which is kinda similar. I used to run a program on my Osborne to calculate the digits of the square root of 3 by naively adding an increasing digit on the end until the first two digits were “03” (or higher early on), and then lowering that digit by one and going on to the next.

Bill · November 28, 2019, 6:06am

The problem is that a BCD library is not part of the core language definition, so I lose the blanket right to implement that in assembly language (a major advantage in a speed contest.) Python is a poor language to use for writing such a library as it will have to be very “low level.”

Even if it was, using BCD may do away with the expensive conversion to a decimal representation (a divide by 10 for each digit) but multiplying becomes much more difficult on a processor without a “decimal adjust” instruction.

Bill · November 29, 2019, 3:31am

The expense of converting a binary number is so bothersome that I did some more thinking and research.

An algorithm with a funny name, double dabble, shows promise to speed up the conversion…

I am also looking into an implementation of a BCD-based approach to the problem without using any multiplication or division, though I still want to see how well a straightforward implementation in Python compares to what the others have been doing.

Bill · November 29, 2019, 11:12pm

I got the code running. No division. Multiplication done by shifting and adding.

Number * 7 = Number * 1 + Number * 2 + Number * 4

It finishes almost instantaneously on my rather slow netbook. It was written in a simple manner to help get it working; there is opportunity to more than double its speed by combining loops.

Now to get one of my toy compilers to translate it into 9900 assembly language…

Bill · December 3, 2019, 4:11am

Three additions to the Python compiler are necessary to run the 7’s problem solution on a 6502:

Implement the syntax necessary to call a method for an object
Create a str function using code already in place for implementing print
Write the find method for strings

The first two are done and the third is halfway there.

I can already tell that converting an integer to an ASCII string representation is horribly slow; the time spent multiplying two numbers is very small in comparison. I will have to eventually replace the current repeated divide by ten approach to double dabble and see how much of an improvement that is. If it is not substantial, then obviously a BCD-based approach will be the fastest way to solve the 7’s problem.

Bill · December 4, 2019, 8:44pm

The 7’s problem now compiles to 6502 and ~~runs~~ crawls.

It consumes over 22 billion machine cycles, meaning a 1 MHz 6502 would grind on this for over six hours.

The only good news is that there is plenty of room for improvement from a double dabble implementation. Stay tuned…

Oh, and my emulator runs the program in just over 95 seconds; it cranks out a rather surprising 230 MHz speed for the virtual 6502. That is on a 1 GHz AMD C60 processor. Most of today’s machines should do much better.

Bill · December 7, 2019, 3:27am

A conversion for 16-bit integers has been implemented based on double dabble. Now to make another one for big numbers…

 05E2					  00966	WriteInteger
 05E2 86 49		      [3] 00967		stx	Dabble		; Put number in conversion buffer
 05E4 85 4A		      [3] 00968		sta	Dabble+1
 						  00969
 05E6 A9 00		      [2] 00970		lda	#0			; Zero the rest of the buffer
 05E8 85 4D		      [3] 00971		sta	Dabble+4
 05EA 85 4C		      [3] 00972		sta	Dabble+3
 05EC 85 4B		      [3] 00973		sta	Dabble+2
 						  00974
 05EE A9 10		      [2] 00975		lda	#16			; 16 bits in the number
 05F0 85 02		      [3] 00976		sta	Byt0
 						  00977
 05F2					  00978	WriteInteger1
 05F2 A0 03		      [2] 00979		ldy	#3			; Three bytes in converted BCD
 						  00980
 05F4					  00981	WriteInteger2
 05F4 B9 004A	    [4/5] 00982		lda	Dabble+1,Y	; Isolate lower nybble
 05F7 29 0F		      [2] 00983		and	#$F
 05F9 C9 05		      [2] 00984		cmp	#4+1		; If greater than 4, add 3
 05FB 90 0F (060C)  [2/4] 00985		bcc	WriteIntegerLoNotGT4
 						  00986
 05FD 18		      [2] 00987		clc
 05FE 69 03		      [2] 00988		adc	#3
 0600 85 03		      [3] 00989		sta	Byt1
 						  00990
 0602 B9 004A	    [4/5] 00991		lda	Dabble+1,Y	; Combine
 0605 29 F0		      [2] 00992		and	#$F0
 0607 05 03		      [3] 00993		ora	Byt1
 0609 99 004A	      [5] 00994		sta	Dabble+1,Y
 						  00995
 060C					  00996	WriteIntegerLoNotGT4
 060C B9 004A	    [4/5] 00997		lda	Dabble+1,Y	; Isolate upper nybble
 060F 29 F0		      [2] 00998		and	#$F0
 0611 C9 50		      [2] 00999		cmp	#$40+$10	; If greater than 4, add 3
 0613 90 0F (0624)  [2/3] 01000		bcc	WriteIntegerHiNotGT4
 						  01001
 0615 18		      [2] 01002		clc
 0616 69 30		      [2] 01003		adc	#$30
 0618 85 03		      [3] 01004		sta	Byt1
 						  01005
 061A B9 004A	    [4/5] 01006		lda	Dabble+1,Y	; Combine
 061D 29 0F		      [2] 01007		and	#$F
 061F 05 03		      [3] 01008		ora	Byt1
 0621 99 004A	      [5] 01009		sta	Dabble+1,Y
 						  01010
 0624					  01011	WriteIntegerHiNotGT4
 0624 88		      [2] 01012		dey				; More BCD digits to shift?
 0625 D0 CD (05F4)  [2/4] 01013		bne	WriteInteger2
 						  01014
 0627 06 49		      [5] 01015		asl	Dabble		; Shift the buffer left one bit
 0629 26 4A		      [5] 01016		rol	Dabble+1
 						  01017
 062B 26 4D		      [5] 01018		rol	Dabble+4
 062D 26 4C		      [5] 01019		rol	Dabble+3
 062F 26 4B		      [5] 01020		rol	Dabble+2
 						  01021
 0631 C6 02		      [5] 01022		dec	Byt0		; More bits to shift?
 0633 D0 BD (05F2)  [2/4] 01023		bne	WriteInteger1
 						  01024
 0635 A0 00		      [2] 01025		ldy	#0			; Index converted BCD
 0637 A2 00		      [2] 01026		ldx	#0			; And the output string
 						  01027
 0639					  01028	WriteInteger3
 0639 B9 004B	    [4/5] 01029		lda	Dabble+2,Y	; Isolate upper digit
 063C 29 F0		      [2] 01030		and	#$F0
 063E D0 04 (0644)  [2/3] 01031		bne	WriteIntegerEmitHi
 						  01032
 0640 E0 00		      [2] 01033		cpx	#0			; Leading zero?
 0642 F0 0A (064E)  [2/3] 01034		beq	WriteIntegerSkipHi
 						  01035
 0644					  01036	WriteIntegerEmitHi
 0644 4A		      [2] 01037		lsr	A			; Shift into lower nybble
 0645 4A		      [2] 01038		lsr	A
 0646 4A		      [2] 01039		lsr	A
 0647 4A		      [2] 01040		lsr	A
 						  01041
 0648 18		      [2] 01042		clc				; Convert to ASCII numeral
 0649 69 30		      [2] 01043		adc	#'0'
 064B E8		      [2] 01044		inx
 064C 95 51		      [4] 01045		sta	Out,X
 						  01046
 064E					  01047	WriteIntegerSkipHi
 064E B9 004B	    [4/5] 01048		lda	Dabble+2,Y	; Isolate lower digit
 0651 29 0F		      [2] 01049		and	#$F
 0653 D0 04 (0659)  [2/3] 01050		bne	WriteIntegerEmitLo
 						  01051
 0655 E0 00		      [2] 01052		cpx	#0			; Leading zero?
 0657 F0 06 (065F)  [2/3] 01053		beq	WriteIntegerSkipLo
 						  01054
 0659					  01055	WriteIntegerEmitLo
 0659 18		      [2] 01056		clc				; Convert to ASCII numeral
 065A 69 30		      [2] 01057		adc	#'0'
 065C E8		      [2] 01058		inx
 065D 95 51		      [4] 01059		sta	Out,X
 						  01060
 065F					  01061	WriteIntegerSkipLo
 065F C8		      [2] 01062		iny				; Address next pair of digits
 						  01063
 0660 C0 03		      [2] 01064		cpy	#3			; More digits?
 0662 90 D5 (0639)  [2/3] 01065		bcc	WriteInteger3
 						  01066
 0664 86 51		      [3] 01067		stx	Out			; Store length of result
 0666 8A		      [2] 01068		txa
 0667 D0 08 (0671)  [2/3] 01069		bne	WriteIntegerWrite	; Check for all 0's
 						  01070
 0669 A9 01		      [2] 01071		lda	#1			; Default to "0"
 066B 85 51		      [3] 01072		sta	Out
 066D A9 30		      [2] 01073		lda	#'0'
 066F 85 52		      [3] 01074		sta	Out+1
 						  01075
 0671					  01076	WriteIntegerWrite
 0671 A2 51		      [2] 01077		ldx	#Out&$FF	; Point to the result
 0673 A9 00		      [2] 01078		lda	#Out>>8
 						  01079
 						  01080	; Fall through to WriteString

Bill · December 8, 2019, 6:12pm

The above code uses separate buffers for the conversions to packed BCD, then to text.

When dealing with large numbers, it is beneficial to reduce memory usage wherever possible. By placing the packed BCD result from the double dabble into the end of the text result buffer, one buffer can be used for both…

Dabble2BCD0
Dabble2BCD1
Dabble2BCD2
Dabble2BCD3

Bill · December 19, 2019, 2:07am

TI books… I got your TI books right here…

https://archive.org/details/tibooks

Bill · April 15, 2020, 2:31am

I will eventually get back to the bignum double dabble implementation on the 6502. I promise.

But in the meantime, my latest rabbit hole:

I discovered this several months ago:

https://www.corshamtech.com/ss-50-65c02-board-experiment/

So I contacted Bob Applegate and told him that I have been working on a debugger for the 6502 and will convert it into a monitor.

We were talking about operating systems. Most of the popular ones on the 6502 are wrapped tightly around its host hardware. None of them were anything like a CP/M or MS-DOS which can be readily adapted to a new system.

I brought up the OSI DOS. He said that he had looked at it and thought it was “clunky.” I had to agree because the user had to manually allocate disk tracks.

There is DOS/65, a CP/M clone, but there is not much information available on how to adapt it and build a boot disk.

While working on adding support for his SD card “disk system” in my debugger/monitor, I had a thought. Why not build a clone of FLEX for the 6502?

I started working on it and have most of the command line interface of the DOS portion working and can boot it off of a virtual disk in my emulator. Next up are disk drivers then the file system.

This morning, I reached out to @dave , local Ohio Scientific expert and owner of http://www.osiweb.org/ , for his input. We have been having some interesting discussions.

More to come…

Raymond · April 16, 2020, 2:38pm

Nice! I could build a 6502 card for my SWTPC system.

Bill · April 16, 2020, 3:30pm

What kind of disk drives and controller do you have on it? Are you running FLEX now?

Raymond · April 16, 2020, 5:02pm

No disk controller. I have the SD card setup.

I have yet to power it up. I finally got my variac to slowly bring the caps up to try and reform them rather than just cold power them on after many years of sitting.

Bill · April 17, 2020, 6:28pm

Is that the Corsham unit or something else?