[Project Log] Python on the 6502/C64, 8080, 6800, 6809 and AVR

Bill · November 27, 2019, 1:19am

Appendix A of this manual goes into excruciating detail what goes into setting the several conditional code flags.

http://bitsavers.informatik.uni-stuttgart.de/pdf/ti/990/945250-9701_990_Computer_Family_Systems_Handbook_3ed_May76.pdf

Appendix B does the same for calculating the cycle times for instructions. This is for the 990/4, a TI industrial microcomputer based on the 9900 CPU. It is not directly applicable to the 99/4A with its various cycle eaters.

So about how many instructions does a 99/4A execute in a second? Something on the order of 100,000 to 200,000 depending on the instructions?

Bill · November 28, 2019, 1:45am

I am somewhat surprised nobody commented about this.

The discussion has evolved to what is the quickest way to arrive at the answer. The current speed leader is the hybrid compiled/interpreted environment of FORTH.

The programming problem does several time-consuming things:

Multiply large numbers
Convert large numbers to an ASCII string
Find one string within another

These capabilities are all built into the Python language.

In the following solution, Python code serves a glue between various bits of code in the run-time library. The loop only executes 175 times.

Because needed features are built-in, it is fair game to write those in assembly language as part of the run-time library. Implementation in other languages requires some to much of it in compiled high-level code.

I believe that a Python compiler has a good chance to be the fastest implementation.

Kentamanos · November 28, 2019, 2:33am

I wonder if due to those 2 requirements if some sort of BCD library would work well here.

Back in my pre-teen BASIC days, I used to do a lot of “string math” stuff which is kinda similar. I used to run a program on my Osborne to calculate the digits of the square root of 3 by naively adding an increasing digit on the end until the first two digits were “03” (or higher early on), and then lowering that digit by one and going on to the next.

Bill · November 28, 2019, 6:06am

The problem is that a BCD library is not part of the core language definition, so I lose the blanket right to implement that in assembly language (a major advantage in a speed contest.) Python is a poor language to use for writing such a library as it will have to be very “low level.”

Even if it was, using BCD may do away with the expensive conversion to a decimal representation (a divide by 10 for each digit) but multiplying becomes much more difficult on a processor without a “decimal adjust” instruction.

Bill · November 29, 2019, 3:31am

The expense of converting a binary number is so bothersome that I did some more thinking and research.

An algorithm with a funny name, double dabble, shows promise to speed up the conversion…

I am also looking into an implementation of a BCD-based approach to the problem without using any multiplication or division, though I still want to see how well a straightforward implementation in Python compares to what the others have been doing.

Bill · November 29, 2019, 11:12pm

I got the code running. No division. Multiplication done by shifting and adding.

Number * 7 = Number * 1 + Number * 2 + Number * 4

It finishes almost instantaneously on my rather slow netbook. It was written in a simple manner to help get it working; there is opportunity to more than double its speed by combining loops.

Now to get one of my toy compilers to translate it into 9900 assembly language…

Bill · December 3, 2019, 4:11am

Three additions to the Python compiler are necessary to run the 7’s problem solution on a 6502:

Implement the syntax necessary to call a method for an object
Create a str function using code already in place for implementing print
Write the find method for strings

The first two are done and the third is halfway there.

I can already tell that converting an integer to an ASCII string representation is horribly slow; the time spent multiplying two numbers is very small in comparison. I will have to eventually replace the current repeated divide by ten approach to double dabble and see how much of an improvement that is. If it is not substantial, then obviously a BCD-based approach will be the fastest way to solve the 7’s problem.

Bill · December 4, 2019, 8:44pm

The 7’s problem now compiles to 6502 and ~~runs~~ crawls.

It consumes over 22 billion machine cycles, meaning a 1 MHz 6502 would grind on this for over six hours.

The only good news is that there is plenty of room for improvement from a double dabble implementation. Stay tuned…

Oh, and my emulator runs the program in just over 95 seconds; it cranks out a rather surprising 230 MHz speed for the virtual 6502. That is on a 1 GHz AMD C60 processor. Most of today’s machines should do much better.

Bill · December 7, 2019, 3:27am

A conversion for 16-bit integers has been implemented based on double dabble. Now to make another one for big numbers…

 05E2					  00966	WriteInteger
 05E2 86 49		      [3] 00967		stx	Dabble		; Put number in conversion buffer
 05E4 85 4A		      [3] 00968		sta	Dabble+1
 						  00969
 05E6 A9 00		      [2] 00970		lda	#0			; Zero the rest of the buffer
 05E8 85 4D		      [3] 00971		sta	Dabble+4
 05EA 85 4C		      [3] 00972		sta	Dabble+3
 05EC 85 4B		      [3] 00973		sta	Dabble+2
 						  00974
 05EE A9 10		      [2] 00975		lda	#16			; 16 bits in the number
 05F0 85 02		      [3] 00976		sta	Byt0
 						  00977
 05F2					  00978	WriteInteger1
 05F2 A0 03		      [2] 00979		ldy	#3			; Three bytes in converted BCD
 						  00980
 05F4					  00981	WriteInteger2
 05F4 B9 004A	    [4/5] 00982		lda	Dabble+1,Y	; Isolate lower nybble
 05F7 29 0F		      [2] 00983		and	#$F
 05F9 C9 05		      [2] 00984		cmp	#4+1		; If greater than 4, add 3
 05FB 90 0F (060C)  [2/4] 00985		bcc	WriteIntegerLoNotGT4
 						  00986
 05FD 18		      [2] 00987		clc
 05FE 69 03		      [2] 00988		adc	#3
 0600 85 03		      [3] 00989		sta	Byt1
 						  00990
 0602 B9 004A	    [4/5] 00991		lda	Dabble+1,Y	; Combine
 0605 29 F0		      [2] 00992		and	#$F0
 0607 05 03		      [3] 00993		ora	Byt1
 0609 99 004A	      [5] 00994		sta	Dabble+1,Y
 						  00995
 060C					  00996	WriteIntegerLoNotGT4
 060C B9 004A	    [4/5] 00997		lda	Dabble+1,Y	; Isolate upper nybble
 060F 29 F0		      [2] 00998		and	#$F0
 0611 C9 50		      [2] 00999		cmp	#$40+$10	; If greater than 4, add 3
 0613 90 0F (0624)  [2/3] 01000		bcc	WriteIntegerHiNotGT4
 						  01001
 0615 18		      [2] 01002		clc
 0616 69 30		      [2] 01003		adc	#$30
 0618 85 03		      [3] 01004		sta	Byt1
 						  01005
 061A B9 004A	    [4/5] 01006		lda	Dabble+1,Y	; Combine
 061D 29 0F		      [2] 01007		and	#$F
 061F 05 03		      [3] 01008		ora	Byt1
 0621 99 004A	      [5] 01009		sta	Dabble+1,Y
 						  01010
 0624					  01011	WriteIntegerHiNotGT4
 0624 88		      [2] 01012		dey				; More BCD digits to shift?
 0625 D0 CD (05F4)  [2/4] 01013		bne	WriteInteger2
 						  01014
 0627 06 49		      [5] 01015		asl	Dabble		; Shift the buffer left one bit
 0629 26 4A		      [5] 01016		rol	Dabble+1
 						  01017
 062B 26 4D		      [5] 01018		rol	Dabble+4
 062D 26 4C		      [5] 01019		rol	Dabble+3
 062F 26 4B		      [5] 01020		rol	Dabble+2
 						  01021
 0631 C6 02		      [5] 01022		dec	Byt0		; More bits to shift?
 0633 D0 BD (05F2)  [2/4] 01023		bne	WriteInteger1
 						  01024
 0635 A0 00		      [2] 01025		ldy	#0			; Index converted BCD
 0637 A2 00		      [2] 01026		ldx	#0			; And the output string
 						  01027
 0639					  01028	WriteInteger3
 0639 B9 004B	    [4/5] 01029		lda	Dabble+2,Y	; Isolate upper digit
 063C 29 F0		      [2] 01030		and	#$F0
 063E D0 04 (0644)  [2/3] 01031		bne	WriteIntegerEmitHi
 						  01032
 0640 E0 00		      [2] 01033		cpx	#0			; Leading zero?
 0642 F0 0A (064E)  [2/3] 01034		beq	WriteIntegerSkipHi
 						  01035
 0644					  01036	WriteIntegerEmitHi
 0644 4A		      [2] 01037		lsr	A			; Shift into lower nybble
 0645 4A		      [2] 01038		lsr	A
 0646 4A		      [2] 01039		lsr	A
 0647 4A		      [2] 01040		lsr	A
 						  01041
 0648 18		      [2] 01042		clc				; Convert to ASCII numeral
 0649 69 30		      [2] 01043		adc	#'0'
 064B E8		      [2] 01044		inx
 064C 95 51		      [4] 01045		sta	Out,X
 						  01046
 064E					  01047	WriteIntegerSkipHi
 064E B9 004B	    [4/5] 01048		lda	Dabble+2,Y	; Isolate lower digit
 0651 29 0F		      [2] 01049		and	#$F
 0653 D0 04 (0659)  [2/3] 01050		bne	WriteIntegerEmitLo
 						  01051
 0655 E0 00		      [2] 01052		cpx	#0			; Leading zero?
 0657 F0 06 (065F)  [2/3] 01053		beq	WriteIntegerSkipLo
 						  01054
 0659					  01055	WriteIntegerEmitLo
 0659 18		      [2] 01056		clc				; Convert to ASCII numeral
 065A 69 30		      [2] 01057		adc	#'0'
 065C E8		      [2] 01058		inx
 065D 95 51		      [4] 01059		sta	Out,X
 						  01060
 065F					  01061	WriteIntegerSkipLo
 065F C8		      [2] 01062		iny				; Address next pair of digits
 						  01063
 0660 C0 03		      [2] 01064		cpy	#3			; More digits?
 0662 90 D5 (0639)  [2/3] 01065		bcc	WriteInteger3
 						  01066
 0664 86 51		      [3] 01067		stx	Out			; Store length of result
 0666 8A		      [2] 01068		txa
 0667 D0 08 (0671)  [2/3] 01069		bne	WriteIntegerWrite	; Check for all 0's
 						  01070
 0669 A9 01		      [2] 01071		lda	#1			; Default to "0"
 066B 85 51		      [3] 01072		sta	Out
 066D A9 30		      [2] 01073		lda	#'0'
 066F 85 52		      [3] 01074		sta	Out+1
 						  01075
 0671					  01076	WriteIntegerWrite
 0671 A2 51		      [2] 01077		ldx	#Out&$FF	; Point to the result
 0673 A9 00		      [2] 01078		lda	#Out>>8
 						  01079
 						  01080	; Fall through to WriteString

Bill · December 8, 2019, 6:12pm

The above code uses separate buffers for the conversions to packed BCD, then to text.

When dealing with large numbers, it is beneficial to reduce memory usage wherever possible. By placing the packed BCD result from the double dabble into the end of the text result buffer, one buffer can be used for both…

Dabble2BCD0
Dabble2BCD1
Dabble2BCD2
Dabble2BCD3

Bill · December 19, 2019, 2:07am

TI books… I got your TI books right here…

https://archive.org/details/tibooks

Bill · April 15, 2020, 2:31am

I will eventually get back to the bignum double dabble implementation on the 6502. I promise.

But in the meantime, my latest rabbit hole:

I discovered this several months ago:

https://www.corshamtech.com/ss-50-65c02-board-experiment/

So I contacted Bob Applegate and told him that I have been working on a debugger for the 6502 and will convert it into a monitor.

We were talking about operating systems. Most of the popular ones on the 6502 are wrapped tightly around its host hardware. None of them were anything like a CP/M or MS-DOS which can be readily adapted to a new system.

I brought up the OSI DOS. He said that he had looked at it and thought it was “clunky.” I had to agree because the user had to manually allocate disk tracks.

There is DOS/65, a CP/M clone, but there is not much information available on how to adapt it and build a boot disk.

While working on adding support for his SD card “disk system” in my debugger/monitor, I had a thought. Why not build a clone of FLEX for the 6502?

I started working on it and have most of the command line interface of the DOS portion working and can boot it off of a virtual disk in my emulator. Next up are disk drivers then the file system.

This morning, I reached out to @dave , local Ohio Scientific expert and owner of http://www.osiweb.org/ , for his input. We have been having some interesting discussions.

More to come…

Raymond · April 16, 2020, 2:38pm

Nice! I could build a 6502 card for my SWTPC system.

Bill · April 16, 2020, 3:30pm

What kind of disk drives and controller do you have on it? Are you running FLEX now?

Raymond · April 16, 2020, 5:02pm

No disk controller. I have the SD card setup.

I have yet to power it up. I finally got my variac to slowly bring the caps up to try and reform them rather than just cold power them on after many years of sitting.

Bill · April 17, 2020, 6:28pm

Is that the Corsham unit or something else?

Raymond · April 17, 2020, 8:23pm

Corsham unit.

Bill · April 27, 2020, 3:12pm

At long last, I am finally able to load and run a program from a binary file. A surprising amount of the file system had to be implemented in order to be able to do that. I am very thankful to be developing this using emulation. Using real hardware will have taken much, much longer.

It is often said that code on a 6502 tends to be faster but bigger.

The following subroutine in the FLEX file system copies an 8.3 file name to a temporary save area in a file control block.

The original 6809 code:

 D540 BE D413		      [6] 00177	MOVNAM   LDX    CURFCB    MOVE FILE NAME TO TEMP
 D543 C6 0B			      [2] 00178	         LDB    #$0B
 D545 A6 04			      [5] 00179	MOVNM1   LDA    $04,X
 D547 A7 88 24		      [5] 00180	         STA    $24,X
 D54A 30 01			      [5] 00181	         LEAX   $01,X
 D54C 5A			      [2] 00182	         DECB
 D54D 26 F6 (D545)	      [3] 00183	         BNE    MOVNM1
 D54F 39			      [5] 00184	         RTS

The 6502 code:

.C57A					  01901	MOVNAM
.C57A 18		      [2] 01902	         clc              ; Copy file name to temp
.C57B A5 04		      [3] 01903	         lda    FCBPtr
.C57D 69 20		      [3] 01904	         adc    #FCBWrk-FCBNam   ; Offset to temp area
.C57F 85 06		      [3] 01905	         sta    FMSPtr
.C581 A5 05		      [3] 01906	         lda    FCBPtr+1
.C583 69 00		      [2] 01907	         adc    #0
.C585 85 07		      [3] 01908	         sta    FMSPtr+1
.						  01909
.C587 A2 0B		      [2] 01910	         ldx    #8+3
.C589 A0 04		      [2] 01911	         ldy    #FCBNam
.						  01912
.C58B B1 04		    [5/6] 01913	MOVNM1   lda    (FCBPtr),Y
.C58D 91 06		      [6] 01914	         sta    (FMSPtr),Y
.						  01915
.C58F C8		      [2] 01916	         iny
.C590 CA		      [2] 01917	         dex
.C591 D0 F8 (C58B)  [2/3] 01918	         bne    MOVNM1
.						  01919
.C593 60		      [6] 01920	         rts

In this example, the 6502 code is 20 cycles faster, but 10 bytes bigger.

Bill · May 1, 2020, 10:35pm

I can now view a catalog of files on a disk, type out the contents of text files, delete and rename files.

Overall, I am very pleased with the progress of this project; it was started on April 6, less than a month ago.

Next up, testing all of the ways to write files. That should keep me busy over the weekend.

While working on file deletion, it became obvious that a very safe and reliable file undelete utility can be built. Should I decide to develop that, I’ll do it first on the 6502, then migrate it back to the 6800 and 6809.

Bill · May 7, 2020, 1:11am

This is a bit of a one month progress report.

The FLEX file system (File Management System or FMS) is feature complete except for the portions dealing with the creation, reading and writing of random access files. FLEX stores data on disk as a linked list of sectors. Reading or writing an arbitrary location in a file is not a simple arithmetic calculation; a naive implementation would have to read every sector of the file to follow the list until the needed sector is found. To solve this problem, creating a random file adds two special sectors which contain the addresses of the sectors in the file as they are allocated. Reading a byte from random location in a file requires first reading this sector map then the sector.

The FLEX user interface is feature complete except for enhanced error reporting and background printing. When a random file named ERRORS.SYS is present, its contents is used to present a meaningful error message instead of an error code number. This capability is dependent upon FMS random access file support. Background printing is an option which requires a source of periodic interrupts. I have not investigated this feature very much yet; there are hooks for it in various parts of the system.

The third leg of the FLEX system triad is the Utility Command Set, a collection of small programs on disk which provide a large majority of the user commands. The following commands have been implemented:

ASN
BUILD
CAT
DATE
DELETE
JUMP
LINK
LIST
RENAME
TTYSET
VERIFY
VERSION

leaving these yet to be implemented:

APPEND
COPY
EXEC
I
NEWDISK
O
P
PRINT
QCHECK
SAVE
XOUT

The bootstrap loaders work so the operating system can be loaded from a disk image.

The system total so far is about 9K bytes of machine code. That is an average of around 300 bytes per day. It may not seem like much, but remember that it is all written in assembly language.

The current memory map is:

$0000…$00FF - Zero page, locations at $12 and above are free for application programs to use
$0100…$017F - FLEX line buffer
$0180…$01FF - Stack
$0200…$033F - FLEX entry points, variables and printer driver
$0340…$047F - System File Control Block
$0480…$0AFF - FLEX Utility Command Space
$0B00…MEMEND - User memory

Somewhere above that is about 6K of FLEX itself.

Unlike the 680x versions, the UCS area has been intentionally placed adjacent to free RAM so that a program needing maximum memory can easily use both.

The User’s Guide is almost ready to publish; this is slightly modified from an original FLEX manual. The Programming Guide is about halfway done; it is a heavily modified version of an existing FLEX manual.

Finally, some preliminary work has been done on an editor and an assembler. The assembler subproject is actually comprised of four smaller projects:

enhance the existing 6800 assembler to add features such as conditional assembly directives
convert the 6800 assembler into a 6502 cross assembler running on the 6800
convert the 6800 assembler into a 6800 cross assembler running on the 6502
combine the three into a 6502 native assembler