[Project Log] Python on the 6502/C64, 8080, 6800, 6809 and AVR

Many of these chips have as much or more then the 70’s and 80’s general purpose computers. Some as much as 256 kbytes of RAM.

Anyway, back to the Python compiler. I go from hard to very hard. The print function is done except for handling the keyword arguments.

In Python, variables are just a reference to an object. Essentially nothing but a pointer. A variable can reference an object of any type; a boolean, an integer, a real number, a string or even a function. The run-time code must determine the type and do something reasonable with it.

The bad part of it, as least for the compiler implementer, is that nothing is known at compile time about the number and types of the arguments to a function. Adding to the complexity is that you can have traditional positional arguments, and then some keyword arguments and they can be assigned a default value if the function call does not specify one.

Languages like Pascal have been criticized because parts of the language are “special.” Write and writeln cannot be replaced or extended without modifying the compiler. Python, at least Python 3, does not have that problem. I can essentially write:

def myPrint(*pargs, **kargs):
        print(*pargs, **kargs)

and alter the way the print function works. I can even do

        oldPrint = print
        print = myPrint

and replace the provided one with mine.

But that power comes with a heavy price. Especially with Python’s dynamic typing. The burden is on the run-time code to figure out how to map the arguments when a function is called.

The AVR does not. I do not know whether the larger MSP430s do. They are also Harvard architecture, making them less suitable for general purpose use loading arbitrary application code from a storage device. The ARM is the notable exception to these limitations and why it is so popular today.

I have an amazing amount of general-purpose code running on a Harvard architecture processor, the MAXQ20. Including a Tcl interpreter and the uIP stack talking to the outside world. :slight_smile:

General purpose just means that the sequence of operations is not wired into the hardware. Having split code and data paths does not violate this definition.

(edit: As for loading arbitrary code from external storage, this is no different between von Neumann and Harvard architectures. What really bites you is that small micros usually have a ROM-like code storage, such as flash or EEPROM. If your micro has an EMIF interface, there’s no reason you can’t stick some SRAM outside for code execution and load it however your heart desires.)

Why spend all that CPU copying memory? Stick a MC6844 DMA controller on the outside of that baby!

Your correct, I just looked at the data sheet and they seem to max out at 16K of RAM, which is still more then enough to run a general purpose computer, and with some of the larger pin devices it wouldn’t be hard to add more external RAM that could be paged in/out of the chip.

It certainly wouldn’t be a viable design, but it is doable, and would be an interesting exercise in retro inspired computer design. If UNIX could be developed on a PDP/7, it would certainly be possible to create a version for the AVR…

I remembered this

So it would be possible to implement an AVR on an FPGA with as many resources as one wants… Can’t you see it, and AVR with a PDP/11 front panel running UNIX and supporting the native debugging of Arduino and Assembly code?

Yep and it is possible to programmatically change your flash on the AVR with a running program. So a simple monitor like program could load from and SD, or through the serial port and place the code in flash and allow you to run it. Haven’t looked at AVR assembly and don’t remember if there is any where for the code to single step…

There is a way for the code to write to flash memory since the bootloader does exactly that. I never learned to do that, but it was something I will eventually have to do to really finish my FORTH implementation for the AVR.

I do not remember there being a single step bit or interrupt like the x86. You may have to add some external hardware to yank an NMI to do that.

I originally wrote this on the x86 16-bit following the roadmap in the Byte book on Threaded Interpretive Languages:

    656				     ;-----------------------------------------------------------------------------
    657				     ;
    658				     ; Inner interpreter implementation
    659				     ;
    660	00E5			     CODE    ends
    661	0000			     DICT    segment
    662
    663				     ;
    664				     ; SEMI does not have a header, but	has a standard word address
    665				     ;
    666	0000  00E5r		     _SEMI   dw	     offset SEMI
Turbo Assembler	 Version 3.2	    05/06/15 15:45:56	    Page 10
tilli.ASM
TILLI.ASM - Threaded Interpretive Language Little Implementation


    667
    668	0002			     DICT    ends
    669	00E5			     CODE    segment
    670
    671	00E5			     SEMI:
    672				     public  SEMI
    673	00E5  8B 76 00			     mov     SI,[BP]		     ; Pop return address
    674	00E8  83 C5 02			     add     BP,2
    675
    676	00EB			     NEXT:
    677				     public  NEXT
    678	00EB  8B 3C			     mov     DI,[SI]		     ; Get next	word address
    679	00ED  83 C6 02			     add     SI,2
    680
    681	00F0			     RUN:
    682				     public  RUN
    683	00F0  8B 1D			     mov     BX,[DI]		     ; Run a threaded word
    684	00F2  83 C7 02			     add     DI,2
    685	00F5  FF E3			     jmp     BX
    686
    687	00F7			     __COLON:
    688				     public  __COLON
    689	00F7  83 ED 02			     sub     BP,2		     ; Push instruction	register
    690	00FA  89 76 00			     mov     [BP],SI
    691	00FD  8B F7			     mov     SI,DI		     ; Point to	nested secondary
    692	00FF  EB EA			     jmp     short NEXT
    693
    694
    695				     ;-----------------------------------------------------------------------------
    696				     ;
    697				     ; EXECUTE ( addr -- )
    698				     ;
    699				     ; Execute dictionary entry	at compilation address on stack; for example,
    700				     ; address returned	by FIND.
    701				     ;
    702	0101			     CODE    ends
    703	0002			     DICT    segment
    704
    705					     header  <'EXECUTE'>
1   706	0002  0000			     dw	     PREV_ENTRY
1   707	0004  07			     db	     offset ??0007 - offset $ -	1    ; the 'EXECUTE' length
1   708	0005  45 58 45 43 55 54	45	     db	     'EXECUTE'
1   709	000C			     ??0007  label   byte
    710
    711	000C  0101r		     _EXECUTE	     dw	     offset __EXECUTE
    712
    713	000E			     DICT    ends
    714	0101			     CODE    segment
    715
    716	0101			     __EXECUTE:
    717	0101  5F			     pop     DI			     ; Get word	address	of the word
    718	0102  EB EC			     jmp     Run

From there, I ported it to the 6800;

 			  00532	;=== < Inner interpreter >====================================================
 			  00533
 			  00534	;-----------------------------------------------------------------------------
 			  00535	;
 			  00536	; SEMI does not have a header, but has a standard word address
 			  00537	;
 016E 0170		  00538	_SEMI    fdb    SEMI
 			  00539
 0170			  00540	SEMI:
 0170 DE 00	      [4] 00541	         ldx    RS        ; Pop IR from return stack
 0172 08	      [4] 00542	         inx
 0173 08	      [4] 00543	         inx
 0174 DF 00	      [5] 00544	         stx    RS
 0176 EE 00	      [6] 00545	         ldx    ,X
 0178 DF 02	      [5] 00546	         stx    IR
 			  00547	;	mov	SI,[BP]			; Pop return address
 			  00548	;	add	BP,2
 			  00549
 017A			  00550	Next:
 017A DE 02	      [4] 00551	         ldx    IR
 017C 08	      [4] 00552	Next1    inx
 017D 08	      [4] 00553	         inx
 017E DF 02	      [5] 00554	         stx    IR
 0180 EE 00	      [6] 00555	         ldx    ,X
 			  00556	;	mov	DI,[SI]			; Get next word address
 			  00557	;	add	SI,2
 			  00558
 0182			  00559	RUN:                      ; WA in X
 0182 DF 04	      [5] 00560	         stx    WA        ; Run machine code of new word
DEI Research 6800 Cross Assembler  Version 0.0   05-16-2015 112:47:26  Page 11
tilli.a68

 Addr Code	   Cycles Line#	  Source Statement

 0184 EE 00	      [6] 00561	Run1     ldx    ,X
 0186 6E 00	      [4] 00562	         jmp    ,X
 			  00563	;	mov	BX,[DI]			; Run a threaded word
 			  00564	;	add	DI,2
 			  00565	;	jmp	BX
 			  00566
 0188			  00567	__COLON:
 0188 DE 00	      [4] 00568	         ldx    RS        ; Push instruction register
 018A D6 03	      [3] 00569	         ldab   IR+1      ; on return stack
 018C 09	      [4] 00570	         dex
 018D E7 02	      [6] 00571	         stab   2,X
 018F 96 02	      [3] 00572	         ldaa   IR
 0191 09	      [4] 00573	         dex
 0192 A7 02	      [6] 00574	         staa   2,X
 0194 DF 00	      [5] 00575	         stx    RS
 			  00576
 0196 DE 04	      [4] 00577	         ldx    WA        ; Execute new secondary
 0198 20 E2 (017C)    [4] 00578	         bra    Next1
 			  00579	;	sub	BP,2			; Push instruction register
 			  00580	;	mov	[BP],SI
 			  00581	;	mov	SI,DI			; Point to nested secondary
 			  00582	;	jmp	short NEXT

And finally the AVR:

 			  00572	;=== < Inner interpreter >====================================================
 			  00573
 			  00574	;-----------------------------------------------------------------------------
 			  00575	;
 			  00576	; SEMI does not have a header, but has a standard word address
 			  00577	;
 			  00578	.init
 0146 0102		  00579	?SEMI:	.dw	SEMI
 			  00580	.cseg
 			  00581
 000102 01F7	      [1] 00582	SEMI:	movw	R30,R14			; Pop return address
 000103 91A1	      [2] 00583		ld	R26,Z+
 000104 91B1	      [2] 00584		ld	R27,Z+
 000105 017F	      [1] 00585		movw	R14,R30
 			  00586	;	mov	SI,[BP]
 			  00587	;	add	BP,2
 			  00588
 000106 91CD	      [2] 00589	Next:	ld	R28,X+			; Get next word address in secondary
 000107 91DD	      [2] 00590		ld	R29,X+
 			  00591	;	mov	DI,[SI]
 			  00592	;	add	SI,2
 			  00593
 000108 91E9	      [2] 00594	Run:	ld	R30,Y+			; Run the word
 000109 91F9	      [2] 00595		ld	R31,Y+
 00010A 9409	      [2] 00596		ijmp
 			  00597	;	mov	BX,[DI]
 			  00598	;	add	DI,2
 			  00599	;	jmp	BX
 			  00600
 00010B			  00601	?COLON:
 00010B 01F7	      [1] 00602		movw	R30,R14			; Push instruction register
 00010C 93B2	      [2] 00603		st	-Z,R27
 00010D 93A2	      [2] 00604		st	-Z,R26
 00010E 017F	      [1] 00605		movw	R14,R30
 00010F 01DE	      [1] 00606		movw	R26,R28			; Point to nested secondary
 000110 CFF5=000106   [2] 00607		rjmp	Next			; And run it
 			  00608	;	sub	BP,2
 			  00609	;	mov	[BP],SI
 			  00610	;	mov	SI,DI
 			  00611	;	jmp	short Next
 			  00612
 			  00613	;-----------------------------------------------------------------------------
 			  00614	;
 			  00615	; EXECUTE ( addr -- )
 			  00616	;
 			  00617	; Execute dictionary entry at compilation address on stack; for example,
 			  00618	; address returned by FIND.
 			  00619	;
 			  00620	.init
 			  00621	header	"EXECUTE",0
+         =00000148		 .set	THIS_ENTRY	= PC
+0148 0138			 .dw	PREV_ENTRY
+         =00000148		 .set	PREV_ENTRY	= THIS_ENTRY
+014A 0745584543555445		 .db	0 | strlen("EXECUTE"),"EXECUTE"
DEI Research AVR Cross Assembler  Version 0.0   Jun-23-2015 57:32:32  Page 12
tilli.ASM

  Addr   Code	   Cycles Line#	  Source Statement

 0152 0111		  00622	_EXECUTE:	.dw	__EXECUTE
 			  00623	.cseg
 			  00624
 000111			  00625	__EXECUTE:
 000111 91CF	      [2] 00626		pop	R28			; Get word address of the new word
 000112 91DF	      [2] 00627		pop	R29
 000113 CFF4=000108   [2] 00628		rjmp	Run
 			  00629	;	pop	DI
 			  00630	;	jmp	Run

This project had been set aside while I did other things until a friend asked about it, giving me a kick in the butt to work on it again.

The lexical analysis handling of indentation is finally done. If you know Python at all, this is a key part of the language.

Next up is implementing the symbol table and literal pool, along with a simplistic expression parser. Once those are done, I will be able to compile simple Python programs for my 6502 virtual machine.

Other things to do in the near-term:

  • Modify the run-time library code and the assembler to generate code for the Commodore 128.
  • Python integers grow to fit the data. The run-time library can handle this; the compiler currently cannot.
  • Implement more language features.
  • Modify the assembler to add an option to be case-sensitive.
  • Python does not set a limit to the length of variable names. The current tools have a limit of 32 characters.
  • Python does not set a limit on the length of a line of source code. The current tools have a limit of 255 characters.
  • Port back end to support another platform in addition to the 6502. 68000? x86? AVR (Arduino?)
4 Likes

The following Python code:

a = b = c

generates this 6502 code:

 020E AE 13A7	      [4] 00093		ldx	_c
 0211 8A	      [2] 00094		txa
 0212 0D 13A8	      [4] 00095		ora	_c+1
 0215 D0 0B (0222)  [2/3] 00096		bne	L00000
 			  00097
 0217 A9 A7	      [2] 00098		lda	#_c&$FF
 0219 85 36	      [3] 00099		sta	Var
 021B A9 13	      [2] 00100		lda	#_c>>8
 021D 85 37	      [3] 00101		sta	Var+1
 021F 20 1344	      [6] 00102		jsr	Undef
 			  00103
 0222			  00104	L00000
 0222 AD 13A8	      [4] 00105		lda	_c+1
 0225 8E 139D	      [4] 00106		stx	_a
 0228 8D 139E	      [4] 00107		sta	_a+1
 022B 8E 13A2	      [4] 00108		stx	_b
 022E 8D 13A3	      [4] 00109		sta	_b+1
 			  00110
 0231 86 10	      [3] 00111		stx	Ptr0
 0233 85 11	      [3] 00112		sta	Ptr0+1
 0235 20 04FF	      [6] 00113		jsr	AddRef
 0238 20 04FF	      [6] 00114		jsr	AddRef

The first block of code checks whether the variable c is defined.

The second block of code displays an error message if not.

The third block of code assigns the reference to c to variables a and b.

The fourth block of code increments the reference count.

As I have been anticipating, dynamic typing imposes quite a bit of overhead.

1 Like

From the Things are Not as Easy as They Appear Department:

The previous code snippet had a serious flaw: previous object references by variables a and b were not decremented.

So, the following Python code:

a = b = c

generates this 6502 code:

 020E AE 14D2	      [4] 00094		ldx	_c
 0211 8A	      [2] 00095		txa
 0212 0D 14D3	      [4] 00096		ora	_c+1
 0215 D0 0B (0222)  [2/3] 00097		bne	L00000
 			  00098
 0217 A9 D2	      [2] 00099		lda	#_c&$FF
 0219 85 38	      [3] 00100		sta	Var
 021B A9 14	      [2] 00101		lda	#_c>>8
 021D 85 39	      [3] 00102		sta	Var+1
 021F 20 146F	      [6] 00103		jsr	Undef
 			  00104
 0222			  00105	L00000
 0222 AD 14D3	      [4] 00106		lda	_c+1
 0225 86 10	      [3] 00107		stx	PtrA
 0227 85 11	      [3] 00108		sta	PtrA+1
 0229 20 062A	      [6] 00109		jsr	AddRef
 022C 20 062A	      [6] 00110		jsr	AddRef
 			  00111
 022F AE 14C8	      [4] 00112		ldx	_a
 0232 AD 14C9	      [4] 00113		lda	_a+1
 0235 86 12	      [3] 00114		stx	Ptr0
 0237 85 13	      [3] 00115		sta	Ptr0+1
 0239 20 064F	      [6] 00116		jsr	DeRef
 023C AE 14CD	      [4] 00117		ldx	_b
 023F AD 14CE	      [4] 00118		lda	_b+1
 0242 86 12	      [3] 00119		stx	Ptr0
 0244 85 13	      [3] 00120		sta	Ptr0+1
 0246 20 064F	      [6] 00121		jsr	DeRef
 			  00122
 0249 A6 10	      [3] 00123		ldx	PtrA
 024B A5 11	      [3] 00124		lda	PtrA+1
 024D 8E 14C8	      [4] 00125		stx	_a
 0250 8D 14C9	      [4] 00126		sta	_a+1
 0253 8E 14CD	      [4] 00127		stx	_b
 0256 8D 14CE	      [4] 00128		sta	_b+1

The first block of code checks whether the variable c is defined.

The second block of code displays an error message if not.

The third block of code increments the reference count.

The fourth block of code decrements the reference counts for a and b referenced objects.

The fifth block of code assigns the reference to c to variables a and b.

It has been a productive weekend of hacking.

Taking a break from analyzing and designing the handling of literal constants in Python, I wanted some more experience with parsing expressions and code generation. So with a combination of a toy Pascal compiler started long ago, a BASIC compiler for the 6800 and a heavy dose of “let’s try this,” this is the code generated for some program statements:

 0200			  00001		*=	$200
 			  00002	; 00001	program Test;
 			  00003	; 00002	
 			  00004	; 00003	  var
 			  00005	; 00004	    A, B        : integer;
 			  00006	; 00005	    C, D        : integer;
 			  00007	; 00006	    E, F        : integer;
 			  00008	; 00007	
 			  00009	; 00008	  begin
 0200			  00010	TEST_
 0200 A2 FF	      [2] 00011		ldx	#$FF
 0202 9A	      [2] 00012		txs
 0203 20 0363	      [6] 00013		jsr	InitRTL
 			  00014	; 00009	    A := 0;
 0206 A9 00	      [2] 00015		lda	#0
 0208 8D 0364	      [4] 00016		sta	A_
 020B 8D 0365	      [4] 00017		sta	A_+1
 			  00018	; 00010	    B := 1;
 020E A2 01	      [2] 00019		ldx	#1
 0210 A9 00	      [2] 00020		lda	#0
 0212 8E 0366	      [4] 00021		stx	B_
 0215 8D 0367	      [4] 00022		sta	B_+1
 			  00023	; 00011	    C := 258;
 0218 A2 02	      [2] 00024		ldx	#2
 021A A9 01	      [2] 00025		lda	#1
 021C 8E 0368	      [4] 00026		stx	C_
 021F 8D 0369	      [4] 00027		sta	C_+1
 			  00028	; 00012	    D := -1;
 0222 A9 FF	      [2] 00029		lda	#255
 0224 8D 036A	      [4] 00030		sta	D_
 0227 8D 036B	      [4] 00031		sta	D_+1
 			  00032	; 00013	    E := 1 + 2;
 022A A2 03	      [2] 00033		ldx	#3
 022C A9 00	      [2] 00034		lda	#0
 022E 8E 036C	      [4] 00035		stx	E_
 0231 8D 036D	      [4] 00036		sta	E_+1
 			  00037	; 00014	    C := A;
 0234 AE 0364	      [4] 00038		ldx	A_
 0237 AD 0365	      [4] 00039		lda	A_+1
 023A 8E 0368	      [4] 00040		stx	C_
 023D 8D 0369	      [4] 00041		sta	C_+1
 			  00042	; 00015	    D := -B;
 0240 38	      [2] 00043		sec
 0241 A9 00	      [2] 00044		lda	#0
 0243 ED 0366	      [4] 00045		sbc	B_
 0246 AA	      [2] 00046		tax
 0247 A9 00	      [2] 00047		lda	#0
 0249 ED 0367	      [4] 00048		sbc	B_+1
 024C 8E 036A	      [4] 00049		stx	D_
 024F 8D 036B	      [4] 00050		sta	D_+1
 			  00051	; 00016	    E := E;
 			  00052	; 00017	    E := E + 0;
 			  00053	; 00018	    E := A + 0;
 0252 AE 0364	      [4] 00054		ldx	A_
 0255 AD 0365	      [4] 00055		lda	A_+1
 0258 8E 036C	      [4] 00056		stx	E_
 025B 8D 036D	      [4] 00057		sta	E_+1
 			  00058	; 00019	    E := A + 1;
 025E 18	      [2] 00059		clc
 025F AD 0364	      [4] 00060		lda	A_
 0262 69 01	      [2] 00061		adc	#1
 0264 AA	      [2] 00062		tax
 0265 AD 0365	      [4] 00063		lda	A_+1
 0268 69 00	      [2] 00064		adc	#0
 026A 8E 036C	      [4] 00065		stx	E_
 026D 8D 036D	      [4] 00066		sta	E_+1
 			  00067	; 00020	    E := A + 256;
 0270 18	      [2] 00068		clc
 0271 AE 0364	      [4] 00069		ldx	A_
 0274 AD 0365	      [4] 00070		lda	A_+1
 0277 69 01	      [2] 00071		adc	#1
 0279 8E 036C	      [4] 00072		stx	E_
 027C 8D 036D	      [4] 00073		sta	E_+1
 			  00074	; 00021	    E := A + 258;
 027F 18	      [2] 00075		clc
 0280 AD 0364	      [4] 00076		lda	A_
 0283 69 02	      [2] 00077		adc	#2
 0285 AA	      [2] 00078		tax
 0286 AD 0365	      [4] 00079		lda	A_+1
 0289 69 01	      [2] 00080		adc	#1
 028B 8E 036C	      [4] 00081		stx	E_
 028E 8D 036D	      [4] 00082		sta	E_+1
 			  00083	; 00022	    E := B + C;
 0291 18	      [2] 00084		clc
 0292 AD 0366	      [4] 00085		lda	B_
 0295 6D 0368	      [4] 00086		adc	C_
 0298 AA	      [2] 00087		tax
 0299 AD 0367	      [4] 00088		lda	B_+1
 029C 6D 0369	      [4] 00089		adc	C_+1
 029F 8E 036C	      [4] 00090		stx	E_
 02A2 8D 036D	      [4] 00091		sta	E_+1
 			  00092	; 00023	    E := 0 + E;
 			  00093	; 00024	    E := 0 + A;
 02A5 AE 0364	      [4] 00094		ldx	A_
 02A8 AD 0365	      [4] 00095		lda	A_+1
 02AB 8E 036C	      [4] 00096		stx	E_
 02AE 8D 036D	      [4] 00097		sta	E_+1
 			  00098	; 00025	    E := 1 + A;
 02B1 18	      [2] 00099		clc
 02B2 AD 0364	      [4] 00100		lda	A_
 02B5 69 01	      [2] 00101		adc	#1
 02B7 AA	      [2] 00102		tax
 02B8 AD 0365	      [4] 00103		lda	A_+1
 02BB 69 00	      [2] 00104		adc	#0
 02BD 8E 036C	      [4] 00105		stx	E_
 02C0 8D 036D	      [4] 00106		sta	E_+1
 			  00107	; 00026	    E := 256 + A;
 02C3 18	      [2] 00108		clc
 02C4 AE 0364	      [4] 00109		ldx	A_
 02C7 AD 0365	      [4] 00110		lda	A_+1
 02CA 69 01	      [2] 00111		adc	#1
 02CC 8E 036C	      [4] 00112		stx	E_
 02CF 8D 036D	      [4] 00113		sta	E_+1
 			  00114	; 00027	    E := 258 + A;
 02D2 18	      [2] 00115		clc
 02D3 AD 0364	      [4] 00116		lda	A_
 02D6 69 02	      [2] 00117		adc	#2
 02D8 AA	      [2] 00118		tax
 02D9 AD 0365	      [4] 00119		lda	A_+1
 02DC 69 01	      [2] 00120		adc	#1
 02DE 8E 036C	      [4] 00121		stx	E_
 02E1 8D 036D	      [4] 00122		sta	E_+1
 			  00123	; 00028	    F := -(B + C);
 02E4 18	      [2] 00124		clc
 02E5 AD 0366	      [4] 00125		lda	B_
 02E8 6D 0368	      [4] 00126		adc	C_
 02EB AA	      [2] 00127		tax
 02EC AD 0367	      [4] 00128		lda	B_+1
 02EF 6D 0369	      [4] 00129		adc	C_+1
 02F2 38	      [2] 00130		sec
 02F3 A8	      [2] 00131		tay
 02F4 8A	      [2] 00132		txa
 02F5 49 FF	      [2] 00133		eor	#$FF
 02F7 69 00	      [2] 00134		adc	#0
 02F9 AA	      [2] 00135		tax
 02FA 98	      [2] 00136		tya
 02FB 49 FF	      [2] 00137		eor	#$FF
 02FD 69 00	      [2] 00138		adc	#0
 02FF 8E 036E	      [4] 00139		stx	F_
 0302 8D 036F	      [4] 00140		sta	F_+1
 			  00141	; 00029	    F := 0 + -B;
 0305 38	      [2] 00142		sec
 0306 A9 00	      [2] 00143		lda	#0
 0308 ED 0366	      [4] 00144		sbc	B_
 030B AA	      [2] 00145		tax
 030C A9 00	      [2] 00146		lda	#0
 030E ED 0367	      [4] 00147		sbc	B_+1
 0311 8E 036E	      [4] 00148		stx	F_
 0314 8D 036F	      [4] 00149		sta	F_+1
 			  00150	; 00030	    F := 1 + -B;
 0317 38	      [2] 00151		sec
 0318 A9 00	      [2] 00152		lda	#0
 031A ED 0366	      [4] 00153		sbc	B_
 031D AA	      [2] 00154		tax
 031E A9 00	      [2] 00155		lda	#0
 0320 ED 0367	      [4] 00156		sbc	B_+1
 0323 18	      [2] 00157		clc
 0324 A8	      [2] 00158		tay
 0325 8A	      [2] 00159		txa
 0326 69 01	      [2] 00160		adc	#1
 0328 AA	      [2] 00161		tax
 0329 98	      [2] 00162		tya
 032A 69 00	      [2] 00163		adc	#0
 032C 8E 036E	      [4] 00164		stx	F_
 032F 8D 036F	      [4] 00165		sta	F_+1
 			  00166	; 00031	    F := 256 + -B;
 0332 38	      [2] 00167		sec
 0333 A9 00	      [2] 00168		lda	#0
 0335 ED 0366	      [4] 00169		sbc	B_
 0338 AA	      [2] 00170		tax
 0339 A9 00	      [2] 00171		lda	#0
 033B ED 0367	      [4] 00172		sbc	B_+1
 033E 18	      [2] 00173		clc
 033F 69 01	      [2] 00174		adc	#1
 0341 8E 036E	      [4] 00175		stx	F_
 0344 8D 036F	      [4] 00176		sta	F_+1
 			  00177	; 00032	    F := 258 + -B;
 0347 38	      [2] 00178		sec
 0348 A9 00	      [2] 00179		lda	#0
 034A ED 0366	      [4] 00180		sbc	B_
 034D AA	      [2] 00181		tax
 034E A9 00	      [2] 00182		lda	#0
 0350 ED 0367	      [4] 00183		sbc	B_+1
 0353 18	      [2] 00184		clc
 0354 A8	      [2] 00185		tay
 0355 8A	      [2] 00186		txa
 0356 69 02	      [2] 00187		adc	#2
 0358 AA	      [2] 00188		tax
 0359 98	      [2] 00189		tya
 035A 69 01	      [2] 00190		adc	#1
 035C 8E 036E	      [4] 00191		stx	F_
 035F 8D 036F	      [4] 00192		sta	F_+1
 			  00193	; 00033	  end.

The current code generator uses syntax directed translation, that is, the parser generates code as it parses. I love optimizing code and I really, really love teaching a computer to optimize code.

There are limitations to this approach. What is needed next is to build a tree representing an expression which can be transformed by an optimizer before feeding it to a code generator.

Assignment of integer and string literals is working.

 			  00092	; 00002	hello = "Hello world."
 020E A2 D0	      [2] 00093		ldx	#S_00000&$FF
 0210 A9 14	      [2] 00094		lda	#S_00000>>8
 0212 86 10	      [3] 00095		stx	PtrA
 0214 85 11	      [3] 00096		sta	PtrA+1
 			  00097
 0216 AE 1525	      [4] 00098		ldx	_hello
 0219 AD 1526	      [4] 00099		lda	_hello+1
 021C 20 057F	      [6] 00100		jsr	DeRef
 			  00101
 021F A6 10	      [3] 00102		ldx	PtrA
 0221 A5 11	      [3] 00103		lda	PtrA+1
 0223 8E 1525	      [4] 00104		stx	_hello
 0226 8D 1526	      [4] 00105		sta	_hello+1
 			  00106	; 00003	c = 1
 0229 A2 CE	      [2] 00107		ldx	#I_00001&$FF
 022B A9 14	      [2] 00108		lda	#I_00001>>8
 022D 86 10	      [3] 00109		stx	PtrA
 022F 85 11	      [3] 00110		sta	PtrA+1
 			  00111
 0231 AE 14E9	      [4] 00112		ldx	_c
 0234 AD 14EA	      [4] 00113		lda	_c+1
 0237 20 057F	      [6] 00114		jsr	DeRef
 			  00115
 023A A6 10	      [3] 00116		ldx	PtrA
 023C A5 11	      [3] 00117		lda	PtrA+1
 023E 8E 14E9	      [4] 00118		stx	_c
 0241 8D 14EA	      [4] 00119		sta	_c+1

and the literals themselves

 14CE 41		  03547	I_00001	.fcb	$41
 14CF 01		  03548		.fcb	$01
 			  03549
 14D0 81		  03550	S_00000	.fcb	TYPE_STRING
 			  03551		msg	'Hello world.'
+14D1 0C				.fcb	12&$FF	; the length
+14D2 00				.fcb	12>>8
+14D3 48656C6C6F20776F			.fcc	`Hello world.`
+14DB 726C642E
 			  03552

Now to design and implement the function call mechanism, then we can really have some fun…

2 Likes

Also, the compiler and the assembler needs to be modified to deal with a flaw in string handling. Can you spot what that is?

Fire in the hole!

 			  00092	; 00002	print("Hello world.")
 020E AD 13E0	      [4] 00093		lda	_print
 0211 0D 13E1	      [4] 00094		ora	_print+1
 0214 D0 0B (0221)  [2/3] 00095		bne	L00000
 			  00096
 0216 A9 E0	      [2] 00097		lda	#_print&$FF
 0218 85 38	      [3] 00098		sta	Var
 021A A9 13	      [2] 00099		lda	#_print>>8
 021C 85 39	      [3] 00100		sta	Var+1
 021E 20 1378	      [6] 00101		jsr	Undef
 			  00102
 0221			  00103	L00000
 0221 A9 04	      [2] 00104		lda	#4
 0223 85 06	      [3] 00105		sta	Int0
 0225 A9 00	      [2] 00106		lda	#0
 0227 85 07	      [3] 00107		sta	Int0+1
 0229 20 03D8	      [6] 00108		jsr	Alloc
 022C A0 00	      [2] 00109		ldy	#0
 022E A9 83	      [2] 00110		lda	#TYPE_TUPLE
 0230 91 12	      [6] 00111		sta	(Ptr0),Y
 0232 A9 01	      [2] 00112		lda	#1
 0234 C8	      [2] 00113		iny
 0235 91 12	      [6] 00114		sta	(Ptr0),Y
 0237 A0 03	      [2] 00115		ldy	#3
 0239 A2 D1	      [2] 00116		ldx	#S_00000&$FF
 023B A9 13	      [2] 00117		lda	#S_00000>>8
 023D 91 12	      [6] 00118		sta	(Ptr0),Y
 023F 8A	      [2] 00119		txa
 0240 88	      [2] 00120		dey
 0241 91 12	      [6] 00121		sta	(Ptr0),Y
 0243 88	      [2] 00122		dey
 0244 A5 12	      [3] 00123		lda	Ptr0
 0246 85 30	      [3] 00124		sta	Pargs
 0248 A5 13	      [3] 00125		lda	Ptr0+1
 024A 85 31	      [3] 00126		sta	Pargs+1
 024C A9 02	      [2] 00127		lda	#2
 024E 85 06	      [3] 00128		sta	Int0
 0250 A9 00	      [2] 00129		lda	#0
 0252 85 07	      [3] 00130		sta	Int0+1
 0254 20 03D8	      [6] 00131		jsr	Alloc
 0257 A0 00	      [2] 00132		ldy	#0
 0259 A9 85	      [2] 00133		lda	#TYPE_DICT
 025B 91 12	      [6] 00134		sta	(Ptr0),Y
 025D A9 00	      [2] 00135		lda	#0
 025F C8	      [2] 00136		iny
 0260 91 12	      [6] 00137		sta	(Ptr0),Y
 0262 A5 12	      [3] 00138		lda	Ptr0
 0264 85 32	      [3] 00139		sta	Kargs
 0266 A5 13	      [3] 00140		lda	Ptr0+1
 0268 85 33	      [3] 00141		sta	Kargs+1
 026A AD 13E0	      [4] 00142		lda	_print
 026D 85 12	      [3] 00143		sta	Ptr0
 026F AD 13E1	      [4] 00144		lda	_print+1
 0272 85 13	      [3] 00145		sta	Ptr0+1
 0274 20 0941	      [6] 00146		jsr	Call

The passing of keyword arguments is complete.

This Python program

one=1
print(one, 2, '3', sep=',', end="...\n")

compiles to

 			  00091	; 00001	one=1
 020E A2 48	      [2] 00092		ldx	#I_00001&$FF
 0210 A9 14	      [2] 00093		lda	#I_00001>>8
 0212 86 10	      [3] 00094		stx	PtrA
 0214 85 11	      [3] 00095		sta	PtrA+1
 			  00096
 0216 AE 146E	      [4] 00097		ldx	_one
 0219 AD 146F	      [4] 00098		lda	_one+1
 021C 20 04F7	      [6] 00099		jsr	DeRef
 			  00100
 021F A6 10	      [3] 00101		ldx	PtrA
 0221 A5 11	      [3] 00102		lda	PtrA+1
 0223 8E 146E	      [4] 00103		stx	_one
 0226 8D 146F	      [4] 00104		sta	_one+1
 			  00105	; 00002	print(one, 2, '3', sep=',', end="...\n")
 0229 AD 1465	      [4] 00106		lda	_print
 022C 0D 1466	      [4] 00107		ora	_print+1
 022F D0 0B (023C)  [2/3] 00108		bne	L00000
 			  00109
 0231 A9 65	      [2] 00110		lda	#_print&$FF
 0233 85 38	      [3] 00111		sta	Var
 0235 A9 14	      [2] 00112		lda	#_print>>8
 0237 85 39	      [3] 00113		sta	Var+1
 0239 20 13ED	      [6] 00114		jsr	Undef
 			  00115
 023C			  00116	L00000
 023C A9 08	      [2] 00117		lda	#8
 023E 85 06	      [3] 00118		sta	Int0
 0240 A9 00	      [2] 00119		lda	#0
 0242 85 07	      [3] 00120		sta	Int0+1
 0244 20 044D	      [6] 00121		jsr	Alloc
 0247 A5 12	      [3] 00122		lda	Ptr0
 0249 85 30	      [3] 00123		sta	Pargs
 024B A5 13	      [3] 00124		lda	Ptr0+1
 024D 85 31	      [3] 00125		sta	Pargs+1
 024F A0 00	      [2] 00126		ldy	#0
 0251 A9 83	      [2] 00127		lda	#TYPE_TUPLE
 0253 91 30	      [6] 00128		sta	(Pargs),Y
 0255 A9 03	      [2] 00129		lda	#3
 0257 C8	      [2] 00130		iny
 0258 91 30	      [6] 00131		sta	(Pargs),Y
 025A A9 0A	      [2] 00132		lda	#10
 025C 85 06	      [3] 00133		sta	Int0
 025E A9 00	      [2] 00134		lda	#0
 0260 85 07	      [3] 00135		sta	Int0+1
 0262 20 044D	      [6] 00136		jsr	Alloc
 0265 A5 12	      [3] 00137		lda	Ptr0
 0267 85 32	      [3] 00138		sta	Kargs
 0269 A5 13	      [3] 00139		lda	Ptr0+1
 026B 85 33	      [3] 00140		sta	Kargs+1
 026D A0 00	      [2] 00141		ldy	#0
 026F A9 85	      [2] 00142		lda	#TYPE_DICT
 0271 91 32	      [6] 00143		sta	(Kargs),Y
 0273 A9 02	      [2] 00144		lda	#2
 0275 C8	      [2] 00145		iny
 0276 91 32	      [6] 00146		sta	(Kargs),Y
 0278 A9 07	      [2] 00147		lda	#7
 027A 85 02	      [3] 00148		sta	Byt0
 027C A0 09	      [2] 00149		ldy	#9
 027E A2 5E	      [2] 00150		ldx	#S_00000&$FF
 0280 A9 14	      [2] 00151		lda	#S_00000>>8
 0282 91 32	      [6] 00152		sta	(Kargs),Y
 0284 8A	      [2] 00153		txa
 0285 88	      [2] 00154		dey
 0286 91 32	      [6] 00155		sta	(Kargs),Y
 0288 88	      [2] 00156		dey
 0289 A9 14	      [2] 00157		lda	#S_00001>>8
 028B 91 32	      [6] 00158		sta	(Kargs),Y
 028D 88	      [2] 00159		dey
 028E A9 58	      [2] 00160		lda	#S_00001&$FF
 0290 91 32	      [6] 00161		sta	(Kargs),Y
 0292 88	      [2] 00162		dey
 0293 A2 4E	      [2] 00163		ldx	#S_00002&$FF
 0295 A9 14	      [2] 00164		lda	#S_00002>>8
 0297 91 32	      [6] 00165		sta	(Kargs),Y
 0299 8A	      [2] 00166		txa
 029A 88	      [2] 00167		dey
 029B 91 32	      [6] 00168		sta	(Kargs),Y
 029D 88	      [2] 00169		dey
 029E A9 14	      [2] 00170		lda	#S_00003>>8
 02A0 91 32	      [6] 00171		sta	(Kargs),Y
 02A2 88	      [2] 00172		dey
 02A3 A9 52	      [2] 00173		lda	#S_00003&$FF
 02A5 91 32	      [6] 00174		sta	(Kargs),Y
 02A7 88	      [2] 00175		dey
 02A8 84 03	      [3] 00176		sty	Byt1
 02AA A4 02	      [3] 00177		ldy	Byt0
 02AC A2 4A	      [2] 00178		ldx	#S_00004&$FF
 02AE A9 14	      [2] 00179		lda	#S_00004>>8
 02B0 91 30	      [6] 00180		sta	(Pargs),Y
 02B2 8A	      [2] 00181		txa
 02B3 88	      [2] 00182		dey
 02B4 91 30	      [6] 00183		sta	(Pargs),Y
 02B6 88	      [2] 00184		dey
 02B7 A2 46	      [2] 00185		ldx	#I_00002&$FF
 02B9 A9 14	      [2] 00186		lda	#I_00002>>8
 02BB 91 30	      [6] 00187		sta	(Pargs),Y
 02BD 8A	      [2] 00188		txa
 02BE 88	      [2] 00189		dey
 02BF 91 30	      [6] 00190		sta	(Pargs),Y
 02C1 88	      [2] 00191		dey
 02C2 AE 146E	      [4] 00192		ldx	_one
 02C5 8A	      [2] 00193		txa
 02C6 0D 146F	      [4] 00194		ora	_one+1
 02C9 D0 0B (02D6)  [2/3] 00195		bne	L00001
 			  00196
 02CB A9 6E	      [2] 00197		lda	#_one&$FF
 02CD 85 38	      [3] 00198		sta	Var
 02CF A9 14	      [2] 00199		lda	#_one>>8
 02D1 85 39	      [3] 00200		sta	Var+1
 02D3 20 13ED	      [6] 00201		jsr	Undef
 			  00202
 02D6			  00203	L00001
 02D6 AD 146F	      [4] 00204		lda	_one+1
 02D9 91 30	      [6] 00205		sta	(Pargs),Y
 02DB 8A	      [2] 00206		txa
 02DC 88	      [2] 00207		dey
 02DD 91 30	      [6] 00208		sta	(Pargs),Y
 02DF AD 1465	      [4] 00209		lda	_print
 02E2 85 12	      [3] 00210		sta	Ptr0
 02E4 AD 1466	      [4] 00211		lda	_print+1
 02E7 85 13	      [3] 00212		sta	Ptr0+1
 02E9 20 09B6	      [6] 00213		jsr	Call
 			  00214

If you think that looks awfully complicated, you would be right. But the dynamic typing of Python demands it.

3 Likes

After an operation involving variable-length integers, a number may occupy more memory than necessary.

For example,

100000000000000000000000000
-99999999999999999999999999
=1

This bit of code handles the job by repeatedly lopping off the upper byte of a number if it is just an extension of the number’s sign:

 			  01769	;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 			  01770	;
 			  01771	; int.condense - Minimize the memory required by an integer.
 			  01772	;
 			  01773	; Input:
 			  01774	;	Ptr3 = address of the high byte of the uncondensed integer
 			  01775	;	Int3 = length of the uncondensed integer
 			  01776	;
 			  01777	; Output:
 			  01778	;	Int3 = length of the condensed integer
 			  01779	;
 			  01780	; Uses:
 			  01781	;	Byt0 = upper sign byte
 			  01782	;	Byt1 = upper bit of next lower byte
 			  01783	;
 0AC4			  01784	int.condense
 0AC4 A0 00	      [2] 01785		ldy	#0			; Get upper byte
 0AC6 B1 1A	    [5/6] 01786		lda	(Ptr3),Y
 0AC8 D0 37 (0B01)  [2/4] 01787		bne	int.condenseTryNegative
 			  01788
 0ACA 85 02	      [3] 01789		sta	Byt0			; Look for 0 in the upper byte
 0ACC 85 03	      [3] 01790		sta	Byt1			; And 0 in the upper bit of next byte
 			  01791
 0ACE			  01792	int.condenseLoop
 0ACE B1 1A	    [5/6] 01793		lda     (Ptr3),Y		; Upper byte purely sign extension?
 0AD0 C5 02	      [3] 01794		cmp	Byt0
 0AD2 D0 3A (0B0E)  [2/4] 01795		bne	int.condenseDone	; Done if no
 			  01796
 0AD4 38	      [2] 01797		sec				; Get next lower byte
 0AD5 A5 1A	      [3] 01798		lda	Ptr3
 0AD7 E9 01	      [2] 01799		sbc	#1
 0AD9 85 1A	      [3] 01800		sta	Ptr3
 0ADB A5 1B	      [3] 01801		lda	Ptr3+1
 0ADD E9 00	      [2] 01802		sbc	#0
 0ADF 85 1B	      [3] 01803		sta	Ptr3+1
 			  01804
 0AE1 B1 1A	    [5/6] 01805		lda	(Ptr3),Y		; Upper bit matching?
 0AE3 29 80	      [2] 01806		and	#$80
 0AE5 C5 03	      [3] 01807		cmp	Byt1
 0AE7 D0 25 (0B0E)  [2/4] 01808		bne	int.condenseDone	; Done if no
 			  01809
 0AE9 A5 0D	      [3] 01810		lda	Int3+1			; Quit if only one byte left
 0AEB D0 06 (0AF3)  [2/3] 01811		bne	int.condenseContinue
 0AED A5 0C	      [3] 01812		lda	Int3
 0AEF C9 01	      [2] 01813		cmp	#1
 0AF1 F0 1B (0B0E)  [2/4] 01814		beq	int.condenseDone
 			  01815
DEI Research 6502 Cross Assembler  Version 0.0   Nov-18-2017 06:47:50  Page 33
rtc.a65

 Addr Code	   Cycles Line#	  Source Statement

 0AF3			  01816	int.condenseContinue
 0AF3 38	      [2] 01817		sec				; Decrement byte count
 0AF4 E9 01	      [2] 01818		sbc	#1
 0AF6 85 0C	      [3] 01819		sta	Int3
 0AF8 A5 0D	      [3] 01820		lda	Int3+1
 0AFA E9 00	      [2] 01821		sbc	#0
 0AFC 85 0D	      [3] 01822		sta	Int3+1
 			  01823
 0AFE 4C 0ACE	      [3] 01824		jmp	int.condenseLoop
 			  01825
 0B01			  01826	int.condenseTryNegative
 0B01 C9 FF	      [2] 01827		cmp	#$FF			; Done if not sign extension
 0B03 D0 09 (0B0E)  [2/3] 01828		bne	int.condenseDone
 			  01829
 0B05 85 02	      [3] 01830		sta	Byt0			; Look for FF in the upper byte
 			  01831
 0B07 A9 80	      [2] 01832		lda	#$80			; And 1 in the upper bit of next byte
 0B09 85 03	      [3] 01833		sta	Byt1
 			  01834
 0B0B 4C 0ACE	      [3] 01835		jmp	int.condenseLoop
 			  01836
 0B0E			  01837	int.condenseDone
1 Like

First try at compiling integer operations:

			  00106	; 00002	one = one + 1
 0229 AE 15D8	      [4] 00107		ldx	_one
 022C 8A	      [2] 00108		txa
 022D 0D 15D9	      [4] 00109		ora	_one+1
 0230 D0 0B (023D)  [2/3] 00110		bne	L00000
 			  00111
 0232 A9 D8	      [2] 00112		lda	#_one&$FF
 0234 85 3A	      [3] 00113		sta	Var
 0236 A9 15	      [2] 00114		lda	#_one>>8
 0238 85 3B	      [3] 00115		sta	Var+1
 023A 20 157D	      [6] 00116		jsr	Undef
 			  00117
 023D			  00118	L00000
 023D AD 15D9	      [4] 00119		lda	_one+1
 0240 86 14	      [3] 00120		stx	Ptr0
 0242 85 15	      [3] 00121		sta	Ptr0+1
 0244 A2 D6	      [2] 00122		ldx	#I_00001&$FF
 0246 A9 15	      [2] 00123		lda	#I_00001>>8
 0248 86 16	      [3] 00124		stx	Ptr1
 024A 85 17	      [3] 00125		sta	Ptr1+1
 024C 20 0991	      [6] 00126		jsr	int.__add__
 024F A6 14	      [3] 00127		ldx	Ptr0
 0251 A5 15	      [3] 00128		lda	Ptr0+1
 0253 86 12	      [3] 00129		stx	PtrA
 0255 85 13	      [3] 00130		sta	PtrA+1
 0257 20 055D	      [6] 00131		jsr	AddRef
 			  00132
 025A AE 15D8	      [4] 00133		ldx	_one
 025D AD 15D9	      [4] 00134		lda	_one+1
 0260 20 047A	      [6] 00135		jsr	DeRef
 			  00136
 0263 A6 12	      [3] 00137		ldx	PtrA
 0265 A5 13	      [3] 00138		lda	PtrA+1
 0267 8E 15D8	      [4] 00139		stx	_one
 026A 8D 15D9	      [4] 00140		sta	_one+1

Binary format for 6502/C64 - http://fileformats.archiveteam.org/wiki/Commodore_64_binary_executable

From what I can tell there’s a .PRG file that boot straps the first two bytes of .BIN file into $8001.