[Project Log] Python on the 6502/C64, 8080, 6800, 6809 and AVR

Draco · May 1, 2018, 12:58am

I’m wanting to use your python compiler already… Have you given any thought to how to access direct memory? There are no PEEKs and POKEs or pointers for python except perhaps here …

It also would be useful to do something like SYS or call directly to an ML sub

Bill · May 1, 2018, 6:02am

I avoid self-modifying code whenever possible. I have always been taught that self-modifying code was “evil.” Self-modifying code

is more difficult to understand.
is more difficult to debug.
cannot be used in code residing in ROM or flash memory.
is not shareable.
can actually make a program run slower by forcing a cache reload.
is prohibited by some operating systems in the name of fighting malware.

That said, I have seen cases in which self-modifying code is not really avoidable

Self-modifying code can often make code smaller when memory space is limited.
Self-modifying code can be substantially faster by eliminating testing and branching. The preferred way to do this is by generating code in a buffer and calling it. I have personally done a lot of this in graphics driver work.
For good and bad, self-modifying code can hide its function from people using debuggers and other analysis tools.
Technically, some systems, like JIT compilers and FORTH interpreters, are inherently self-modifying.
In the early days of IBM-compatibles, a popular utility program identified the processor in the system. One of the ways to distinguish between an 8088 and an 8086 is to modify the instruction stream just ahead of the current instruction and see which version the instruction prefetch picked up.

Bill · May 1, 2018, 6:14am

Python does not have pointers and Python variables behave so differently from memory-mapped variables in other languages that PEEK and POKE functions are the only sane ways to access ports at fixed memory addresses.
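
To make that concrete, here is a minimal sketch of what such functions could look like from the Python side. The names peek and poke and the bytearray standing in for the 64K address space are assumptions for illustration only; the thread does not show how the compiler actually exposes them.

```python
# Hypothetical model: a bytearray stands in for the 6502's 64K address space.
MEMORY = bytearray(0x10000)

def peek(address):
    """Return the byte stored at a fixed address."""
    return MEMORY[address & 0xFFFF]

def poke(address, value):
    """Store the low 8 bits of value at a fixed address."""
    MEMORY[address & 0xFFFF] = value & 0xFF

BORDER = 0xD020  # C64 border color register (53280 decimal)

poke(BORDER, 0)        # on real hardware this would turn the border black
print(peek(BORDER))
```

Because the address is an ordinary integer argument, no pointer type or memory-mapped variable is needed.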

Draco · May 1, 2018, 6:54am

It seems many of the negative points do not apply to the 6502 in your application.

It is, but in this case compiled code is not meant to be understood, just run.

This is true. I will give you that.

True, unless parts are copied to RAM or Shadow RAM for speed. RAM is cheap.

Do you mean relocatable? You certainly can share it with others.

6502 system can’t really benefit from cache, because memory accesses are already single-cycle

With a simple system such as the 6502, I’m not sure what operating system or anti-virus you would want

I think the 6502 64k limitation is always an issue with IO and ROM included in that

Draco · May 1, 2018, 7:20am

I think you will find this highly interesting …

Bill · May 1, 2018, 8:46am

Whoa. Major miscommunication. I was talking about the disadvantages of self-modifying code in general, not strictly in terms of this project.

Neither the compiler-generated code nor the run-time library currently uses any self-modifying code techniques. And they will not unless there is a real demonstrated need.

The compiler-generated code is not that big. The run-time library is, for two reasons. First, there is no linker yet, so all of the run-time library code is included in the final image whether it is used or not. Second, like C, Python suffers from the “printf problem”: if “print” is used, the formatting code for all variable types must be included.

Understandability of the code is important to the guy writing the compiler and the run-time library. That would be me.

No, I am talking about multiple instances of a program using the same copy of the code.

mikiex · May 1, 2018, 12:19pm

Good example of the stack blasting method on the 6809 here https://nowhereman999.wordpress.com/2017/09/16/optimizing-6809-assembly-code-part-3-stack-blasting-and-self-modifying-code/

Draco · May 1, 2018, 4:03pm

Ah, OK. I was talking more about self-modifying code for the 6502 and small 8-bit CPUs, but also relating it to the thread. Sorry for the miscommunication. You are correct about larger systems; there is really no need for it there.

Bill · May 3, 2018, 9:04am

Wow, that is over an hour long. I’ll watch it later when I have the time. How about this shorter one?

denzuko · May 3, 2018, 5:20pm

That’s an interesting video. Personally I enjoyed the CCC talk 28c3 on the behind the scene of a c64 demo

28c3: Behind the scenes of a C64 demo

Bill · May 8, 2018, 3:07pm

@dave and I have been talking. He stated that some 6502 systems do not have any mass storage, or the storage media is incompatible with common standards in the industry, meaning that the only way to get code into the system is by slow serial transfer. He said it would be nice if a program could be run several times after a single download.

He reported that a program quit with an out of memory error when it was rerun without reloading it first.

In Python, a variable is nothing more than a reference to an object, a pointer, if you will. An uninitialized variable cannot be used until something is assigned to it first. That something is often dynamically allocated in cases of variable-precision integers or compound objects resulting from a calculation. Before something can be assigned to a variable, the previous reference must be deleted and the associated data freed if it was allocated. In this particular case, variables are left referring to allocated data after the first time the program is run. On the second run, freeing an object allocated in the prior run corrupted the heap, resulting in the out of memory condition.

But wait, there’s more. “print” is a variable whose initialized value is a reference to a subroutine which formats and displays the value of data. It is perfectly valid Python code to assign another function to “print” to alter the way it works. When a program is restarted, “print” should revert to its original behavior. This means another list is required to enumerate variables and restore their original values.

A program can be made rerunnable by adding a list of variables to be zeroed and another list of variables to be set to arbitrary values when a program starts. Because this behavior is not always needed, it is only generated when the “reinit” compiler option is specified.
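
The two lists can be modelled roughly as follows. This is only a sketch of the idea: the dictionary of variables, the names ZERO_LIST, INIT_LIST and reinit(), and the use of None as a null reference are all assumptions for illustration, not what the compiler actually emits.

```python
def default_print(*values):
    """Stand-in for the run-time library's original print routine."""
    return " ".join(str(v) for v in values)

# Hypothetical tables the compiler would generate under the "reinit" option.
ZERO_LIST = ["n", "total"]                 # variables that must start with no reference
INIT_LIST = [("print", default_print)]     # variables that start with a fixed value

variables = {}

def reinit():
    """Run at program start so stale references from a prior run are never freed."""
    for name in ZERO_LIST:
        variables[name] = None
    for name, value in INIT_LIST:
        variables[name] = value

def program():
    reinit()
    started_clean = variables["n"] is None and variables["print"] is default_print
    variables["n"] = 70000                           # leaves a reference behind at exit
    variables["print"] = lambda *values: "replaced"  # user code rebinds print
    return started_clean

print(program(), program())   # both runs start from the same state
```

Without the call to reinit(), the second run would start with "n" and "print" still holding whatever the first run left in them.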

Bill · May 9, 2018, 8:19am

I just found a particularly nasty bug in the multiply code.

@dave reported to me that the following code does not work:

n  =  30000 + 40000
print(n)
n  =  30000 - 40000
print(n)
n  =  90000 + 40000
print(n)
n  =  1000000 + 40000
print(n)
n  =  500000 + 40000
print(n)

I explained that though the run-time library implements variable-precision integers, the internal arithmetic code in the compiler does not at this time. I suggested that he try:

c = 1000
a = 30 * c
b = 40 * c
n = a + b
print(n)
a = 30 * c
b = 40 * c
n = a - b
print(n)
a = 90 * c
b = 40 * c
n = a + b
print(n)
a = 1000 * c
b = 40 * c
n = a + b
print(n)
a = 500 * c
b = 40 * c
n = a + b
print(n)

or:

c = 1000
n = 30 * c + 40 * c
print(n)
n = 30 * c - 40 * c
print(n)
n = 90 * c + 40 * c
print(n)
n = 1000 * c + 40 * c
print(n)
n = 500 * c + 40 * c
print(n)

Then I took my own advice. The first block of code gave the expected answers; the second was dramatically wrong. Hmm…

The problem turned out to be the end of the multiply routine:

 14DD			  03502	int.__mul__6
 14DD 18	      [2] 03503		clc				; Point Ptr3 to the upper byte
 14DE A5 38	      [3] 03504		lda	Scr3
 14E0 65 0A	      [3] 03505		adc	Int2
 14E2 85 1E	      [3] 03506		sta	Ptr3
 14E4 A5 39	      [3] 03507		lda	Scr3+1
 14E6 65 0B	      [3] 03508		adc	Int2+1
 14E8 85 1F	      [3] 03509		sta	Ptr3+1
 14EA 38	      [2] 03510		sec
 14EB A5 1E	      [3] 03511		lda	Ptr3
 14ED E9 01	      [2] 03512		sbc	#1
 14EF 85 1E	      [3] 03513		sta	Ptr3
 14F1 A5 1F	      [3] 03514		lda	Ptr3+1
 14F3 E9 00	      [2] 03515		sbc	#0
 14F5 85 1E	      [3] 03516		sta	Ptr3
 			  03517
 14F7 20 1031	      [6] 03518		jsr	int.condense
 			  03519
 14FA 60	      [6] 03520		rts

The final

 14F5 85 1E	      [3] 03516		sta	Ptr3

should be

 14F5 85 1F	      [3] 03516		sta	Ptr3+1
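
For anyone who does not read 6502 assembly, the effect of that one-byte difference can be modelled in Python. This is a sketch of the listing above, not code from the project: the function name and the buggy flag are made up for illustration, and each 16-bit quantity is handled as separate low and high bytes the way the 6502 has to.

```python
def point_to_upper_byte(scr3, int2, buggy=False):
    """Model of Ptr3 = Scr3 + Int2 - 1, done one byte at a time."""
    # clc / lda Scr3 / adc Int2 / sta Ptr3
    total = (scr3 & 0xFF) + (int2 & 0xFF)
    ptr3_lo = total & 0xFF
    carry = total >> 8
    # lda Scr3+1 / adc Int2+1 / sta Ptr3+1
    ptr3_hi = ((scr3 >> 8) + (int2 >> 8) + carry) & 0xFF

    # sec / lda Ptr3 / sbc #1 / sta Ptr3
    borrow = 1 if ptr3_lo == 0 else 0
    ptr3_lo = (ptr3_lo - 1) & 0xFF
    # lda Ptr3+1 / sbc #0
    accumulator = (ptr3_hi - borrow) & 0xFF

    if buggy:
        ptr3_lo = accumulator      # sta Ptr3: clobbers the low byte
    else:
        ptr3_hi = accumulator      # sta Ptr3+1: the intended store
    return (ptr3_hi << 8) | ptr3_lo

print(hex(point_to_upper_byte(0x2000, 4)))              # 0x2003
print(hex(point_to_upper_byte(0x2000, 4, buggy=True)))  # 0x2020
```

With the wrong store, the low byte of the pointer ends up holding a copy of the high byte, so the pointer lands somewhere unrelated to the end of the number.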

Bill · May 9, 2018, 5:51pm

If you are following along at home, floordiv has the same defect.

 16F7			  03885	int.__floordiv__6
 16F7 A6 34	      [3] 03886		ldx	Scr2			; Temporarily switch scratch buffers
 16F9 A5 38	      [3] 03887		lda	Scr3
 16FB 86 38	      [3] 03888		stx	Scr3
 16FD 85 34	      [3] 03889		sta	Scr2
 16FF A6 35	      [3] 03890		ldx	Scr2+1
 1701 A5 39	      [3] 03891		lda	Scr3+1
 1703 86 39	      [3] 03892		stx	Scr3+1
 1705 85 35	      [3] 03893		sta	Scr2+1
 			  03894
 1707 18	      [2] 03895		clc				; Point Ptr3 to the upper byte
 1708 A5 38	      [3] 03896		lda	Scr3
 170A 65 0A	      [3] 03897		adc	Int2
 170C 85 1E	      [3] 03898		sta	Ptr3
 170E A5 39	      [3] 03899		lda	Scr3+1
 1710 65 0B	      [3] 03900		adc	Int2+1
 1712 85 1F	      [3] 03901		sta	Ptr3+1
 1714 38	      [2] 03902		sec
 1715 A5 1E	      [3] 03903		lda	Ptr3
 1717 E9 01	      [2] 03904		sbc	#1
 1719 85 1E	      [3] 03905		sta	Ptr3
 171B A5 1F	      [3] 03906		lda	Ptr3+1
 171D E9 00	      [2] 03907		sbc	#0
 171F 85 1E	      [3] 03908		sta	Ptr3

That last store should also be to Ptr3+1

Bill · May 9, 2018, 8:46pm

That was not the only thing wrong with floordiv. The section of code should look like this:

 14A2			  03594	int.__floordiv__6
 14A2 A6 34	      [3] 03595		ldx	Scr2			; Temporarily switch scratch buffers
 14A4 A5 38	      [3] 03596		lda	Scr3
 14A6 86 38	      [3] 03597		stx	Scr3
 14A8 85 34	      [3] 03598		sta	Scr2
 14AA A6 35	      [3] 03599		ldx	Scr2+1
 14AC A5 39	      [3] 03600		lda	Scr3+1
 14AE 86 39	      [3] 03601		stx	Scr3+1
 14B0 85 35	      [3] 03602		sta	Scr2+1
 			  03603
 14B2 A5 0C	      [3] 03604		lda	Int3
 14B4 85 0A	      [3] 03605		sta	Int2
 14B6 A5 0D	      [3] 03606		lda	Int3+1
 14B8 85 0B	      [3] 03607		sta	Int2+1
 			  03608
 14BA 18	      [2] 03609		clc				; Point Ptr3 to the upper byte
 14BB A5 38	      [3] 03610		lda	Scr3
 14BD 65 0A	      [3] 03611		adc	Int2
 14BF 85 1E	      [3] 03612		sta	Ptr3
 14C1 A5 39	      [3] 03613		lda	Scr3+1
 14C3 65 0B	      [3] 03614		adc	Int2+1
 14C5 85 1F	      [3] 03615		sta	Ptr3+1
 14C7 38	      [2] 03616		sec
 14C8 A5 1E	      [3] 03617		lda	Ptr3
 14CA E9 01	      [2] 03618		sbc	#1
 14CC 85 1E	      [3] 03619		sta	Ptr3
 14CE A5 1F	      [3] 03620		lda	Ptr3+1
 14D0 E9 00	      [2] 03621		sbc	#0
 14D2 85 1F	      [3] 03622		sta	Ptr3+1
 			  03623
 14D4 20 0DDC	      [6] 03624		jsr	int.condense
 14D7 20 0E27	      [6] 03625		jsr	int.finish

Bill · May 11, 2018, 6:29am

The testing of multiplication and division continues.

I can now run this code:

a = 1000
print('a =', a)
a = a * a
print('a * a =', a)
a = a * a
print('a * a =', a)
b = a // 250
print(a, '// 250 =', b)

yielding this result:

a = 1000
a * a = 1000000
a * a = 1000000000000
1000000000000 // 250 = 4000000000
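
As a rough illustration of why this is tedious on an 8-bit CPU, here is schoolbook multiplication done one byte at a time in Python. It is only a sketch: the little-endian list-of-bytes layout and the helper names are assumptions, it handles non-negative values only, and it is not a transcription of the run-time library's routine.

```python
def to_bytes_le(n):
    """Split a non-negative integer into a little-endian list of bytes."""
    out = []
    while True:
        out.append(n & 0xFF)
        n >>= 8
        if n == 0:
            return out

def from_bytes_le(data):
    """Reassemble a little-endian list of bytes into an integer."""
    return sum(byte << (8 * i) for i, byte in enumerate(data))

def mul_bytes(a, b):
    """Multiply two little-endian byte lists, 8 bits at a time."""
    result = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            total = result[i + j] + x * y + carry   # never exceeds 0xFFFF
            result[i + j] = total & 0xFF
            carry = total >> 8
        result[i + len(b)] = carry
    # Trim unused high-order zero bytes so the result is as short as possible.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result

million = to_bytes_le(1000000)
print(from_bytes_le(mul_bytes(million, million)))   # 1000000000000
```

Every one of those index calculations becomes a multi-instruction 16-bit pointer update on the 6502.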

I am getting really tired of doing address arithmetic in 8-bit chunks…

Bill · May 13, 2018, 4:55pm

At long last, string repetition is working.

This code:

a = 'abc'
b = a * False
print('"' + b + '"')
b = a * True
print('"' + b + '"')
b = a * 0
print('"' + b + '"')
b = a * 1
print('"' + b + '"')
b = a * 16
print(b)
b = 32 * a
print(b)

yields this output:

""
"abc"
""
"abc"
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc

And I have not found any more surprises in the multiply and divide code. One of the tests:

a = 4
b = a // 2
print('4 // 2 =', b)
a = 1000
print('a =', a)
b = 4 * a
print('4 * a =', b)
b = a * 4
print('a * 4 =', b)
b = 4 // a
print('4 // a =', b)
b = a // 4
print('a // 4 =', b)
a = a * a
print('a * a =', a)
b = 4 * a
print('4 * a =', b)
b = a * 4
print('a * 4 =', b)
a = a * a
print('a * a =', a)
b = 4 * a
print('4 * a =', b)
b = a * 4
print('a * 4 =', b)
b = a // 250
print(a, '// 250 =', b)
b = a // 128
print(a, '// 128 =', b)
d = 1000
d = a // 4
print(a, '// 4', '=', d)
a=1000
c=250
d = a // c
print(a, '//', c, '=', d)
a=1000
c=250*a
a=a*a
d = a // c
print(a, '//', c, '=', d)
d = c // 4
print(c, '// 4 =', d)
a=1000
c=250*a*a
a=a*a*a
d = a // c
print(a, '//', c, '=', d)
d = c // 4
print(c, '// 4 =', d)
a=1000
c=250*a*a*a
a=a*a*a*a
d = a // c
print(a, '//', c, '=', d)
d = c // 4
print(c, '// 4 =', d)

and its output:

4 // 2 = 2
a = 1000
4 * a = 4000
a * 4 = 4000
4 // a = 0
a // 4 = 250
a * a = 1000000
4 * a = 4000000
a * 4 = 4000000
a * a = 1000000000000
4 * a = 4000000000000
a * 4 = 4000000000000
1000000000000 // 250 = 4000000000
1000000000000 // 128 = 7812500000
1000000000000 // 4 = 250000000000
1000 // 250 = 4
1000000 // 250000 = 4
250000 // 4 = 62500
1000000000 // 250000000 = 4
250000000 // 4 = 62500000
1000000000000 // 250000000000 = 4
250000000000 // 4 = 62500000000

Bill · May 29, 2018, 6:28am

Lately, I have been working on implementing variable precision integers in the compiler. It has been slow going. Not only the operations on the integers, but integrating them into the expression parser and the symbol table.

In parallel, I have been reacquainting myself with the floating point code written quite a while ago. It was originally developed using Mk I of my 6800 emulator. However, it has a problem with my Mk II emulator. Instead of

Enter an operation: 1 - 2
-1

it does

Enter an operation: 1 - 2
-2

It works fine in my Mk II 6809 emulator, so the problem is almost certainly a faulty implementation of one machine instruction in the 6800 emulator.

Along the way, I seem to recall porting this code to the AVR. Sure enough, I have a version last touched in 2015. It can subtract two from one just fine. And this version has much better output formatting than the original 680x code:

Float? 2 + 3
40 A0 00 00   +1.01000000000000000000000 * 2 ^ 2    5.000000E0
Float? 1 - 2
BF 80 00 00   -1.00000000000000000000000 * 2 ^ 0    -1.000000E0
Float? 10000 * 10000
4C BE BC 20   +1.01111101011110000100000 * 2 ^ 26    1.000000E8
Float? 1 / 3
3E AA AA AA   +1.01010101010101010101010 * 2 ^ -2    3.333333E-1

But it has its own problems:

Float? 10 + 1
41 30 00 00   +1.01100000000000000000000 * 2 ^ 3    1.099999E1
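
The four hex bytes in these dumps can be cross-checked on a PC. The sketch below assumes they are an IEEE 754 single-precision value stored most significant byte first, which is consistent with the sign, mantissa and exponent printed beside them; the function name is made up for illustration.

```python
import struct

def decode_single(hex_bytes):
    """Split four hex bytes into sign, unbiased exponent, fraction bits and value."""
    raw = bytes.fromhex(hex_bytes)
    (bits,) = struct.unpack(">I", raw)
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the exponent bias
    fraction = bits & 0x7FFFFF               # 23 stored mantissa bits
    (value,) = struct.unpack(">f", raw)
    return sign, exponent, fraction, value

for text in ("40 A0 00 00", "BF 80 00 00", "4C BE BC 20", "41 30 00 00"):
    sign, exponent, fraction, value = decode_single(text)
    mark = "-" if sign else "+"
    print(f"{text}   {mark}1.{fraction:023b} * 2 ^ {exponent}    {value!r}")
```

Under that assumption, 41 30 00 00 decodes to exactly 11.0, which suggests the addition itself is correct and the 1.099999E1 comes from the conversion to decimal for printing.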

It is coming back now. I last worked on it in mid-July of 2015 and went down the rabbit hole of researching how to quickly and accurately print floating point numbers, a complicated subject I still have not mastered. This was also the time when I investigated joining DMS. The following month, I joined, started messing with Arduino and stopped working on my own AVR tools.

Bill · May 30, 2018, 8:38am

Because of the mostly functional floating point code, the AVR and 680x will be the next platforms targeted by the compiler.

Work on these will take place in the latter half of the year.

A port of the floating point code to the 6502 will have to come after that.

Draco · May 30, 2018, 9:18am

Python for the Arduino will be very popular if it works well … and has an easy editor / way to upload scripts

Bill · May 30, 2018, 9:23am

That will be the hard part. How to integrate it into the IDE.