[Project Log] Python on the 6502/C64, 8080, 6800, 6809 and AVR

kthompson395 · March 23, 2021, 6:28pm

If you love python, wait until you discover julia. It’s sort of like python with C-like speed where all of your data types are actually close to the metal rather than structs.

Bill · April 14, 2021, 4:40am

Thanks for the tip.

I have heard of it but have never looked into it.

My earlier thought was that type hinting may allow a compiler to generate better code for Python.

Bill · May 7, 2021, 12:14pm

Have you made any progress on this?

I got as far as the label CONT2 when other things diverted my attention.

urbite · May 14, 2021, 9:17pm

I have to admit that I got distracted on so many other things that this got lost in the shuffle. Will put it back on my near-term ‘pending’ pile

Bill · May 16, 2021, 12:39pm

I recently resumed working on my “collection” of Space Voyage games.

I think I may finally have all five current versions (6800, 6502, 8080, AVR and 68K) behaving in the same way. The pseudo-random generator has been temporarily wired to start with the same seed so that they may be compared.

In addition to 9900 coding, I will need information on the best cross-development tools and emulators as well as three interface subroutines: write a character, read a character and check whether a character has been typed. I am currently using my own cross-assembler and emulator.

urbite · May 18, 2021, 7:51pm

Following cross-dev tool links are from post 170 in this thread.

Excellent TI 99/4A technical info website by Dr. Thierry Nouspikel, who started his TI 99/4A work as a student. Very rich source of info.
http://www.unige.ch/medecine/nouspikel/ti99/titechpages.htm

Complete TMS9900/TI 99/4A cross development too suite (written in python)
https://endlos99.github.io/xdt99/

I would probably go with Classic99 for the emulator
http://www.harmlesslion.com/software/Classic99

Here’s a list of other emulators
http://99er.net/emul.shtml

Will have to get back with you on the functions to read/check for character. That’s the easy part if you use the existing console ROM API calls.

Writing a character will require some infrastructure, since the screen is tile-mapped. So scrolling the screen and other operations to make it look like a terminal are not natively part of the console API. There are frameworks out there to implement that, and may actually be available by GPL calls. But I never did much with GPL, so would have to do some research there.

A good starting reference on the available API calls is the TI Editor/Assembler manual. It also has detailed description of 9900 assembly language and a breakdown of each instruction.
http://99er.net/download2/index.php?act=view&id=88

Bill · May 21, 2021, 11:29am

I have been using this for instruction set reference instead of the Editor/Assembler manual:

urbite · May 21, 2021, 1:03pm

The authoritative source is the TMS9900 Data Manual, which is attached. The instruction set description is much condensed over the 99/4A manuals.

The only thing that won’t be correct is the clock cycle count. 4 clock cycles need to be added to all memory accesses, except for accesses to the ROM (0x0000-0x1FFF) and the 256 byte scratchpad RAM (0x8300-0x83FF). This is due to wait states added during the conversion of a single 16-bit memory cycle to two 8-bit memory accesses. Why the conversion is done is part of the 99/4A history.

TMS9900_DataManual_ocr.pdf (1.2 MB)

Bill · May 22, 2021, 12:51pm

I have been using Appendix A of this manual for instruction timing information:

Bill · June 1, 2021, 2:44am

Many moons ago, I bought all three flavors of Turbo Pascal version 3, CP/M-80, CP/M-86 and PC-DOS. Not only did the CP/M-80 version need a Z80 to run, but the programs it generated also required a Z80, leaving out the many systems based on the 8080 or 8085.

I took brief a look at trying to modify it to generate 8080 code, but that effort did not get very far as more of my time was being spent learning to program for the IBM PC.

Recent discussions about simulating an 8080 brought up the incompatibility between Turbo Pascal and the 8080.

I believe that modifying the compiler to directly generate 8080 code to be too difficult. Modifying it to run on an 8080 is out of the question. But much of the same result may be achieved by translating the compiled code to the 8080, then stitching it to a run-time library rewritten for the 8080.

I am happy to be able to report that the prototype for converting the Z80 code produced by the CP/M Turbo Pascal compiler to 8080 code is going well. So far, no Z80 instructions have been encountered. Unfortunately, the same cannot be said for the subroutine library; it makes extensive use of the index and alternate registers as well as instructions like subtract with carry of register pairs.

The original Pascal code:

  procedure Greet;

    begin
      writeln('Hello world.')
    end;  { Greet }

  begin
    Greet
  end.

The Z80 code generated by the compiler:

0100 C3E320      jp      20E3
Command? u20e3l23
20E3 310001      ld      SP,0100
20E6 21C2EC      ld      HL,ECC2
20E9 0100FF      ld      BC,FF00
20EC CD6503      call    0365
20EF 212021      ld      HL,2120
20F2 11D0EA      ld      DE,EAD0
20F5 0142ED      ld      BC,ED42
20F8 3E01        ld      A,01
20FA CDD504      call    04D5
20FD C31A21      jp      211A
2100 CD9C14      call    149C
2103 CDBB17      call    17BB
Command? d2106ld
2100                    0C 48-65 6C 6C 6F 20 77 6F 72  '      .Hello wor'
2110  6C 64 2E                                         'ld.             '
Command? u2113
2113 CDCE17      call    17CE
2116 CD1C20      call    201C
2119 C9          ret
211A CD0021      call    2100
211D C3D520      jp      20D5

The 8080 code generated by the recompiler:

        org     100h

        jmp     L_20E3

        org     20E3h

L_20E3:
        lxi     SP,0100h
        lxi     H,0ECC2h
        lxi     B,0FF00h
        call    R_0365
        lxi     H,2120h
        lxi     D,0EAD0h
        lxi     B,0ED42h
        mvi     A,01h
        call    R_04D5
        jmp     L_211A
L_2100:
        call    R_149C
        call    R_17BB
        db      12,'Hello world.'
        call    R_17CE
        call    R_201C
        ret
L_211A:
        call    L_2100
        jmp     R_20D5

        end     100h

Bill · June 2, 2021, 2:49am

It turns out that the line

20EF 211921      ld      HL,2119

actually points HL just past the last instruction in the program, I am guessing to initialize the heap, so the generated code is now this:

        org     100h

        jmp     L_20E3

        org     20E3h

L_20E3:
        lxi     SP,0100h
        lxi     H,0ECC2h
        lxi     B,0FF00h
        call    R_0365
        lxi     H,L_End
        lxi     D,0EAD8h
        lxi     B,0ED42h
        mvi     A,01h
        call    R_04D5
        call    R_149C
        call    R_17BB
        db      12,'Hello world.'
        call    R_17CE
        call    R_201C
        jmp     R_20D5
L_End:

        end     100h

Bill · June 7, 2021, 2:25am

Bill:

Float? 2 + 3
40 A0 00 00   +1.01000000000000000000000 * 2 ^ 2    5.000000E0
Float? 1 - 2
BF 80 00 00   -1.00000000000000000000000 * 2 ^ 0    -1.000000E0
Float? 10000 * 10000
4C BE BC 20   +1.01111101011110000100000 * 2 ^ 26    1.000000E8
Float? 1 / 3
3E AA AA AA   +1.01010101010101010101010 * 2 ^ -2    3.333333E-1

But it has its own problems:

Float? 10 + 1
41 30 00 00   +1.01100000000000000000000 * 2 ^ 3    1.099999E1

Last month, I revisited the 6800 version of the floating point code and rewrote the test driver to catch up with the AVR version.

float

The number display is much better plus the value is retained so that another operation may be performed upon it.

There is still some more work to be done. I still need to investigate implementing rounding, guard and sticky bits, then make a double precision version.

Bill · August 5, 2021, 9:34am

Can 8080 code be smaller and faster than Z80 code?

Much has been said about how the new instructions added to the Z80 are often too slow to be useful, that these are more commonly used to create compact code rather than faster execution.

I have a subroutine for which a version written for the 8080 is both smaller and quicker. It is used by CP/M Turbo Pascal programs to release allocated memory back to the heap. The 8080 version is not only faster; it is substantially faster.

I did not originally set out to write a faster implementation but had to reach repeatedly into my bag of tricks to avoid running out of registers while trying to do a literal translation. The result ended up surprisingly more efficient.

The original Z80 code:

                          07241 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                          07242 ;
                          07243 ; Dispose
                          07244 ;
                          07245 ; Input:
                          07246 ;       HL = size of block to free
                          07247 ;       Top of stack = the address of variable containing block address
                          07248 ;
 1D7B                     07249 L_1D7B:
 1D7B EB              [4] 07250         ex      DE,HL           ; Move size to DE
                          07251
 1D7C E1             [10] 07252         pop     HL              ; Get return address
 1D7D E3             [19] 07253         ex      (SP),HL         ; Swap for address of variable
                          07254
 1D7E 7E              [7] 07255         ld      A,(HL)          ; Get address of the block to free
 1D7F 23              [5] 07256         inc     HL
 1D80 66              [7] 07257         ld      H,(HL)
 1D81 6F              [4] 07258         ld      L,A
                          07259
 1D82 13              [5] 07260         inc     DE              ; Round size up to next multiple of 4
 1D83 13              [5] 07261         inc     DE
 1D84 13              [5] 07262         inc     DE
 1D85 7B              [4] 07263         ld      A,E
 1D86 E6FC            [7] 07264         and     0FCh
 1D88 5F              [4] 07265         ld      E,A
                          07266
 1D89 EB              [4] 07267         ex      DE,HL           ; Save "quantized" size
 1D8A 22 00F0        [16] 07268         ld      (L_00F0),HL     ; DE -> block to free
                          07269
 1D8D 2A 00DE        [16] 07270         ld      HL,(L_00DE)     ; Point to the first free block
 1D90 E5             [11] 07271         push    HL
 1D91 DDE1           [14] 07272         pop     IX
                          07273
 1D93 B7              [4] 07274         or      A               ; Is block to free lower?
 1D94 ED52           [15] 07275         sbc     HL,DE
 1D96 3052 (1DEA)  [7/12] 07276         jr      NC,L_1DEA       ; Yes
                          07277
 1D98                     07278 L_1D98:
 1D98 DD6E00         [19] 07279         ld      L,(IX+0)        ; Get next address
 1D9B DD6601         [19] 07280         ld      H,(IX+1)
 1D9E E5             [11] 07281         push    HL
                          07282
 1D9F B7              [4] 07283         or      A               ; Is this block still lower?
 1DA0 ED52           [15] 07284         sbc     HL,DE
 1DA2 3004 (1DA8)  [7/12] 07285         jr      NC,L_1DA8       ; No
                          07286
 1DA4 DDE1           [14] 07287         pop     IX              ; Point to next free block
                          07288
 1DA6 18F0 (1D98)    [12] 07289         jr      L_1D98          ; Keep looking
                          07290
 1DA8                     07291 L_1DA8:
 1DA8 E1             [10] 07292         pop     HL              ; Get block just above block to free
                          07293
 1DA9 D5             [11] 07294         push    DE              ; Point to block to be freed
 1DAA FDE1           [14] 07295         pop     IY
                          07296
 1DAC ED4B 00F0      [20] 07297         ld      BC,(L_00F0)     ; Retrieve block size
                          07298
 1DB0 FD7102         [19] 07299         ld      (IY+2),C        ; Store size
 1DB3 FD7003         [19] 07300         ld      (IY+3),B
 1DB6 FD7500         [19] 07301         ld      (IY+0),L        ; Store next
 1DB9 FD7401         [19] 07302         ld      (IY+1),H
                          07303
 1DBC DD7300         [19] 07304         ld      (IX+0),E        ; Point here from previous block
 1DBF DD7201         [19] 07305         ld      (IX+1),D
                          07306
 1DC2 DDE5           [15] 07307         push    IX              ; Point to previous block
 1DC4 E1             [10] 07308         pop     HL
 1DC5 DD4E02         [19] 07309         ld      C,(IX+2)        ; Get its size
 1DC8 DD4603         [19] 07310         ld      B,(IX+3)
 1DCB CD 1E05        [17] 07311         call    L_1E05          ; Try to coalesce
 1DCE 2809 (1DD9)  [7/12] 07312         jr      Z,L_1DD9        ; Combined?
                          07313
 1DD0 DD5E00         [19] 07314         ld      E,(IX+0)        ; Point to next block
 1DD3 DD5601         [19] 07315         ld      D,(IX+1)
 1DD6 D5             [11] 07316         push    DE
 1DD7 DDE1           [14] 07317         pop     IX
                          07318
 1DD9                     07319 L_1DD9:
 1DD9 DDE5           [15] 07320         push    IX              ; Set up to try combining with next block
 1DDB E1             [10] 07321         pop     HL
 1DDC DD4E02         [19] 07322         ld      C,(IX+2)        ; Load size
 1DDF DD4603         [19] 07323         ld      B,(IX+3)
 1DE2 DD5E00         [19] 07324         ld      E,(IX+0)        ; Load next
 1DE5 DD5601         [19] 07325         ld      D,(IX+1)
 1DE8 181B (1E05)    [12] 07326         jr      L_1E05
                          07327
 1DEA                     07328 L_1DEA:
 1DEA 2A 00DE        [16] 07329         ld      HL,(L_00DE)     ; Freed block is now first free block
 1DED ED53 00DE      [20] 07330         ld      (L_00DE),DE
 1DF1 D5             [11] 07331         push    DE
 1DF2 DDE1           [14] 07332         pop     IX
 1DF4 DD7500         [19] 07333         ld      (IX+0),L        ; Store next
 1DF7 DD7401         [19] 07334         ld      (IX+1),H
 1DFA ED4B 00F0      [20] 07335         ld      BC,(L_00F0)     ; Store size
 1DFE DD7102         [19] 07336         ld      (IX+2),C
 1E01 DD7003         [19] 07337         ld      (IX+3),B
 1E04 EB              [4] 07338         ex      DE,HL
                          07339
 1E05                     07340 L_1E05:
 1E05 09             [11] 07341         add     HL,BC           ; Are the blocks adjacent?
 1E06 B7              [4] 07342         or      A
 1E07 ED52           [15] 07343         sbc     HL,DE
 1E09 C0           [5/11] 07344         ret     NZ              ; No
                          07345
 1E0A D5             [11] 07346         push    DE
 1E0B FDE1           [14] 07347         pop     IY
                          07348
 1E0D 2A 00C4        [16] 07349         ld      HL,(L_00C4)     ; Is upper block at the top of the heap?
 1E10 B7              [4] 07350         or      A
 1E11 ED52           [15] 07351         sbc     HL,DE
 1E13 281B (1E30)  [7/12] 07352         jr      Z,L_1E30        ; Yes
                          07353
 1E15 FD7E00         [19] 07354         ld      A,(IY+0)        ; Transfer next to lower block
 1E18 DD7700         [19] 07355         ld      (IX+0),A
 1E1B FD7E01         [19] 07356         ld      A,(IY+1)
 1E1E DD7701         [19] 07357         ld      (IX+1),A
 1E21 FD6E02         [19] 07358         ld      L,(IY+2)        ; Add the sizes
 1E24 FD6603         [19] 07359         ld      H,(IY+3)
 1E27 09             [11] 07360         add     HL,BC
 1E28 DD7502         [19] 07361         ld      (IX+2),L
 1E2B DD7403         [19] 07362         ld      (IX+3),H
 1E2E AF              [4] 07363         xor     A               ; Clear Z flag
                          07364
 1E2F C9             [10] 07365         ret
                          07366
 1E30                     07367 L_1E30:
 1E30 DDE5           [15] 07368         push    IX              ; Set new top of heap
 1E32 E1             [10] 07369         pop     HL
 1E33 22 00C4        [16] 07370         ld      (L_00C4),HL
                          07371
 1E36 0604            [7] 07372         ld      B,4             ; Zero the header
                          07373
 1E38                     07374 L_1E38:
 1E38 3600           [10] 07375         ld      (HL),0
 1E3A 23              [5] 07376         inc     HL
 1E3B 10FB (1E38)  [8/13] 07377         djnz    L_1E38
                          07378
 1E3D C9             [10] 07379         ret

As rewritten for the 8080:

                          05810 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                          05811 ;
                          05812 ; Dispose
                          05813 ;
                          05814 ; Input:
                          05815 ;       HL = size of block to free
                          05816 ;       Top of stack = the address of variable containing block address
                          05817 ;
 14E9                     05818 Rq1D7B:
 14E9 EB              [4] 05819         xchg                    ; Move size to DE
                          05820
 14EA E1             [10] 05821         pop     H               ; Get return address
 14EB E3             [18] 05822         xthl                    ; Swap for address of variable
                          05823
 14EC 7E              [7] 05824         mov     A,M             ; Get address of the block to free
 14ED 23              [5] 05825         inx     H
 14EE 66              [7] 05826         mov     H,M
 14EF 6F              [5] 05827         mov     L,A
                          05828
 14F0 13              [5] 05829         inx     D               ; Round size up to next multiple of 4
 14F1 13              [5] 05830         inx     D
 14F2 13              [5] 05831         inx     D
 14F3 7B              [5] 05832         mov     A,E
 14F4 E6 FC           [7] 05833         ani     0FCh
 14F6 5F              [5] 05834         mov     E,A
                          05835
 14F7 EB              [4] 05836         xchg                    ; Save "quantized" size
 14F8 22 00F0        [16] 05837         shld    Vq00F0          ; DE -> block to free
                          05838
 14FB 2A 00DE        [16] 05839         lhld    Vq00DE          ; Point to the first free block
                          05840
 14FE 7D              [5] 05841         mov     A,L             ; Is block to free lower?
 14FF 93              [4] 05842         sub     E
 1500 7C              [5] 05843         mov     A,H
 1501 9A              [4] 05844         sbb     D
 1502 DA 151E        [10] 05845         jc      Lq1D98          ; No
                          05846
 1505 2A 00F0        [16] 05847         lhld    Vq00F0          ; Load size into BC
 1508 44              [5] 05848         mov     B,H
 1509 4D              [5] 05849         mov     C,L
                          05850
 150A 2A 00DE        [16] 05851         lhld    Vq00DE          ; Freed block is now first free block
 150D EB              [4] 05852         xchg
 150E 22 00DE        [16] 05853         shld    Vq00DE
                          05854
 1511 73              [7] 05855         mov     M,E             ; Store next
 1512 23              [5] 05856         inx     H
 1513 72              [7] 05857         mov     M,D
                          05858
 1514 23              [5] 05859         inx     H               ; Store size
 1515 71              [7] 05860         mov     M,C
 1516 23              [5] 05861         inx     H
 1517 70              [7] 05862         mov     M,B
 1518 2B              [5] 05863         dcx     H
 1519 2B              [5] 05864         dcx     H
 151A 2B              [5] 05865         dcx     H
                          05866
 151B C3 1554        [10] 05867         jmp     Lq1E05          ; Try to coalesce with next block
                          05868
 151E                     05869 Lq1D98:
 151E 44              [5] 05870         mov     B,H             ; Remember current block address
 151F 4D              [5] 05871         mov     C,L
                          05872
 1520 7E              [7] 05873         mov     A,M             ; Get next address
 1521 23              [5] 05874         inx     H
 1522 66              [7] 05875         mov     H,M
 1523 6F              [5] 05876         mov     L,A
                          05877
 1524 93              [4] 05878         sub     E               ; Is this block still lower?
 1525 7C              [5] 05879         mov     A,H
 1526 9A              [4] 05880         sbb     D
 1527 DA 151E        [10] 05881         jc      Lq1D98          ; Yes
                          05882
 152A E5             [11] 05883         push    H               ; Save address of block above block to free
                          05884
 152B 60              [5] 05885         mov     H,B             ; Point to block just below block to free
 152C 69              [5] 05886         mov     L,C
                          05887
 152D 73              [7] 05888         mov     M,E             ; Point here from previous block
 152E 23              [5] 05889         inx     H
 152F 72              [7] 05890         mov     M,D
                          05891
 1530 E1             [10] 05892         pop     H               ; Refresh address of block above block to free
                          05893
 1531 EB              [4] 05894         xchg                    ; Point to block to free
                          05895
 1532 73              [7] 05896         mov     M,E             ; Store next
 1533 23              [5] 05897         inx     H
 1534 72              [7] 05898         mov     M,D
                          05899
 1535 EB              [4] 05900         xchg                    ; Retrieve block size
 1536 2A 00F0        [16] 05901         lhld    Vq00F0
 1539 EB              [4] 05902         xchg
                          05903
 153A 23              [5] 05904         inx     H               ; Store size
 153B 73              [7] 05905         mov     M,E
 153C 23              [5] 05906         inx     H
 153D 72              [7] 05907         mov     M,D
                          05908
 153E 60              [5] 05909         mov     H,B             ; Point to previous block
 153F 69              [5] 05910         mov     L,C
                          05911
 1540 CD 154A        [17] 05912         call    Lq1DD9          ; Try to coalesce with block being freed
 1543 CA 154A        [10] 05913         jz      Lq1DD9          ; Combined?
                          05914
 1546 7E              [7] 05915         mov     A,M             ; Point to next block
 1547 23              [5] 05916         inx     H
 1548 66              [7] 05917         mov     H,M
 1549 6F              [5] 05918         mov     L,A
                          05919
 154A                     05920 Lq1DD9:
 154A 23              [5] 05921         inx     H               ; Set up to try combining with the next block
 154B 23              [5] 05922         inx     H
 154C 23              [5] 05923         inx     H
 154D 46              [7] 05924         mov     B,M             ; Load size
 154E 2B              [5] 05925         dcx     H
 154F 4E              [7] 05926         mov     C,M
 1550 2B              [5] 05927         dcx     H               ; Load next
 1551 56              [7] 05928         mov     D,M
 1552 2B              [5] 05929         dcx     H
 1553 5E              [7] 05930         mov     E,M
                          05931
 1554                     05932 Lq1E05:
 1554 7D              [5] 05933         mov     A,L             ; Are the blocks adjacent?
 1555 81              [4] 05934         add     C
 1556 BB              [4] 05935         cmp     E
 1557 C0           [5/11] 05936         rnz                     ; No - low bytes do not match
 1558 7D              [5] 05937         mov     A,L
 1559 81              [4] 05938         add     C
 155A 7C              [5] 05939         mov     A,H
 155B 88              [4] 05940         adc     B
 155C BA              [4] 05941         cmp     D
 155D C0           [5/11] 05942         rnz                     ; No - high bytes do not match
                          05943
 155E E5             [11] 05944         push    H               ; Stash address of lower block
                          05945
 155F 2A 00C4        [16] 05946         lhld    Vq00C4          ; Is upper block at the top of the heap?
 1562 7D              [5] 05947         mov     A,L
 1563 BB              [4] 05948         cmp     E
 1564 C2 156C        [10] 05949         jnz     Lq1E15          ; No
 1567 7C              [5] 05950         mov     A,H
 1568 BA              [4] 05951         cmp     D
 1569 CA 1582        [10] 05952         jz      Lq1E30          ; Yes
                          05953
 156C                     05954 Lq1E15:
 156C E1             [10] 05955         pop     H               ; Recover address of lower block
                          05956
 156D 1A              [7] 05957         ldax    D               ; Transfer next to lower block
 156E 77              [7] 05958         mov     M,A
 156F 23              [5] 05959         inx     H
 1570 13              [5] 05960         inx     D
 1571 1A              [7] 05961         ldax    D
 1572 77              [7] 05962         mov     M,A
 1573 23              [5] 05963         inx     H
 1574 13              [5] 05964         inx     D
                          05965
 1575 1A              [7] 05966         ldax    D               ; Add the sizes
 1576 81              [4] 05967         add     C
 1577 77              [7] 05968         mov     M,A
 1578 13              [5] 05969         inx     D
 1579 23              [5] 05970         inx     H
 157A 1A              [7] 05971         ldax    D
 157B 88              [4] 05972         adc     B
 157C 77              [7] 05973         mov     M,A
                          05974
 157D 2B              [5] 05975         dcx     H               ; Point back to the base
 157E 2B              [5] 05976         dcx     H
 157F 2B              [5] 05977         dcx     H
                          05978
 1580 AF              [4] 05979         xra     A               ; Clear Z flag
                          05980
 1581 C9             [10] 05981         ret
                          05982
 1582                     05983 Lq1E30:
 1582 E1             [10] 05984         pop     H               ; Recover address of lower block
                          05985
 1583 22 00C4        [16] 05986         shld    Vq00C4          ; Set new top of heap
                          05987
 1586 06 04           [7] 05988         mvi     B,4             ; Zero the header
                          05989
 1588                     05990 Lq1E38:
 1588 36 00          [10] 05991         mvi     M,0
 158A 23              [5] 05992         inx     H
 158B 05              [5] 05993         dcr     B
 158C C2 1588        [10] 05994         jnz     Lq1E38
                          05995
 158F C9             [10] 05996         ret

Bill · August 12, 2021, 3:55am

I have been putting some development time into FLEX again. The first priority is rearranging the several components of the system to arrive at a set of configurations which work on the largest number of dissimilar 6502 systems.

The second is changing the API slightly to try to reduce the reloading of registers. The initial prototype expected addresses to be passed in registers A:X because that was how I kept a sixteen-bit number while doing calculations. By changing the calling convention to Y:X, I am saving up to about half of the register reload instructions.

Now I have a couple of questions:

How useful is it to not corrupt memory used by a program if the operating system is rebooted? Some 680x programs provided a warm restart entry point at $103; this may be useful to try to save data after a crash before a total clean restart. Is it important at all to preserve the contents of the Utility Command Space under the same scenario?

I have always thought that the “current drive” concept in CP/M and DOS was a much more convenient way to specify the “working” drive than using the ASN (assign) utility in FLEX.

Now that I have a platform for experimentation, I have prototyped a current drive mode for FLEX. Instead of the usual “+++” command prompt, it uses “0++” when drive 0 is the current drive and operating in current drive mode. The “1.” command makes drive 1 the current drive.

The ASN utility will be modified to switch between the classic and current drive modes in addition to assigning drives in classic mode.

So is this a good idea or am I off in the weeds?

Bill · February 10, 2022, 3:01am

I just discovered a bug lurking in my processor simulators for many years.

The rotate and shift instructions of some processors like the 6800 and 6502 set the zero and sign flags based on the value of the result. My code for most of them had been propagating the values of the native x86 processor flags to those of the simulated CPU. The problem is that the shift and rotate instructions of the x86 does not affect the zero and sign flags - a small detail I had overlooked.

That this is not an often used feature is why it went this long.

Correction: The rotate instructions of the x86 do not affect the sign and zero flags. The shift instructions do. That explains how I managed to miss this.

Bill · February 18, 2022, 12:58am

I got big binary number to ASCII decimal conversion working with the Double Dabble algorithm.

Instead of over 22 billion machine cycles with the divide by ten for each digit method, the Sevens problem completes in a little over 103.2 million cycles. That is not unlike going from a bicycle to an airliner.

There is still some room for optimization. Namely the fact that numbers up to 100 bytes in size do not result in a string over 255 characters long, so calculations can be done using single-byte values. I’m thinking maybe up to a doubling of the speed.

Bill · February 18, 2022, 1:57pm

number = 1
power=0
string=""

while string.find("777777") < 0:

        number = number * 7
        power = power + 1
        string = str(number)

print('Seven to the power of', power, 'is', string)

The original divide by 10 for each digit method took over 22 billion cycles.

The first double-byte double dabble implementation took $626D28A 103207562 cycles.

The implementation of a single-byte double dabble mode for numbers 100 bytes or less took $59D709A 94204058 cycles. This was more than I was expecting.

After some more optimization, it dropped to $59D47D6 94193622 cycles.

What this is trying to say is that the decimal conversion is no longer a large majority of the time. To see how this broke down:

Omitting the string search took $5974C0C 93801484 cycles.

number = 1
power=0
string=""

while power < 176:

	number = number * 7
	power = power + 1
	string = str(number)

Omitting the string search and the decimal conversion took $3F55E03 66412035 cycles.

number = 1
power=0
string=""

while power < 176:

	number = number * 7
	power = power + 1

What this means is that the multiplication code needs another look. I know it can be improved.

Bill · February 24, 2022, 6:32am

Some time ago, I posted implementations of binary to decimal conversion using the Double Dabble algorithm for several microprocessors:

Rudla posted code to convert a binary number to packed BCD:

http://forum.6502.org/viewtopic.php?f=2&t=6579

I recognized that his code bore a definite resemblance to Double Dabble, but it was clearly not the same. Studying his implementation allowed me to step back and see the forest from the trees of the details of the Double Dabble algorithm. Improvements from a straight interpretation of Double Dabble include:

all of that testing of and adding to individual BCD digits is merely a way to perform a “decimal adjustment” when the hardware does not directly support it. Hardware supporting decimal adjustment can combine the shifting and fiddling by adding to itself (doubling) with the decimal arithmetic mode built into the 6502 or a DAA decimal adjust instruction following the addition on other processors.
shifting only one byte of the number being converted at a time instead of shifting the entire number. The savings become much more significant for larger numbers.
growing the result as digits are added instead of needlessly shifting an ocean of zeroes in a large pre-cleared buffer.

I improveed on Rudla’s implementation by starting with no bytes of result instead of one byte containing zero, saving up to seven shifts of the byte. I also did a minor tweak improving the innermost loop.

Using what I learned, I rewrote my conversion code from scratch, making it faster and simpler in the process.

My design is based on maximum reuse of memory and is intended to produce a printable character string instead of just a packed BCD number. Because the maximum number to be converted is eight bytes, the buffer is placed in the zero-page for maximum efficiency. The number to be converted is copied to the low end of the output buffer with the least-significant byte at the lowest address. The converted BCD digits grow downward from the end of the buffer and do not overlap the number until after its bytes are already consumed by the shifting. Because the largest number to be converted is 64 bits, it was reasonable to place the buffer in the zero page.

While I will not claim that these are the fastest routines to convert 32 and 64 bit integers to ASCII decimal strings, I am confident they are among the best.

Did you know that there is a 32000 character limit for a post? I know now…

Bill · February 24, 2022, 6:33am

The 6502 version:

.                         00191 ; Input:
.                         00192 ;       The absolute value of the number to convert is stored
.                         00193 ;         starting at OutBuf+2 in little-endian form
.                         00194 ;       A = number of bytes in the number
.                         00195 ;       Dabble is the label of last byte of the buffer
.                         00196 ;       OutBuf+0 = 0 if number is negative, other if negative
.                         00197 ;
.                         00198 ; Output:
.                         00199 ;       A:X contains the address of the resultant string
.                         00200 ;
.                         00201 ; Uses:
.                         00202 ;       Byt0 = number of bytes to convert
.                         00203 ;       Byt1 = number of packed BCD bytes in result
.                         00204 ;       Byt2 = bits of number being shifted

.206E                     00225 Format:
.206E 85 12           [3] 00226          sta    Byt0      ; Remember number of bytes to convert
.2070 AA              [2] 00227          tax              ; Use it as index into the number
.                         00228
.2071 A9 00           [2] 00229          lda    #0        ; Start with no bytes of BCD digits
.2073 85 13           [3] 00230          sta    Byt1
.                         00231
.2075 F8              [2] 00232          sed              ; Switch to decimal mode
.                         00233
.2076                     00234 SkipLeadingZero:
.2076 B5 19           [4] 00235          lda    OutBuf+1,X  ; Get next highest byte of the number
.2078 D0 07 (2081)  [2/3] 00236          bne    NotLeadingZero
.                         00237
.207A CA              [2] 00238          dex              ; Skip a leading zero
.                         00239
.207B D0 F9 (2076)  [2/3] 00240          bne    SkipLeadingZero ; Check the next one if more
.                         00241
.207D F0 2F (20AE)  [2/3] 00242          beq    BCD2ASCII ; All bytes are zero
.                         00243
.207F                     00244 ByteLoop:
.207F B5 19           [4] 00245          lda    OutBuf+1,X  ; Get next highest byte of the number
.                         00246
.2081                     00247 NotLeadingZero:
.2081 85 14           [3] 00248          sta    Byt2      ; Set it for shifting
.                         00249
.2083 86 12           [3] 00250          stx    Byt0      ; Remember the address of the last one shifted
.                         00251
.2085 38              [2] 00252          sec              ; Shift in extra bit for eight trips through loop
.                         00253
.2086 B0 12 (209A)  [2/3] 00254          bcs    EnterBitLoop ; Always branches
.                         00255
.2088                     00256 BitLoop:
.2088 A2 2D           [2] 00257          ldx    #Dabble   ; Point to the BCD digits
.                         00258
.208A A4 13           [3] 00259          ldy    Byt1      ; Load count of number of digit bytes
.208C F0 0A (2098)  [2/3] 00260          beq    SkipDigitLoop ; Skip loop if none
.                         00261
.208E                     00262 DigitLoop:
.208E B5 00           [4] 00263          lda    0,X       ; Double pair of BCD digits with carry
.2090 75 00           [4] 00264          adc    0,X
.2092 95 00           [4] 00265          sta    0,X
.                         00266
.2094 CA              [2] 00267          dex              ; Point to next pair of BCD digits
.                         00268
.2095                     00269 IntoDigitLoop:
.2095 88              [2] 00270          dey              ; More BCD digits?
.2096 D0 F6 (208E)  [2/3] 00271          bne    DigitLoop ; Yes
.                         00272
.2098                     00273 SkipDigitLoop:
.2098 B0 0B (20A5)  [2/3] 00274          bcs    Extend    ; Carry out of accumulated digits?
.                         00275
.209A                     00276 EnterBitLoop:
.209A 26 14           [5] 00277          rol    Byt2      ; Transfer upper bit to carry flag
.209C D0 EA (2088)  [2/3] 00278          bne    BitLoop   ; Repeat if more
.                         00279
.209E A6 12           [3] 00280          ldx    Byt0      ; Point back to the number to convert
.                         00281
.20A0 CA              [2] 00282          dex              ; Any more bytes of the number to convert?
.20A1 D0 DC (207F)  [2/3] 00283          bne    ByteLoop  ; Repeat if more
.                         00284
.20A3 F0 09 (20AE)  [2/3] 00285          beq    BCD2ASCII ; Always branches to unpacking stage
.                         00286
.20A5                     00287 Extend:
.20A5 E6 13           [5] 00288          inc    Byt1      ; Add a new byte to the result
.20A7 A9 01           [2] 00289          lda    #1
.20A9 95 00           [4] 00290          sta    0,X
.                         00291
.20AB 18              [2] 00292          clc
.20AC 90 EC (209A)  [2/3] 00293          bcc    EnterBitLoop ; Always branches
.                         00294
.20AE                     00295 BCD2ASCII:
.20AE D8              [2] 00296          cld              ; Return to binary mode
.                         00297
.20AF A5 13           [3] 00298          lda    Byt1      ; Any converted digits?
.20B1 F0 42 (20F5)  [2/3] 00299          beq    NoDigits  ; No
.                         00300
.20B3 38              [2] 00301          sec
.20B4 A9 2E           [2] 00302          lda    #Dabble+1  ; Point to most significant BCD pair
.20B6 E5 13           [3] 00303          sbc    Byt1
.20B8 A8              [2] 00304          tay
.                         00305
.20B9 A2 1A           [2] 00306          ldx    #<OutBuf+2  ; Point to result buffer
.                         00307
.20BB B9 0000       [4/5] 00308          lda    0,Y       ; Get most significant BCD pair
.                         00309
.20BE 29 F0           [2] 00310          and    #$F0      ; Is the upper one zero?
.20C0 F0 0E (20D0)  [2/3] 00311          beq    UpperZero
.20C2 D0 03 (20C7)  [2/3] 00312          bne    ConvertBoth
.                         00313
.20C4                     00314 ASCIILoop:
.20C4 B9 0000       [4/5] 00315          lda    0,Y       ; Get next most significant BCD pair
.                         00316
.20C7                     00317 ConvertBoth:
.20C7 4A              [2] 00318          lsr    A         ; Convert upper one to ASCII
.20C8 4A              [2] 00319          lsr    A
.20C9 4A              [2] 00320          lsr    A
.20CA 4A              [2] 00321          lsr    A
.20CB 09 30           [2] 00322          ora    #'0'
.20CD 95 00           [4] 00323          sta    0,X
.20CF E8              [2] 00324          inx
.                         00325
.20D0                     00326 UpperZero:
.20D0 B9 0000       [4/5] 00327          lda    0,Y
.20D3 C8              [2] 00328          iny
.20D4 29 0F           [2] 00329          and    #$F       ; And then the lower one
.20D6 09 30           [2] 00330          ora    #'0'
.20D8 95 00           [4] 00331          sta    0,X
.20DA E8              [2] 00332          inx
.                         00333
.20DB C6 13           [5] 00334          dec    Byt1      ; More digits?
.20DD D0 E5 (20C4)  [2/3] 00335          bne    ASCIILoop ; Yes
.                         00336
.20DF 8A              [2] 00337          txa              ; Store string length
.20E0 38              [2] 00338          sec
.20E1 E9 1A           [2] 00339          sbc    #OutBuf+2
.                         00340
.20E3 A6 18           [3] 00341          ldx    OutBuf    ; Is the number negative?
.20E5 F0 14 (20FB)  [2/3] 00342          beq    NumberNotNegative
.                         00343
.20E7 18              [2] 00344          clc              ; Increment and store length
.20E8 69 01           [2] 00345          adc    #1
.20EA 85 18           [3] 00346          sta    OutBuf
.                         00347
.20EC A9 2D           [2] 00348          lda    #'-'      ; Prepend minus sign
.20EE 85 19           [3] 00349          sta    OutBuf+1
.                         00350
.20F0 A9 00           [2] 00351          lda    #>OutBuf
.20F2 A2 18           [2] 00352          ldx    #<OutBuf
.                         00353
.20F4 60              [6] 00354          rts
.                         00355
.20F5                     00356 NoDigits:
.20F5 A9 30           [2] 00357          lda    #'0'      ; Emit single '0'
.20F7 85 1A           [3] 00358          sta    OutBuf+2
.                         00359
.20F9 A9 01           [2] 00360          lda    #1        ; String length one
.                         00361
.20FB                     00362 NumberNotNegative:
.20FB 85 19           [3] 00363          sta    OutBuf+1
.                         00364
.20FD A9 00           [2] 00365          lda    #>OutBuf+1
.20FF A2 19           [2] 00366          ldx    #<OutBuf+1
.                         00367
.2101 60              [6] 00368          rts

The scorecard:

                          00001 ; Converting $FFFFFFFF
                          00002 ; old $2BC9 11209 cycles for call FormatLongword
                          00003 ; new $0A72 2674 cycles
                          00004 ;
                          00005 ; Converting $00000001
                          00006 ; old $2A96 10902 cycles for call FormatLongword
                          00007 ; new $015B 347 cycles
                          00008 ;
                          00009 ; Converting $00000000
                          00010 ; old $2A98 10904 cycles for call FormatLongword
                          00011 ; new $0075 117 cycles

The addressing modes and the three byte-sized registers worked well for this implementation.

The 6800 version:

.                         00486 * Input:
.                         00487 *       The absolute value of the number to convert is stored
.                         00488 *         starting at OutBuf+2 in little-endian form
.                         00489 *       A = number of bytes in the number
.                         00490 *       Dabble is the label of last byte of the buffer
.                         00491 *       OutBuf+0 = 0 if number is negative, other if negative
.                         00492 *
.                         00493 * Output:
.                         00494 *       X contains the address of the resultant string
.                         00495 *
.                         00496 * Uses:
.                         00497 *       Byt0 = number of bytes to convert
.                         00498 *       Byt1 = number of packed BCD bytes in result
.                         00499 *       Byt2 = bits of number being shifted
.                         00500 *       Ptr0
.                         00501 *       Ptr1

.01F5                     00522 Format
.01F5 97 10           [4] 00523          staa   Byt0      ; Remember number of bytes to convert
.                         00524
.01F7 8B 17           [2] 00525          adda   #<OutBuf+1 ; Point to most significant byte of number
.01F9 97 0B           [4] 00526          staa   Ptr0+1
.01FB 86 00           [2] 00527          ldaa   #>OutBuf+1
.01FD 89 00           [2] 00528          adca   #0
.01FF 97 0A           [4] 00529          staa   Ptr0
.0201 DE 0A           [4] 00530          ldx    Ptr0
.                         00531
.0203 7F 0011         [6] 00532          clr    Byt1      ; Start with no bytes of BCD digits
.                         00533
.0206                     00534 SkipLeadingZero
.0206 E6 00           [5] 00535          ldab   ,X        ; Get next highest byte of the number
.0208 26 0D (0217)    [4] 00536          bne    NotLeadingZero
.                         00537
.020A 09              [4] 00538          dex              ; Skip a leading zero
.                         00539
.020B 7A 0010         [6] 00540          dec    Byt0
.020E 26 F6 (0206)    [4] 00541          bne    SkipLeadingZero ; Check the next one if more
.                         00542
.0210 20 32 (0244)    [4] 00543          bra    BCD2ASCII ; All bytes are zero
.                         00544
.0212                     00545 ByteLoop
.0212 DE 0A           [4] 00546          ldx    Ptr0      ; Point back to the number to convert
.0214 09              [4] 00547          dex              ; Point to next highest byte
.                         00548
.0215 E6 00           [5] 00549          ldab   ,X        ; Get next highest byte of the number
.                         00550
.0217                     00551 NotLeadingZero
.0217 D7 12           [4] 00552          stab   Byt2      ; Set it for shifting
.                         00553
.0219 DF 0A           [5] 00554          stx    Ptr0      ; Remember the address of the last one shifted
.                         00555
.021B 0D              [2] 00556          sec              ; Shift in extra bit for eight trips through loop
.                         00557
.021C 20 1C (023A)    [4] 00558          bra    EnterBitLoop
.                         00559
.021E                     00560 BitLoop
.021E CE 002B         [3] 00561          ldx    #Dabble   ; Point to the BCD digits
.                         00562
.0221 D6 11           [3] 00563          ldab   Byt1      ; Load count of number of digit bytes
.0223 27 0B (0230)    [4] 00564          beq    SkipDigitLoop ; Skip loop if none
.                         00565
.0225                     00566 DigitLoop
.0225 A6 00           [5] 00567          ldaa   ,X        ; Double pair of BCD digits with carry
.0227 A9 00           [5] 00568          adca   ,X
.0229 19              [2] 00569          daa
.022A A7 00           [6] 00570          staa   ,X
.                         00571
.022C 09              [4] 00572          dex              ; Point to next pair of BCD digits
.                         00573
.022D                     00574 IntoDigitLoop
.022D 5A              [2] 00575          decb             ; More BCD digits?
.022E 26 F5 (0225)    [4] 00576          bne    DigitLoop ; Yes
.                         00577
.0230                     00578 SkipDigitLoop
.0230 24 08 (023A)    [4] 00579          bcc    EnterBitLoop ; Carry out of accumulated digits?
.                         00580
.0232 7C 0011         [6] 00581          inc    Byt1      ; Add a new byte to the result
.0235 86 01           [2] 00582          ldaa   #1
.0237 A7 00           [6] 00583          staa   ,X
.                         00584
.0239 0C              [2] 00585          clc
.                         00586
.023A                     00587 EnterBitLoop
.023A 79 0012         [6] 00588          rol    Byt2      ; Transfer upper bit to carry flag
.023D 26 DF (021E)    [4] 00589          bne    BitLoop   ; Repeat if more
.                         00590
.023F 7A 0010         [6] 00591          dec    Byt0      ; Any more bytes of the number to convert?
.0242 26 CE (0212)    [4] 00592          bne    ByteLoop  ; Repeat if more
.                         00593
.0244                     00594 BCD2ASCII
.0244 96 11           [3] 00595          ldaa   Byt1      ; Any converted digits?
.0246 27 57 (029F)    [4] 00596          beq    NoDigits  ; No
.                         00597
.0248 86 2C           [2] 00598          ldaa   #<Dabble+1 ; Point to most significant BCD pair
.024A 90 11           [3] 00599          suba   Byt1
.024C 97 0B           [4] 00600          staa   Ptr0+1
.024E 86 00           [2] 00601          ldaa   #>Dabble+1
.0250 82 00           [2] 00602          sbca   #0
.0252 97 0A           [4] 00603          staa   Ptr0
.                         00604
.0254 CE 0018         [3] 00605          ldx    #OutBuf+2 ; Point to result buffer
.0257 DF 0C           [5] 00606          stx    Ptr1
.                         00607
.0259 DE 0A           [4] 00608          ldx    Ptr0      ; Get most significant BCD pair
.025B A6 00           [5] 00609          ldaa   ,X
.025D 08              [4] 00610          inx
.025E DF 0A           [5] 00611          stx    Ptr0
.0260 DE 0C           [4] 00612          ldx    Ptr1
.                         00613
.0262 16              [2] 00614          tab              ; Clone it
.                         00615
.0263 85 F0           [2] 00616          bita   #$F0      ; Is the upper one zero?
.0265 27 17 (027E)    [4] 00617          beq    ConvertLowerOnly
.0267 20 0C (0275)    [4] 00618          bra    ConvertBoth
.                         00619
.0269                     00620 ASCIILoop
.0269 DF 0C           [5] 00621          stx    Ptr1
.026B DE 0A           [4] 00622          ldx    Ptr0      ; Get next most significant BCD pair
.026D A6 00           [5] 00623          ldaa   ,X
.026F 08              [4] 00624          inx
.0270 DF 0A           [5] 00625          stx    Ptr0
.0272 DE 0C           [4] 00626          ldx    Ptr1
.                         00627
.0274 16              [2] 00628          tab              ; Clone it
.                         00629
.0275                     00630 ConvertBoth
.0275 44              [2] 00631          lsra             ; Convert upper one to ASCII
.0276 44              [2] 00632          lsra
.0277 44              [2] 00633          lsra
.0278 44              [2] 00634          lsra
.0279 8A 30           [2] 00635          oraa   #'0'
.027B A7 00           [6] 00636          staa   ,X
.027D 08              [4] 00637          inx
.                         00638
.027E                     00639 ConvertLowerOnly
.027E C4 0F           [2] 00640          andb   #$F       ; And then the lower one
.0280 CA 30           [2] 00641          orab   #'0'
.0282 E7 00           [6] 00642          stab   ,X
.0284 08              [4] 00643          inx
.                         00644
.0285 7A 0011         [6] 00645          dec    Byt1      ; More BCD pairs?
.0288 26 DF (0269)    [4] 00646          bne    ASCIILoop
.                         00647
.028A DF 0C           [5] 00648          stx    Ptr1      ; Determine string length
.028C 96 0D           [3] 00649          ldaa   Ptr1+1
.028E 80 18           [2] 00650          suba   #<OutBuf+2
.                         00651
.0290 D6 16           [3] 00652          ldab   OutBuf    ; Is the number negative?
.0292 27 11 (02A5)    [4] 00653          beq    NumberNotNegative
.                         00654
.0294 4C              [2] 00655          inca             ; Increment and store length byte
.0295 97 16           [4] 00656          staa   OutBuf
.                         00657
.0297 C6 2D           [2] 00658          ldab   #'-'      ; Prepend minus sign
.0299 D7 17           [4] 00659          stab   OutBuf+1
.                         00660
.029B CE 0016         [3] 00661          ldx    #OutBuf   ; Point to the result
.                         00662
.029E 39              [5] 00663          rts
.                         00664
.029F                     00665 NoDigits
.029F 86 30           [2] 00666          ldaa   #'0'      ; Emit single '0'
.02A1 97 18           [4] 00667          staa   OutBuf+2
.                         00668
.02A3 86 01           [2] 00669          ldaa   #1        ; String length one
.                         00670
.02A5                     00671 NumberNotNegative
.02A5 97 17           [4] 00672          staa   OutBuf+1
.                         00673
.02A7 CE 0017         [3] 00674          ldx    #OutBuf+1
.                         00675
.02AA 39              [5] 00676          rts

The scorecard:

                          00001 * Converting $FFFFFFFF
                          00002 * old $38F0 14576 cycles for call FormatLongword
                          00003 * new $0FED 4077 cycles
                          00004 *
                          00005 * Converting $00000001
                          00006 * old $37B3 14259 cycles for call FormatLongword
                          00007 * new $01F2 498 cycles
                          00008 *
                          00009 * Converting $00000000
                          00010 * old $37B0 14256 cycles for call FormatLongword
                          00011 * new $00B9 185 cycles

The 6800 did not fare so well with only one index register and relatively slow memory access instructions.

The added capabilities of the 6809 do not appear to be much better.

The 8080 version:

.                         00127 ; Input:
.                         00128 ;       The absolute value of the number to convert is stored
.                         00129 ;         starting at OutBuf+2 in little-endian form
.                         00130 ;       A = number of bytes in the number
.                         00131 ;       B = 0 if number is positive, other if negative
.                         00132 ;       Dabble is the label of last byte of the buffer
.                         00133 ;
.                         00134 ; Output:
.                         00135 ;       HL contains the address of the resultant string

.0173                     00156 Format:
.0173 4F              [5] 00157         mov     C,A             ; Remember number of bytes to convert
.0174 5F              [5] 00158         mov     E,A
.                         00159
.0175 78              [5] 00160         mov     A,B             ; Remember negativity
.0176 32 0333        [13] 00161         sta     OutBuf
.                         00162
.0179 AF              [4] 00163         xra     A               ; Start with no bytes of BCD digits
.017A 47              [5] 00164         mov     B,A
.                         00165
.017B 21 0334        [10] 00166         lxi     H,OutBuf+1      ; Point DE to most significant byte of number
.017E 57              [5] 00167         mov     D,A
.017F 19             [10] 00168         dad     D
.0180 EB              [4] 00169         xchg
.                         00170
.0181                     00171 SkipLeadingZero:
.0181 1A              [7] 00172         ldax    D               ; Get next highest byte of the number
.0182 B7              [4] 00173         ora     A
.0183 C2 018F        [10] 00174         jnz     NotLeadingZero
.                         00175
.0186 1B              [5] 00176         dcx     D               ; Skip a leading zero
.                         00177
.0187 0D              [5] 00178         dcr     C
.0188 C2 0181        [10] 00179         jnz     SkipLeadingZero ; Check the next one if more
.                         00180
.018B C3 01BC        [10] 00181         jmp     BIN2ASCII       ; All bytes are zero
.                         00182
.018E                     00183 ByteLoop:
.018E 1A              [7] 00184         ldax    D               ; Get next highest byte of the number
.                         00185
.018F                     00186 NotLeadingZero:
.018F 1B              [5] 00187         dcx     D               ; Point to next lower byte
.                         00188
.0190 D5             [11] 00189         push    D               ; Stash pointer to next byte
.                         00190
.0191 5F              [5] 00191         mov     E,A             ; Set it for shifting
.                         00192
.0192 37              [4] 00193         stc                     ; Shift in extra bit for eight times thru loop
.                         00194
.0193 C3 01AF        [10] 00195         jmp     EnterBitLoop
.                         00196
.0196                     00197 BitLoop:
.0196 21 0348        [10] 00198         lxi     H,Dabble        ; Point to the BCD digits
.                         00199
.0199 04              [5] 00200         inr     B               ; Check count of number of digit bytes
.019A 05              [5] 00201         dcr     B               ; Test for zero but preserve carry flag
.019B CA 01A8        [10] 00202         jz      SkipDigitLoop   ; Skip loop if none
.                         00203
.019E 50              [5] 00204         mov     D,B
.                         00205
.019F                     00206 DigitLoop:
.019F 7E              [7] 00207         mov     A,M             ; Double pair of BCD digits with carry
.01A0 8E              [7] 00208         adc     M
.01A1 27              [4] 00209         daa
.01A2 77              [7] 00210         mov     M,A
.                         00211
.01A3 2B              [5] 00212         dcx     H               ; Point to next pair of BCD digits
.                         00213
.01A4                     00214 IntoDigitLoop:
.01A4 15              [5] 00215         dcr     D               ; More BCD digits?
.01A5 C2 019F        [10] 00216         jnz     DigitLoop       ; Yes
.                         00217
.01A8                     00218 SkipDigitLoop:
.01A8 D2 01AF        [10] 00219         jnc     EnterBitLoop    ; Carry out of accumulated digits?
.                         00220
.01AB 04              [5] 00221         inr     B               ; Add new byte to the result
.01AC 36 01          [10] 00222         mvi     M,1
.                         00223
.01AE B7              [4] 00224         ora     A               ; Clear carry flag
.                         00225
.01AF                     00226 EnterBitLoop:
.01AF 7B              [5] 00227         mov     A,E
.01B0 17              [4] 00228         ral                     ; Transfer upper bit to carry flag
.01B1 5F              [5] 00229         mov     E,A
.                         00230
.01B2 1C              [5] 00231         inr     E
.01B3 1D              [5] 00232         dcr     E
.01B4 C2 0196        [10] 00233         jnz     BitLoop         ; Repeat if more
.                         00234
.01B7 D1             [10] 00235         pop     D               ; Recover pointer to next byte
.                         00236
.01B8 0D              [5] 00237         dcr     C               ; Any more bytes of the number to convert?
.01B9 C2 018E        [10] 00238         jnz     ByteLoop        ; Repeat if more
.                         00239
.01BC                     00240 BIN2ASCII:
.01BC 21 0335        [10] 00241         lxi     H,OutBuf+2      ; Point to result buffer
.                         00242
.01BF 78              [5] 00243         mov     A,B             ; Any converted digits?
.01C0 B7              [4] 00244         ora     A
.01C1 CA 01F4        [10] 00245         jz      NoDigits        ; No
.                         00246
.01C4 11 0349        [10] 00247         lxi     D,Dabble+1      ; Point to most significant BCD pair
.01C7 7B              [5] 00248         mov     A,E
.01C8 90              [4] 00249         sub     B
.01C9 5F              [5] 00250         mov     E,A
.01CA 7A              [5] 00251         mov     A,D
.01CB DE 00           [7] 00252         sbi     0
.01CD 57              [5] 00253         mov     D,A
.                         00254
.01CE 1A              [7] 00255         ldax    D               ; Get most significant BCD pair
.                         00256
.01CF E6 F0           [7] 00257         ani     0F0h            ; Is the upper one zero?
.01D1 CA 01DF        [10] 00258         jz      ConvertLowerOnly
.                         00259
.01D4                     00260 ASCIILoop:
.01D4 1A              [7] 00261         ldax    D               ; Get next most significant BCD pair
.01D5 0F              [4] 00262         rrc                     ; Convert upper one to ASCII
.01D6 0F              [4] 00263         rrc
.01D7 0F              [4] 00264         rrc
.01D8 0F              [4] 00265         rrc
.01D9 E6 0F           [7] 00266         ani     0Fh
.01DB F6 30           [7] 00267         ori     '0'
.01DD 77              [7] 00268         mov     M,A
.01DE 23              [5] 00269         inx     H
.                         00270
.01DF                     00271 ConvertLowerOnly:
.01DF 1A              [7] 00272         ldax    D               ; And then the lower one
.01E0 13              [5] 00273         inx     D
.01E1 E6 0F           [7] 00274         ani     0Fh
.01E3 F6 30           [7] 00275         ori     '0'
.01E5 77              [7] 00276         mov     M,A
.01E6 23              [5] 00277         inx     H
.                         00278
.01E7 05              [5] 00279         dcr     B
.01E8 C2 01D4        [10] 00280         jnz     ASCIILoop
.                         00281
.01EB 7D              [5] 00282         mov     A,L             ; Store string length
.01EC DE 35           [7] 00283         sbi     OutBuf+2 and 0FFh
.01EE 32 0334        [13] 00284         sta     OutBuf+1
.                         00285
.01F1 C3 01F9        [10] 00286         jmp     FinishFormat
.                         00287
.01F4                     00288 NoDigits:
.01F4 36 30          [10] 00289         mvi     M,'0'           ; Emit single '0'
.                         00290
.01F6 2B              [5] 00291         dcx     H               ; String length one
.01F7 36 01          [10] 00292         mvi     M,1
.                         00293
.01F9                     00294 FinishFormat:
.01F9 21 0334        [10] 00295         lxi     H,OutBuf+1      ; Point to the result
.                         00296
.01FC 3A 0333        [13] 00297         lda     OutBuf          ; Was number negative?
.01FF B7              [4] 00298         ora     A
.0200 CA 0209        [10] 00299         jz      NumberNotNegative       ; No
.                         00300
.0203 7E              [7] 00301         mov     A,M             ; Get and increment length byte
.0204 3C              [5] 00302         inr     A
.                         00303
.0205 36 2D          [10] 00304         mvi     M,'-'           ; Prepend minus sign
.                         00305
.0207 2B              [5] 00306         dcx     H               ; Point to the new result
.                         00307
.0208 77              [7] 00308         mov     M,A             ; Store new length
.                         00309
.0209                     00310 NumberNotNegative:
.0209 C9             [10] 00311         ret

The scorecard:

                          00001 ; Converting $FFFFFFFF
                          00002 ; old $7F53 32595 cycles for call FormatLongword
                          00003 ; new $2359 9049 cycles
                          00004 ; new $1F18 7960 cycles
                          00005 ;
                          00006 ; Converting $00000001
                          00007 ; old $7CEE 31982 cycles for call FormatLongword
                          00008 ; new $055A 1370 cycles
                          00009 ; new $04C5 1221 cycles
                          00010 ;
                          00011 ; Converting $00000000
                          00012 ; old $7D05 32005 cycles for call FormatLongword
                          00013 ; new $01E0 480 cycles
                          00014 ; new $01A2 418 cycles

Two sets of new numbers are reported. One for the initial version and another for after some intense code golf managed to eliminate variables in memory.

I actually wrote the 8080 version first, then moved on to the 6800 and then the 6502. Only then did I invest the effort to optimize this one.

Bill · March 29, 2022, 12:32pm

I came across some benchmarks to compare Millfork with other compilers:

The linked list one caught my attention. It creates a list of 3000 nodes then traverses it to add up the values.

My implementions are somewhat like the one provided in C except that I do not use his timing mechanism and the list creation loop counts down instead of up. My intent is not to compare with the compilers but between several different processors.

github.com

KarolS/millfork-benchmarks/blob/master/6502/linkedlist.c

#include "benchcommon.h"

struct node {
    struct node* next;
    int value;
};

struct node heap [4000];

unsigned int free_;
struct node* root;

void init(void) {
    free_ = 0;
    root = 0;
}

struct node* alloc() {
    struct node* result;
    result = heap + free_;

This file has been truncated. show original

I am slowly relenting to the prospect that a loop ending with an unconditional branch back to the top often performs better by transforming it to enter in the middle. The result looks unstructured, but it is not very different.

The first one is for the 6800. I have found that it is easiest to adopt 6800 code to other processors instead of the other way around.

The dual accumulators and 16-bit index register with displacement is especially well suited for this challenge.

Though inlining the subroutines provided a substantial improvement, I am showing the code before that final transformation for clarity.

                          00001 * 378834 ($5C7D2) cycles
                          00002 * 300821 ($49715) cycles after inlining subroutines
                          00003
 0000 005F                00004 Free     fdb    Heap
 0002 0000                00005 Root     fdb    0
 0004 0000                00006 New      fdb    0
 0006 0000 0000           00007 Total    fdb    0,0
 000A 0BB7                00008 I        fdb    2999
                          00009
                          00010 *
                          00011 * Allocate a node
                          00012 *
 000C                     00013 Alloc
 000C D6 01           [3] 00014          ldab   Free+1    ; Allocate a node
 000E D7 05           [4] 00015          stab   New+1     ; Return it in New
                          00016
 0010 CB 04           [2] 00017          addb   #4        ; Point to free memory
 0012 D7 01           [4] 00018          stab   Free+1
 0014 D6 00           [3] 00019          ldab   Free
 0016 D7 04           [4] 00020          stab   New
 0018 C9 00           [2] 00021          adcb   #0
 001A D7 00           [4] 00022          stab   Free
                          00023
 001C 39              [5] 00024          rts
                          00025
                          00026 *
                          00027 * Prepend a node to the list
                          00028 *
 001D                     00029 Prepend
 001D 8D ED (000C)    [8] 00030          bsr    Alloc     ; Allocate a new node
                          00031
 001F DE 04           [4] 00032          ldx    New       ; Point to the new node
                          00033
 0021 D6 02           [3] 00034          ldab   Root      ; Point next of new node to Root
 0023 E7 00           [6] 00035          stab   ,X
 0025 D6 03           [3] 00036          ldab   Root+1
 0027 E7 01           [6] 00037          stab   1,X
                          00038
 0029 A7 03           [6] 00039          staa   3,X       ; Store value in node
 002B D6 0A           [3] 00040          ldab   I
 002D E7 02           [6] 00041          stab   2,X
                          00042
 002F DF 02           [5] 00043          stx    Root      ; This is now the first node
                          00044
 0031 39              [5] 00045          rts
                          00046
                          00047 *
                          00048 * Traverse the list and sum the values
                          00049 *
 0032                     00050 Sum
 0032 DE 02           [4] 00051          ldx    Root
 0034 4F              [2] 00052          clra
 0035 5F              [2] 00053          clrb
                          00054
 0036                     00055 SumLoop
 0036 EB 03           [5] 00056          addb   3,X       ; Add value of node to sum
 0038 A9 02           [5] 00057          adca   2,X
 003A 24 08 (0044)    [4] 00058          bcc    NoCarry
                          00059
 003C 7C 0007         [6] 00060          inc    Total+1
                          00061
 003F 26 03 (0044)    [4] 00062          bne    NoCarry
                          00063
 0041 7C 0006         [6] 00064          inc    Total
                          00065
 0044                     00066 NoCarry
 0044 EE 00           [6] 00067          ldx    ,X        ; Point to the next node
 0046 26 EE (0036)    [4] 00068          bne    SumLoop   ; Until end of list
                          00069
 0048 97 08           [4] 00070          staa   Total+2
 004A D7 09           [4] 00071          stab   Total+3
                          00072
 004C 39              [5] 00073          rts
                          00074
                          00075 *
                          00076 *
                          00077 *
 004D                     00078 Main
 004D 96 0B           [3] 00079          ldaa   I+1       ; Get low byte of node value
                          00080
 004F 20 01 (0052)    [4] 00081          bra    IntoLoop
                          00082
 0051                     00083 Loop
 0051 4A              [2] 00084          deca
                          00085
 0052                     00086 IntoLoop
 0052 8D C9 (001D)    [8] 00087          bsr    Prepend
                          00088
 0054 4D              [2] 00089          tsta             ; Low byte of count zero?
 0055 26 FA (0051)    [4] 00090          bne    Loop      ; No, no borrow
                          00091
 0057 7A 000A         [6] 00092          dec    I         ; Borrow from upper byte
 005A 2A F5 (0051)    [4] 00093          bpl    Loop      ; Done if it goes negative
                          00094
 005C 8D D4 (0032)    [8] 00095          bsr    Sum       ; Traverse the list and add them up
                          00096
 005E 3F             [12] 00097          swi
                          00098
 005F                     00099 Heap
                          00100
 004D                     00101          end    Main

The 6502 had trouble keeping up with the others. Am I missing something?

                          00001 ; 519567 ($7ED8F) cycles
                          00002 ; 447558 ($6D446) after inlining subroutines
                          00003
 0000 026F                00004 Free     fdb    Heap
 0002 0000                00005 Root     fdb    0
 0004 0000                00006 New      fdb    0
 0006 0000 0000           00007 Total    fdb    0,0
 000A 0BB7                00008 I        fdb    2999
                          00009
 0200                     00010          org    $200
                          00011
                          00012 ;
                          00013 ; Allocate a node
                          00014 ;
 0200                     00015 Alloc:
 0200 A5 00           [3] 00016          lda    Free      ; Allocate a node
 0202 85 04           [3] 00017          sta    New       ; Return it in New
                          00018
 0204 18              [2] 00019          clc              ; Point to free memory
 0205 69 04           [2] 00020          adc    #4
 0207 85 00           [3] 00021          sta    Free
 0209 A5 01           [3] 00022          lda    Free+1
 020B 85 05           [3] 00023          sta    New+1
 020D 69 00           [2] 00024          adc    #0
 020F 85 01           [3] 00025          sta    Free+1
                          00026
 0211 60              [6] 00027          rts
                          00028
                          00029 ;
                          00030 ; Prepend a node to the list
                          00031 ;
 0212                     00032 Prepend:
 0212 20 0200         [6] 00033          jsr    Alloc     ; Allocate a new node
                          00034
 0215 A0 00           [2] 00035          ldy    #0        ; Point to the new node
                          00036
 0217 A5 02           [3] 00037          lda    Root      ; Point next of new node to Root
 0219 91 04           [6] 00038          sta    (New),Y
 021B C8              [2] 00039          iny
 021C A5 03           [3] 00040          lda    Root+1
 021E 91 04           [6] 00041          sta    (New),Y
 0220 C8              [2] 00042          iny
                          00043
 0221 A5 0A           [3] 00044          lda    I         ; Store value in node
 0223 91 04           [6] 00045          sta    (New),Y
 0225 C8              [2] 00046          iny
 0226 A5 0B           [3] 00047          lda    I+1
 0228 91 04           [6] 00048          sta    (New),Y
                          00049
 022A A5 04           [3] 00050          lda    New       ; This is now the first node
 022C 85 02           [3] 00051          sta    Root
 022E A5 05           [3] 00052          lda    New+1
 0230 85 03           [3] 00053          sta    Root+1
                          00054
 0232 60              [6] 00055          rts
                          00056
                          00057 ;
                          00058 ; Traverse the list and sum the values
                          00059 ;
 0233                     00060 NoCarry:
 0233 A0 00           [2] 00061          ldy    #0        ; Point to the next node
 0235 B1 02         [5/6] 00062          lda    (Root),Y
 0237 AA              [2] 00063          tax
 0238 C8              [2] 00064          iny
 0239 B1 02         [5/6] 00065          lda    (Root),Y
 023B 85 03           [3] 00066          sta    Root+1
 023D 86 02           [3] 00067          stx    Root
                          00068
 023F 05 02           [3] 00069          ora    Root
 0241 F0 1A (025D)  [2/3] 00070          beq    Summed    ; Until end of list
                          00071
 0243                     00072 Sum:
 0243 18              [2] 00073          clc              ; Add value of node to sum
 0244 A0 02           [2] 00074          ldy    #2
 0246 A5 06           [3] 00075          lda    Total
 0248 71 02         [5/6] 00076          adc    (Root),Y
 024A 85 06           [3] 00077          sta    Total
 024C C8              [2] 00078          iny
 024D A5 07           [3] 00079          lda    Total+1
 024F 71 02         [5/6] 00080          adc    (Root),Y
 0251 85 07           [3] 00081          sta    Total+1
 0253 90 DE (0233)  [2/3] 00082          bcc    NoCarry
                          00083
 0255 E6 08           [5] 00084          inc    Total+2
                          00085
 0257 D0 DA (0233)  [2/3] 00086          bne    NoCarry
                          00087
 0259 E6 09           [5] 00088          inc    Total+3
                          00089
 025B D0 D6 (0233)  [2/3] 00090          bne    NoCarry   ; Always branches
                          00091
 025D                     00092 Summed:
 025D 60              [6] 00093          rts
                          00094
                          00095 ;
                          00096 ;
                          00097 ;
 025E                     00098 NoBorrow:
 025E C6 0A           [5] 00099          dec    I
                          00100
 0260                     00101 Main:
 0260 20 0212         [6] 00102          jsr    Prepend
                          00103
 0263 A5 0A           [3] 00104          lda    I         ; Low byte of count zero?
 0265 D0 F7 (025E)  [2/3] 00105          bne    NoBorrow  ; No, no borrow
                          00106
 0267 C6 0B           [5] 00107          dec    I+1       ; Borrow from upper byte
 0269 10 F3 (025E)  [2/3] 00108          bpl    NoBorrow  ; Done if it goes negative
                          00109
 026B 20 0243         [6] 00110          jsr    Sum       ; Traverse the list and add them up
                          00111
 026E 00              [7] 00112          brk
                          00113
 026F                     00114 Heap:
                          00115
 0260                     00116          end    Main

The 8080 was difficult but benefited from abundant registers. Remember that a memory cycle of the 8080 is three machine cycles; it is one on the others, so 8080 machines usually ran at a two to three times faster clock rate.

                          00001 ; 991228 (0F1FFCh) cycles
                          00002 ; 829201 (0CA711h) cycles after inlining subroutines
                          00003
 0100                     00004         org     100h
                          00005
                          00006 ;
                          00007 ; Allocate a node
                          00008 ;
 0100                     00009 Alloc:
 0100 2A 0163        [16] 00010         lhld    Free            ; Allocate a node
                          00011
 0103 11 0004        [10] 00012         lxi     D,4             ; Point to free memory
 0106 EB              [4] 00013         xchg                    ; Keep New in DE
 0107 19             [10] 00014         dad     D
 0108 22 0163        [16] 00015         shld    Free
                          00016
 010B EB              [4] 00017         xchg                    ; Return New in HL
                          00018
 010C C9             [10] 00019         ret
                          00020
                          00021 ;
                          00022 ; Prepend a node to the list
                          00023 ;
 010D                     00024 Prepend:
 010D CD 0100        [17] 00025         call    Alloc           ; Allocate a new node
                          00026
 0110 3A 0165        [13] 00027         lda     Root            ; Point next of new node to Root
 0113 77              [7] 00028         mov     M,A
 0114 3A 0166        [13] 00029         lda     Root+1
 0117 22 0165        [16] 00030         shld    Root            ; This is the new first node
 011A 23              [5] 00031         inx     H
 011B 77              [7] 00032         mov     M,A             ; Finish pointer to previous Root
 011C 23              [5] 00033         inx     H
                          00034
 011D 71              [7] 00035         mov     M,C             ; Store value in node
 011E 23              [5] 00036         inx     H
 011F 70              [7] 00037         mov     M,B
                          00038
 0120 C9             [10] 00039         ret
                          00040
                          00041 ;
                          00042 ; Traverse the list and sum the values
                          00043 ;
 0121                     00044 Sum:
 0121 2A 0165        [16] 00045         lhld    Root
 0124 11 0000        [10] 00046         lxi     D,0
                          00047
 0127                     00048 SumLoop:
 0127 23              [5] 00049         inx     H               ; Add value of node to sum
 0128 23              [5] 00050         inx     H
 0129 4E              [7] 00051         mov     C,M
 012A 23              [5] 00052         inx     H
 012B 46              [7] 00053         mov     B,M
 012C EB              [4] 00054         xchg
 012D 09             [10] 00055         dad     B
 012E EB              [4] 00056         xchg
 012F D2 013B        [10] 00057         jnc     NoCarry
                          00058
 0132 E5             [11] 00059         push    H               ; Carry to high word
 0133 2A 016B        [16] 00060         lhld    Total+2
 0136 23              [5] 00061         inx     H
 0137 22 016B        [16] 00062         shld    Total+2
 013A E1             [10] 00063         pop     H
                          00064
 013B                     00065 NoCarry:
 013B 2B              [5] 00066         dcx     H               ; Point to the next node
 013C 2B              [5] 00067         dcx     H
 013D 7E              [7] 00068         mov     A,M
 013E 2B              [5] 00069         dcx     H
 013F 6E              [7] 00070         mov     L,M
 0140 67              [5] 00071         mov     H,A
                          00072
 0141 B5              [4] 00073         ora     L               ; Until end of list
 0142 C2 0127        [10] 00074         jnz     SumLoop
                          00075
 0145 EB              [4] 00076         xchg
 0146 22 0169        [16] 00077         shld    Total
                          00078
 0149 C9             [10] 00079         ret
                          00080
                          00081 ;
                          00082 ;
                          00083 ;
 014A                     00084 Main:
 014A 2A 016D         [16] 00085         lhld    I               ; Get count
 014D 4D              [5] 00086         mov     C,L
 014E 44              [5] 00087         mov     B,H
                          00088
 014F C3 0153        [10] 00089         jmp     IntoLoop
                          00090
 0152                     00091 NoBorrow:
 0152 0D              [5] 00092         dcr     C
                          00093
 0153                     00094 IntoLoop:
 0153 CD 010D        [17] 00095         call    Prepend
                          00096
 0156 0D              [5] 00097         dcr     C               ; Low byte of count zero?
 0157 0C              [5] 00098         inr     C
 0158 C2 0152        [10] 00099         jnz     NoBorrow        ; No, no borrow
                          00100
 015B 05              [5] 00101         dcr     B               ; Borrow from upper byte
 015C F2 0152        [10] 00102         jp      NoBorrow        ; Done if it goes negative
                          00103
 015F CD 0121        [17] 00104         call    Sum             ; Traverse the list and add them up
                          00105
 0162 76              [7] 00106         hlt
                          00107
 0163 016F                00108 Free    dw      Heap
 0165 0000                00109 Root    dw      0
 0167 0000                00110 New     dw      0
 0169 0000 0000           00111 Total   dw      0,0
 016D 0BB7                00112 I       dw      2999
                          00113
 016F                     00114 Heap
                          00115
 014A                     00116         end     Main

The 6809 ran away with this challenge. The additional registers and flexible addressing modes gave it a major advantage.

                                  00001 * 363754 ($58CEA) cycles original 6800 source code
                                  00002 * 243378 ($3B6B2) cycles
                                  00003 * 171350 ($29D56) cycles after inlining subroutines
                                  00004
 0000 0044                        00005 Free     fdb    Heap
 0002 0000                        00006 Root     fdb    0
 0004 0000                        00007 New      fdb    0
 0006 0000 0000                   00008 Total    fdb    0,0
 000A 0BB7                        00009 I        fdb    2999
                                  00010
                                  00011 *
                                  00012 * Allocate a node
                                  00013 *
 000C                             00014 Alloc
 000C 30 C4                   [4] 00015          leax   ,U        ; Allocate a node
                                  00016
 000E 33 44                   [5] 00017          leau   4,U       ; Point to free memory
                                  00018
 0010 39                      [5] 00019          rts
                                  00020
                                  00021 *
                                  00022 * Prepend a node to the list
                                  00023 *
 0011                             00024 Prepend
 0011 8D F9 (000C)            [7] 00025          bsr    Alloc     ; Allocate a new node
                                  00026
 0013 DC 02                   [5] 00027          ldd    Root      ; Point next of new node to Root
 0015 ED 84                   [5] 00028          std    ,X
                                  00029
 0017 10AF 02                 [7] 00030          sty    2,X       ; Store value in node
                                  00031
 001A 9F 02                   [5] 00032          stx    Root      ; This is now the first node
                                  00033
 001C 39                      [5] 00034          rts
                                  00035
                                  00036 *
                                  00037 * Traverse the list and sum the values
                                  00038 *
 001D                             00039 Sum
 001D 9E 02                   [5] 00040          ldx    Root
 001F CC 0000                 [3] 00041          ldd    #0        ; Clear total
 0022 1F 03                   [6] 00042          tfr    D,U
                                  00043
 0024                             00044 SumLoop
 0024 E3 02                   [7] 00045          addd   2,X       ; Add value of node to sum
 0026 24 02 (002A)            [3] 00046          bcc    NoCarry
                                  00047
 0028 33 41                   [5] 00048          leau   1,U
                                  00049
 002A                             00050 NoCarry
 002A AE 84                   [5] 00051          ldx    ,X        ; Point to the next node
 002C 26 F6 (0024)            [3] 00052          bne    SumLoop   ; Until end of list
                                  00053
 002E DD 08                   [5] 00054          std    Total+2
 0030 DF 06                   [5] 00055          stu    Total
                                  00056
 0032 39                      [5] 00057          rts
                                  00058
                                  00059 *
                                  00060 *
                                  00061 *
 0033                             00062 Main
 0033 109E 0A                 [6] 00063          ldy    I         ; Get node value
 0036 CE 0044                 [3] 00064          ldu    #Heap     ; Point to free memory
                                  00065
 0039                             00066 Loop
 0039 8D D6 (0011)            [7] 00067          bsr    Prepend
                                  00068
 003B 31 3F                   [5] 00069          leay   -1,Y
                                  00070
 003D 26 FA (0039)            [3] 00071          bne    Loop
                                  00072
 003F 8D D0 (0011)            [7] 00073          bsr    Prepend   ; And once for 0 just to make it fair
                                  00074
 0041 8D DA (001D)            [7] 00075          bsr    Sum       ; Traverse the list and add them up
                                  00076
 0043 3F                     [19] 00077          swi
                                  00078
 0044                             00079 Heap
                                  00080
 0033                             00081          end    Main