[Project Log] Python on the 6502/C64, 8080, 6800, 6809 and AVR

I recently resumed working on my “collection” of Space Voyage games.

I think I may finally have all five current versions (6800, 6502, 8080, AVR and 68K) behaving in the same way. The pseudo-random generator has been temporarily wired to start with the same seed so that they may be compared.

In addition to 9900 coding, I will need information on the best cross-development tools and emulators as well as three interface subroutines: write a character, read a character and check whether a character has been typed. I am currently using my own cross-assembler and emulator.

Following cross-dev tool links are from post 170 in this thread.

Excellent TI 99/4A technical info website by Dr. Thierry Nouspikel, who started his TI 99/4A work as a student. Very rich source of info.
http://www.unige.ch/medecine/nouspikel/ti99/titechpages.htm

Complete TMS9900/TI 99/4A cross development too suite (written in python)
https://endlos99.github.io/xdt99/

I would probably go with Classic99 for the emulator
http://www.harmlesslion.com/software/Classic99

Here’s a list of other emulators
http://99er.net/emul.shtml

Will have to get back with you on the functions to read/check for character. That’s the easy part if you use the existing console ROM API calls.

Writing a character will require some infrastructure, since the screen is tile-mapped. So scrolling the screen and other operations to make it look like a terminal are not natively part of the console API. There are frameworks out there to implement that, and may actually be available by GPL calls. But I never did much with GPL, so would have to do some research there.

A good starting reference on the available API calls is the TI Editor/Assembler manual. It also has detailed description of 9900 assembly language and a breakdown of each instruction.
http://99er.net/download2/index.php?act=view&id=88

1 Like

I have been using this for instruction set reference instead of the Editor/Assembler manual:

The authoritative source is the TMS9900 Data Manual, which is attached. The instruction set description is much condensed over the 99/4A manuals.

The only thing that won’t be correct is the clock cycle count. 4 clock cycles need to be added to all memory accesses, except for accesses to the ROM (0x0000-0x1FFF) and the 256 byte scratchpad RAM (0x8300-0x83FF). This is due to wait states added during the conversion of a single 16-bit memory cycle to two 8-bit memory accesses. Why the conversion is done is part of the 99/4A history.

TMS9900_DataManual_ocr.pdf (1.2 MB)

2 Likes

I have been using Appendix A of this manual for instruction timing information:

Many moons ago, I bought all three flavors of Turbo Pascal version 3, CP/M-80, CP/M-86 and PC-DOS. Not only did the CP/M-80 version need a Z80 to run, but the programs it generated also required a Z80, leaving out the many systems based on the 8080 or 8085.

I took brief a look at trying to modify it to generate 8080 code, but that effort did not get very far as more of my time was being spent learning to program for the IBM PC.

Recent discussions about simulating an 8080 brought up the incompatibility between Turbo Pascal and the 8080.

I believe that modifying the compiler to directly generate 8080 code to be too difficult. Modifying it to run on an 8080 is out of the question. But much of the same result may be achieved by translating the compiled code to the 8080, then stitching it to a run-time library rewritten for the 8080.

I am happy to be able to report that the prototype for converting the Z80 code produced by the CP/M Turbo Pascal compiler to 8080 code is going well. So far, no Z80 instructions have been encountered. Unfortunately, the same cannot be said for the subroutine library; it makes extensive use of the index and alternate registers as well as instructions like subtract with carry of register pairs.

The original Pascal code:

  procedure Greet;

    begin
      writeln('Hello world.')
    end;  { Greet }

  begin
    Greet
  end.

The Z80 code generated by the compiler:

0100 C3E320      jp      20E3
Command? u20e3l23
20E3 310001      ld      SP,0100
20E6 21C2EC      ld      HL,ECC2
20E9 0100FF      ld      BC,FF00
20EC CD6503      call    0365
20EF 212021      ld      HL,2120
20F2 11D0EA      ld      DE,EAD0
20F5 0142ED      ld      BC,ED42
20F8 3E01        ld      A,01
20FA CDD504      call    04D5
20FD C31A21      jp      211A
2100 CD9C14      call    149C
2103 CDBB17      call    17BB
Command? d2106ld
2100                    0C 48-65 6C 6C 6F 20 77 6F 72  '      .Hello wor'
2110  6C 64 2E                                         'ld.             '
Command? u2113
2113 CDCE17      call    17CE
2116 CD1C20      call    201C
2119 C9          ret
211A CD0021      call    2100
211D C3D520      jp      20D5

The 8080 code generated by the recompiler:

        org     100h

        jmp     L_20E3

        org     20E3h

L_20E3:
        lxi     SP,0100h
        lxi     H,0ECC2h
        lxi     B,0FF00h
        call    R_0365
        lxi     H,2120h
        lxi     D,0EAD0h
        lxi     B,0ED42h
        mvi     A,01h
        call    R_04D5
        jmp     L_211A
L_2100:
        call    R_149C
        call    R_17BB
        db      12,'Hello world.'
        call    R_17CE
        call    R_201C
        ret
L_211A:
        call    L_2100
        jmp     R_20D5

        end     100h
2 Likes

It turns out that the line

20EF 211921      ld      HL,2119

actually points HL just past the last instruction in the program, I am guessing to initialize the heap, so the generated code is now this:

        org     100h

        jmp     L_20E3

        org     20E3h

L_20E3:
        lxi     SP,0100h
        lxi     H,0ECC2h
        lxi     B,0FF00h
        call    R_0365
        lxi     H,L_End
        lxi     D,0EAD8h
        lxi     B,0ED42h
        mvi     A,01h
        call    R_04D5
        call    R_149C
        call    R_17BB
        db      12,'Hello world.'
        call    R_17CE
        call    R_201C
        jmp     R_20D5
L_End:

        end     100h
1 Like

Last month, I revisited the 6800 version of the floating point code and rewrote the test driver to catch up with the AVR version.

float

The number display is much better plus the value is retained so that another operation may be performed upon it.

There is still some more work to be done. I still need to investigate implementing rounding, guard and sticky bits, then make a double precision version.

2 Likes

Can 8080 code be smaller and faster than Z80 code?

Much has been said about how the new instructions added to the Z80 are often too slow to be useful, that these are more commonly used to create compact code rather than faster execution.

I have a subroutine for which a version written for the 8080 is both smaller and quicker. It is used by CP/M Turbo Pascal programs to release allocated memory back to the heap. The 8080 version is not only faster; it is substantially faster.

I did not originally set out to write a faster implementation but had to reach repeatedly into my bag of tricks to avoid running out of registers while trying to do a literal translation. The result ended up surprisingly more efficient.

The original Z80 code:

                          07241 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                          07242 ;
                          07243 ; Dispose
                          07244 ;
                          07245 ; Input:
                          07246 ;       HL = size of block to free
                          07247 ;       Top of stack = the address of variable containing block address
                          07248 ;
 1D7B                     07249 L_1D7B:
 1D7B EB              [4] 07250         ex      DE,HL           ; Move size to DE
                          07251
 1D7C E1             [10] 07252         pop     HL              ; Get return address
 1D7D E3             [19] 07253         ex      (SP),HL         ; Swap for address of variable
                          07254
 1D7E 7E              [7] 07255         ld      A,(HL)          ; Get address of the block to free
 1D7F 23              [5] 07256         inc     HL
 1D80 66              [7] 07257         ld      H,(HL)
 1D81 6F              [4] 07258         ld      L,A
                          07259
 1D82 13              [5] 07260         inc     DE              ; Round size up to next multiple of 4
 1D83 13              [5] 07261         inc     DE
 1D84 13              [5] 07262         inc     DE
 1D85 7B              [4] 07263         ld      A,E
 1D86 E6FC            [7] 07264         and     0FCh
 1D88 5F              [4] 07265         ld      E,A
                          07266
 1D89 EB              [4] 07267         ex      DE,HL           ; Save "quantized" size
 1D8A 22 00F0        [16] 07268         ld      (L_00F0),HL     ; DE -> block to free
                          07269
 1D8D 2A 00DE        [16] 07270         ld      HL,(L_00DE)     ; Point to the first free block
 1D90 E5             [11] 07271         push    HL
 1D91 DDE1           [14] 07272         pop     IX
                          07273
 1D93 B7              [4] 07274         or      A               ; Is block to free lower?
 1D94 ED52           [15] 07275         sbc     HL,DE
 1D96 3052 (1DEA)  [7/12] 07276         jr      NC,L_1DEA       ; Yes
                          07277
 1D98                     07278 L_1D98:
 1D98 DD6E00         [19] 07279         ld      L,(IX+0)        ; Get next address
 1D9B DD6601         [19] 07280         ld      H,(IX+1)
 1D9E E5             [11] 07281         push    HL
                          07282
 1D9F B7              [4] 07283         or      A               ; Is this block still lower?
 1DA0 ED52           [15] 07284         sbc     HL,DE
 1DA2 3004 (1DA8)  [7/12] 07285         jr      NC,L_1DA8       ; No
                          07286
 1DA4 DDE1           [14] 07287         pop     IX              ; Point to next free block
                          07288
 1DA6 18F0 (1D98)    [12] 07289         jr      L_1D98          ; Keep looking
                          07290
 1DA8                     07291 L_1DA8:
 1DA8 E1             [10] 07292         pop     HL              ; Get block just above block to free
                          07293
 1DA9 D5             [11] 07294         push    DE              ; Point to block to be freed
 1DAA FDE1           [14] 07295         pop     IY
                          07296
 1DAC ED4B 00F0      [20] 07297         ld      BC,(L_00F0)     ; Retrieve block size
                          07298
 1DB0 FD7102         [19] 07299         ld      (IY+2),C        ; Store size
 1DB3 FD7003         [19] 07300         ld      (IY+3),B
 1DB6 FD7500         [19] 07301         ld      (IY+0),L        ; Store next
 1DB9 FD7401         [19] 07302         ld      (IY+1),H
                          07303
 1DBC DD7300         [19] 07304         ld      (IX+0),E        ; Point here from previous block
 1DBF DD7201         [19] 07305         ld      (IX+1),D
                          07306
 1DC2 DDE5           [15] 07307         push    IX              ; Point to previous block
 1DC4 E1             [10] 07308         pop     HL
 1DC5 DD4E02         [19] 07309         ld      C,(IX+2)        ; Get its size
 1DC8 DD4603         [19] 07310         ld      B,(IX+3)
 1DCB CD 1E05        [17] 07311         call    L_1E05          ; Try to coalesce
 1DCE 2809 (1DD9)  [7/12] 07312         jr      Z,L_1DD9        ; Combined?
                          07313
 1DD0 DD5E00         [19] 07314         ld      E,(IX+0)        ; Point to next block
 1DD3 DD5601         [19] 07315         ld      D,(IX+1)
 1DD6 D5             [11] 07316         push    DE
 1DD7 DDE1           [14] 07317         pop     IX
                          07318
 1DD9                     07319 L_1DD9:
 1DD9 DDE5           [15] 07320         push    IX              ; Set up to try combining with next block
 1DDB E1             [10] 07321         pop     HL
 1DDC DD4E02         [19] 07322         ld      C,(IX+2)        ; Load size
 1DDF DD4603         [19] 07323         ld      B,(IX+3)
 1DE2 DD5E00         [19] 07324         ld      E,(IX+0)        ; Load next
 1DE5 DD5601         [19] 07325         ld      D,(IX+1)
 1DE8 181B (1E05)    [12] 07326         jr      L_1E05
                          07327
 1DEA                     07328 L_1DEA:
 1DEA 2A 00DE        [16] 07329         ld      HL,(L_00DE)     ; Freed block is now first free block
 1DED ED53 00DE      [20] 07330         ld      (L_00DE),DE
 1DF1 D5             [11] 07331         push    DE
 1DF2 DDE1           [14] 07332         pop     IX
 1DF4 DD7500         [19] 07333         ld      (IX+0),L        ; Store next
 1DF7 DD7401         [19] 07334         ld      (IX+1),H
 1DFA ED4B 00F0      [20] 07335         ld      BC,(L_00F0)     ; Store size
 1DFE DD7102         [19] 07336         ld      (IX+2),C
 1E01 DD7003         [19] 07337         ld      (IX+3),B
 1E04 EB              [4] 07338         ex      DE,HL
                          07339
 1E05                     07340 L_1E05:
 1E05 09             [11] 07341         add     HL,BC           ; Are the blocks adjacent?
 1E06 B7              [4] 07342         or      A
 1E07 ED52           [15] 07343         sbc     HL,DE
 1E09 C0           [5/11] 07344         ret     NZ              ; No
                          07345
 1E0A D5             [11] 07346         push    DE
 1E0B FDE1           [14] 07347         pop     IY
                          07348
 1E0D 2A 00C4        [16] 07349         ld      HL,(L_00C4)     ; Is upper block at the top of the heap?
 1E10 B7              [4] 07350         or      A
 1E11 ED52           [15] 07351         sbc     HL,DE
 1E13 281B (1E30)  [7/12] 07352         jr      Z,L_1E30        ; Yes
                          07353
 1E15 FD7E00         [19] 07354         ld      A,(IY+0)        ; Transfer next to lower block
 1E18 DD7700         [19] 07355         ld      (IX+0),A
 1E1B FD7E01         [19] 07356         ld      A,(IY+1)
 1E1E DD7701         [19] 07357         ld      (IX+1),A
 1E21 FD6E02         [19] 07358         ld      L,(IY+2)        ; Add the sizes
 1E24 FD6603         [19] 07359         ld      H,(IY+3)
 1E27 09             [11] 07360         add     HL,BC
 1E28 DD7502         [19] 07361         ld      (IX+2),L
 1E2B DD7403         [19] 07362         ld      (IX+3),H
 1E2E AF              [4] 07363         xor     A               ; Clear Z flag
                          07364
 1E2F C9             [10] 07365         ret
                          07366
 1E30                     07367 L_1E30:
 1E30 DDE5           [15] 07368         push    IX              ; Set new top of heap
 1E32 E1             [10] 07369         pop     HL
 1E33 22 00C4        [16] 07370         ld      (L_00C4),HL
                          07371
 1E36 0604            [7] 07372         ld      B,4             ; Zero the header
                          07373
 1E38                     07374 L_1E38:
 1E38 3600           [10] 07375         ld      (HL),0
 1E3A 23              [5] 07376         inc     HL
 1E3B 10FB (1E38)  [8/13] 07377         djnz    L_1E38
                          07378
 1E3D C9             [10] 07379         ret

As rewritten for the 8080:

                          05810 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                          05811 ;
                          05812 ; Dispose
                          05813 ;
                          05814 ; Input:
                          05815 ;       HL = size of block to free
                          05816 ;       Top of stack = the address of variable containing block address
                          05817 ;
 14E9                     05818 Rq1D7B:
 14E9 EB              [4] 05819         xchg                    ; Move size to DE
                          05820
 14EA E1             [10] 05821         pop     H               ; Get return address
 14EB E3             [18] 05822         xthl                    ; Swap for address of variable
                          05823
 14EC 7E              [7] 05824         mov     A,M             ; Get address of the block to free
 14ED 23              [5] 05825         inx     H
 14EE 66              [7] 05826         mov     H,M
 14EF 6F              [5] 05827         mov     L,A
                          05828
 14F0 13              [5] 05829         inx     D               ; Round size up to next multiple of 4
 14F1 13              [5] 05830         inx     D
 14F2 13              [5] 05831         inx     D
 14F3 7B              [5] 05832         mov     A,E
 14F4 E6 FC           [7] 05833         ani     0FCh
 14F6 5F              [5] 05834         mov     E,A
                          05835
 14F7 EB              [4] 05836         xchg                    ; Save "quantized" size
 14F8 22 00F0        [16] 05837         shld    Vq00F0          ; DE -> block to free
                          05838
 14FB 2A 00DE        [16] 05839         lhld    Vq00DE          ; Point to the first free block
                          05840
 14FE 7D              [5] 05841         mov     A,L             ; Is block to free lower?
 14FF 93              [4] 05842         sub     E
 1500 7C              [5] 05843         mov     A,H
 1501 9A              [4] 05844         sbb     D
 1502 DA 151E        [10] 05845         jc      Lq1D98          ; No
                          05846
 1505 2A 00F0        [16] 05847         lhld    Vq00F0          ; Load size into BC
 1508 44              [5] 05848         mov     B,H
 1509 4D              [5] 05849         mov     C,L
                          05850
 150A 2A 00DE        [16] 05851         lhld    Vq00DE          ; Freed block is now first free block
 150D EB              [4] 05852         xchg
 150E 22 00DE        [16] 05853         shld    Vq00DE
                          05854
 1511 73              [7] 05855         mov     M,E             ; Store next
 1512 23              [5] 05856         inx     H
 1513 72              [7] 05857         mov     M,D
                          05858
 1514 23              [5] 05859         inx     H               ; Store size
 1515 71              [7] 05860         mov     M,C
 1516 23              [5] 05861         inx     H
 1517 70              [7] 05862         mov     M,B
 1518 2B              [5] 05863         dcx     H
 1519 2B              [5] 05864         dcx     H
 151A 2B              [5] 05865         dcx     H
                          05866
 151B C3 1554        [10] 05867         jmp     Lq1E05          ; Try to coalesce with next block
                          05868
 151E                     05869 Lq1D98:
 151E 44              [5] 05870         mov     B,H             ; Remember current block address
 151F 4D              [5] 05871         mov     C,L
                          05872
 1520 7E              [7] 05873         mov     A,M             ; Get next address
 1521 23              [5] 05874         inx     H
 1522 66              [7] 05875         mov     H,M
 1523 6F              [5] 05876         mov     L,A
                          05877
 1524 93              [4] 05878         sub     E               ; Is this block still lower?
 1525 7C              [5] 05879         mov     A,H
 1526 9A              [4] 05880         sbb     D
 1527 DA 151E        [10] 05881         jc      Lq1D98          ; Yes
                          05882
 152A E5             [11] 05883         push    H               ; Save address of block above block to free
                          05884
 152B 60              [5] 05885         mov     H,B             ; Point to block just below block to free
 152C 69              [5] 05886         mov     L,C
                          05887
 152D 73              [7] 05888         mov     M,E             ; Point here from previous block
 152E 23              [5] 05889         inx     H
 152F 72              [7] 05890         mov     M,D
                          05891
 1530 E1             [10] 05892         pop     H               ; Refresh address of block above block to free
                          05893
 1531 EB              [4] 05894         xchg                    ; Point to block to free
                          05895
 1532 73              [7] 05896         mov     M,E             ; Store next
 1533 23              [5] 05897         inx     H
 1534 72              [7] 05898         mov     M,D
                          05899
 1535 EB              [4] 05900         xchg                    ; Retrieve block size
 1536 2A 00F0        [16] 05901         lhld    Vq00F0
 1539 EB              [4] 05902         xchg
                          05903
 153A 23              [5] 05904         inx     H               ; Store size
 153B 73              [7] 05905         mov     M,E
 153C 23              [5] 05906         inx     H
 153D 72              [7] 05907         mov     M,D
                          05908
 153E 60              [5] 05909         mov     H,B             ; Point to previous block
 153F 69              [5] 05910         mov     L,C
                          05911
 1540 CD 154A        [17] 05912         call    Lq1DD9          ; Try to coalesce with block being freed
 1543 CA 154A        [10] 05913         jz      Lq1DD9          ; Combined?
                          05914
 1546 7E              [7] 05915         mov     A,M             ; Point to next block
 1547 23              [5] 05916         inx     H
 1548 66              [7] 05917         mov     H,M
 1549 6F              [5] 05918         mov     L,A
                          05919
 154A                     05920 Lq1DD9:
 154A 23              [5] 05921         inx     H               ; Set up to try combining with the next block
 154B 23              [5] 05922         inx     H
 154C 23              [5] 05923         inx     H
 154D 46              [7] 05924         mov     B,M             ; Load size
 154E 2B              [5] 05925         dcx     H
 154F 4E              [7] 05926         mov     C,M
 1550 2B              [5] 05927         dcx     H               ; Load next
 1551 56              [7] 05928         mov     D,M
 1552 2B              [5] 05929         dcx     H
 1553 5E              [7] 05930         mov     E,M
                          05931
 1554                     05932 Lq1E05:
 1554 7D              [5] 05933         mov     A,L             ; Are the blocks adjacent?
 1555 81              [4] 05934         add     C
 1556 BB              [4] 05935         cmp     E
 1557 C0           [5/11] 05936         rnz                     ; No - low bytes do not match
 1558 7D              [5] 05937         mov     A,L
 1559 81              [4] 05938         add     C
 155A 7C              [5] 05939         mov     A,H
 155B 88              [4] 05940         adc     B
 155C BA              [4] 05941         cmp     D
 155D C0           [5/11] 05942         rnz                     ; No - high bytes do not match
                          05943
 155E E5             [11] 05944         push    H               ; Stash address of lower block
                          05945
 155F 2A 00C4        [16] 05946         lhld    Vq00C4          ; Is upper block at the top of the heap?
 1562 7D              [5] 05947         mov     A,L
 1563 BB              [4] 05948         cmp     E
 1564 C2 156C        [10] 05949         jnz     Lq1E15          ; No
 1567 7C              [5] 05950         mov     A,H
 1568 BA              [4] 05951         cmp     D
 1569 CA 1582        [10] 05952         jz      Lq1E30          ; Yes
                          05953
 156C                     05954 Lq1E15:
 156C E1             [10] 05955         pop     H               ; Recover address of lower block
                          05956
 156D 1A              [7] 05957         ldax    D               ; Transfer next to lower block
 156E 77              [7] 05958         mov     M,A
 156F 23              [5] 05959         inx     H
 1570 13              [5] 05960         inx     D
 1571 1A              [7] 05961         ldax    D
 1572 77              [7] 05962         mov     M,A
 1573 23              [5] 05963         inx     H
 1574 13              [5] 05964         inx     D
                          05965
 1575 1A              [7] 05966         ldax    D               ; Add the sizes
 1576 81              [4] 05967         add     C
 1577 77              [7] 05968         mov     M,A
 1578 13              [5] 05969         inx     D
 1579 23              [5] 05970         inx     H
 157A 1A              [7] 05971         ldax    D
 157B 88              [4] 05972         adc     B
 157C 77              [7] 05973         mov     M,A
                          05974
 157D 2B              [5] 05975         dcx     H               ; Point back to the base
 157E 2B              [5] 05976         dcx     H
 157F 2B              [5] 05977         dcx     H
                          05978
 1580 AF              [4] 05979         xra     A               ; Clear Z flag
                          05980
 1581 C9             [10] 05981         ret
                          05982
 1582                     05983 Lq1E30:
 1582 E1             [10] 05984         pop     H               ; Recover address of lower block
                          05985
 1583 22 00C4        [16] 05986         shld    Vq00C4          ; Set new top of heap
                          05987
 1586 06 04           [7] 05988         mvi     B,4             ; Zero the header
                          05989
 1588                     05990 Lq1E38:
 1588 36 00          [10] 05991         mvi     M,0
 158A 23              [5] 05992         inx     H
 158B 05              [5] 05993         dcr     B
 158C C2 1588        [10] 05994         jnz     Lq1E38
                          05995
 158F C9             [10] 05996         ret
1 Like

I have been putting some development time into FLEX again. The first priority is rearranging the several components of the system to arrive at a set of configurations which work on the largest number of dissimilar 6502 systems.

The second is changing the API slightly to try to reduce the reloading of registers. The initial prototype expected addresses to be passed in registers A:X because that was how I kept a sixteen-bit number while doing calculations. By changing the calling convention to Y:X, I am saving up to about half of the register reload instructions.

Now I have a couple of questions:

How useful is it to not corrupt memory used by a program if the operating system is rebooted? Some 680x programs provided a warm restart entry point at $103; this may be useful to try to save data after a crash before a total clean restart. Is it important at all to preserve the contents of the Utility Command Space under the same scenario?

I have always thought that the “current drive” concept in CP/M and DOS was a much more convenient way to specify the “working” drive than using the ASN (assign) utility in FLEX.

Now that I have a platform for experimentation, I have prototyped a current drive mode for FLEX. Instead of the usual “+++” command prompt, it uses “0++” when drive 0 is the current drive and operating in current drive mode. The “1.” command makes drive 1 the current drive.

The ASN utility will be modified to switch between the classic and current drive modes in addition to assigning drives in classic mode.

So is this a good idea or am I off in the weeds?

2 Likes

I just discovered a bug lurking in my processor simulators for many years.

The rotate and shift instructions of some processors like the 6800 and 6502 set the zero and sign flags based on the value of the result. My code for most of them had been propagating the values of the native x86 processor flags to those of the simulated CPU. The problem is that the shift and rotate instructions of the x86 does not affect the zero and sign flags - a small detail I had overlooked.

That this is not an often used feature is why it went this long.

Correction: The rotate instructions of the x86 do not affect the sign and zero flags. The shift instructions do. That explains how I managed to miss this.

2 Likes

I got big binary number to ASCII decimal conversion working with the Double Dabble algorithm.

Instead of over 22 billion machine cycles with the divide by ten for each digit method, the Sevens problem completes in a little over 103.2 million cycles. That is not unlike going from a bicycle to an airliner.

There is still some room for optimization. Namely the fact that numbers up to 100 bytes in size do not result in a string over 255 characters long, so calculations can be done using single-byte values. I’m thinking maybe up to a doubling of the speed.

2 Likes
number = 1
power=0
string=""

while string.find("777777") < 0:

        number = number * 7
        power = power + 1
        string = str(number)

print('Seven to the power of', power, 'is', string)

The original divide by 10 for each digit method took over 22 billion cycles.

The first double-byte double dabble implementation took $626D28A 103207562 cycles.

The implementation of a single-byte double dabble mode for numbers 100 bytes or less took $59D709A 94204058 cycles. This was more than I was expecting.

After some more optimization, it dropped to $59D47D6 94193622 cycles.

What this is trying to say is that the decimal conversion is no longer a large majority of the time. To see how this broke down:

Omitting the string search took $5974C0C 93801484 cycles.

number = 1
power=0
string=""

while power < 176:

	number = number * 7
	power = power + 1
	string = str(number)

Omitting the string search and the decimal conversion took $3F55E03 66412035 cycles.

number = 1
power=0
string=""

while power < 176:

	number = number * 7
	power = power + 1

What this means is that the multiplication code needs another look. I know it can be improved.

3 Likes

Some time ago, I posted implementations of binary to decimal conversion using the Double Dabble algorithm for several microprocessors:

Rudla posted code to convert a binary number to packed BCD:

http://forum.6502.org/viewtopic.php?f=2&t=6579

I recognized that his code bore a definite resemblance to Double Dabble, but it was clearly not the same. Studying his implementation allowed me to step back and see the forest from the trees of the details of the Double Dabble algorithm. Improvements from a straight interpretation of Double Dabble include:

  • all of that testing of and adding to individual BCD digits is merely a way to perform a “decimal adjustment” when the hardware does not directly support it. Hardware supporting decimal adjustment can combine the shifting and fiddling by adding to itself (doubling) with the decimal arithmetic mode built into the 6502 or a DAA decimal adjust instruction following the addition on other processors.
  • shifting only one byte of the number being converted at a time instead of shifting the entire number. The savings become much more significant for larger numbers.
  • growing the result as digits are added instead of needlessly shifting an ocean of zeroes in a large pre-cleared buffer.

I improveed on Rudla’s implementation by starting with no bytes of result instead of one byte containing zero, saving up to seven shifts of the byte. I also did a minor tweak improving the innermost loop.

Using what I learned, I rewrote my conversion code from scratch, making it faster and simpler in the process.

My design is based on maximum reuse of memory and is intended to produce a printable character string instead of just a packed BCD number. Because the maximum number to be converted is eight bytes, the buffer is placed in the zero-page for maximum efficiency. The number to be converted is copied to the low end of the output buffer with the least-significant byte at the lowest address. The converted BCD digits grow downward from the end of the buffer and do not overlap the number until after its bytes are already consumed by the shifting. Because the largest number to be converted is 64 bits, it was reasonable to place the buffer in the zero page.

While I will not claim that these are the fastest routines to convert 32 and 64 bit integers to ASCII decimal strings, I am confident they are among the best.

Did you know that there is a 32000 character limit for a post? I know now…

2 Likes

The 6502 version:

.                         00191 ; Input:
.                         00192 ;       The absolute value of the number to convert is stored
.                         00193 ;         starting at OutBuf+2 in little-endian form
.                         00194 ;       A = number of bytes in the number
.                         00195 ;       Dabble is the label of last byte of the buffer
.                         00196 ;       OutBuf+0 = 0 if number is negative, other if negative
.                         00197 ;
.                         00198 ; Output:
.                         00199 ;       A:X contains the address of the resultant string
.                         00200 ;
.                         00201 ; Uses:
.                         00202 ;       Byt0 = number of bytes to convert
.                         00203 ;       Byt1 = number of packed BCD bytes in result
.                         00204 ;       Byt2 = bits of number being shifted
.206E                     00225 Format:
.206E 85 12           [3] 00226          sta    Byt0      ; Remember number of bytes to convert
.2070 AA              [2] 00227          tax              ; Use it as index into the number
.                         00228
.2071 A9 00           [2] 00229          lda    #0        ; Start with no bytes of BCD digits
.2073 85 13           [3] 00230          sta    Byt1
.                         00231
.2075 F8              [2] 00232          sed              ; Switch to decimal mode
.                         00233
.2076                     00234 SkipLeadingZero:
.2076 B5 19           [4] 00235          lda    OutBuf+1,X  ; Get next highest byte of the number
.2078 D0 07 (2081)  [2/3] 00236          bne    NotLeadingZero
.                         00237
.207A CA              [2] 00238          dex              ; Skip a leading zero
.                         00239
.207B D0 F9 (2076)  [2/3] 00240          bne    SkipLeadingZero ; Check the next one if more
.                         00241
.207D F0 2F (20AE)  [2/3] 00242          beq    BCD2ASCII ; All bytes are zero
.                         00243
.207F                     00244 ByteLoop:
.207F B5 19           [4] 00245          lda    OutBuf+1,X  ; Get next highest byte of the number
.                         00246
.2081                     00247 NotLeadingZero:
.2081 85 14           [3] 00248          sta    Byt2      ; Set it for shifting
.                         00249
.2083 86 12           [3] 00250          stx    Byt0      ; Remember the address of the last one shifted
.                         00251
.2085 38              [2] 00252          sec              ; Shift in extra bit for eight trips through loop
.                         00253
.2086 B0 12 (209A)  [2/3] 00254          bcs    EnterBitLoop ; Always branches
.                         00255
.2088                     00256 BitLoop:
.2088 A2 2D           [2] 00257          ldx    #Dabble   ; Point to the BCD digits
.                         00258
.208A A4 13           [3] 00259          ldy    Byt1      ; Load count of number of digit bytes
.208C F0 0A (2098)  [2/3] 00260          beq    SkipDigitLoop ; Skip loop if none
.                         00261
.208E                     00262 DigitLoop:
.208E B5 00           [4] 00263          lda    0,X       ; Double pair of BCD digits with carry
.2090 75 00           [4] 00264          adc    0,X
.2092 95 00           [4] 00265          sta    0,X
.                         00266
.2094 CA              [2] 00267          dex              ; Point to next pair of BCD digits
.                         00268
.2095                     00269 IntoDigitLoop:
.2095 88              [2] 00270          dey              ; More BCD digits?
.2096 D0 F6 (208E)  [2/3] 00271          bne    DigitLoop ; Yes
.                         00272
.2098                     00273 SkipDigitLoop:
.2098 B0 0B (20A5)  [2/3] 00274          bcs    Extend    ; Carry out of accumulated digits?
.                         00275
.209A                     00276 EnterBitLoop:
.209A 26 14           [5] 00277          rol    Byt2      ; Transfer upper bit to carry flag
.209C D0 EA (2088)  [2/3] 00278          bne    BitLoop   ; Repeat if more
.                         00279
.209E A6 12           [3] 00280          ldx    Byt0      ; Point back to the number to convert
.                         00281
.20A0 CA              [2] 00282          dex              ; Any more bytes of the number to convert?
.20A1 D0 DC (207F)  [2/3] 00283          bne    ByteLoop  ; Repeat if more
.                         00284
.20A3 F0 09 (20AE)  [2/3] 00285          beq    BCD2ASCII ; Always branches to unpacking stage
.                         00286
.20A5                     00287 Extend:
.20A5 E6 13           [5] 00288          inc    Byt1      ; Add a new byte to the result
.20A7 A9 01           [2] 00289          lda    #1
.20A9 95 00           [4] 00290          sta    0,X
.                         00291
.20AB 18              [2] 00292          clc
.20AC 90 EC (209A)  [2/3] 00293          bcc    EnterBitLoop ; Always branches
.                         00294
.20AE                     00295 BCD2ASCII:
.20AE D8              [2] 00296          cld              ; Return to binary mode
.                         00297
.20AF A5 13           [3] 00298          lda    Byt1      ; Any converted digits?
.20B1 F0 42 (20F5)  [2/3] 00299          beq    NoDigits  ; No
.                         00300
.20B3 38              [2] 00301          sec
.20B4 A9 2E           [2] 00302          lda    #Dabble+1  ; Point to most significant BCD pair
.20B6 E5 13           [3] 00303          sbc    Byt1
.20B8 A8              [2] 00304          tay
.                         00305
.20B9 A2 1A           [2] 00306          ldx    #<OutBuf+2  ; Point to result buffer
.                         00307
.20BB B9 0000       [4/5] 00308          lda    0,Y       ; Get most significant BCD pair
.                         00309
.20BE 29 F0           [2] 00310          and    #$F0      ; Is the upper one zero?
.20C0 F0 0E (20D0)  [2/3] 00311          beq    UpperZero
.20C2 D0 03 (20C7)  [2/3] 00312          bne    ConvertBoth
.                         00313
.20C4                     00314 ASCIILoop:
.20C4 B9 0000       [4/5] 00315          lda    0,Y       ; Get next most significant BCD pair
.                         00316
.20C7                     00317 ConvertBoth:
.20C7 4A              [2] 00318          lsr    A         ; Convert upper one to ASCII
.20C8 4A              [2] 00319          lsr    A
.20C9 4A              [2] 00320          lsr    A
.20CA 4A              [2] 00321          lsr    A
.20CB 09 30           [2] 00322          ora    #'0'
.20CD 95 00           [4] 00323          sta    0,X
.20CF E8              [2] 00324          inx
.                         00325
.20D0                     00326 UpperZero:
.20D0 B9 0000       [4/5] 00327          lda    0,Y
.20D3 C8              [2] 00328          iny
.20D4 29 0F           [2] 00329          and    #$F       ; And then the lower one
.20D6 09 30           [2] 00330          ora    #'0'
.20D8 95 00           [4] 00331          sta    0,X
.20DA E8              [2] 00332          inx
.                         00333
.20DB C6 13           [5] 00334          dec    Byt1      ; More digits?
.20DD D0 E5 (20C4)  [2/3] 00335          bne    ASCIILoop ; Yes
.                         00336
.20DF 8A              [2] 00337          txa              ; Store string length
.20E0 38              [2] 00338          sec
.20E1 E9 1A           [2] 00339          sbc    #OutBuf+2
.                         00340
.20E3 A6 18           [3] 00341          ldx    OutBuf    ; Is the number negative?
.20E5 F0 14 (20FB)  [2/3] 00342          beq    NumberNotNegative
.                         00343
.20E7 18              [2] 00344          clc              ; Increment and store length
.20E8 69 01           [2] 00345          adc    #1
.20EA 85 18           [3] 00346          sta    OutBuf
.                         00347
.20EC A9 2D           [2] 00348          lda    #'-'      ; Prepend minus sign
.20EE 85 19           [3] 00349          sta    OutBuf+1
.                         00350
.20F0 A9 00           [2] 00351          lda    #>OutBuf
.20F2 A2 18           [2] 00352          ldx    #<OutBuf
.                         00353
.20F4 60              [6] 00354          rts
.                         00355
.20F5                     00356 NoDigits:
.20F5 A9 30           [2] 00357          lda    #'0'      ; Emit single '0'
.20F7 85 1A           [3] 00358          sta    OutBuf+2
.                         00359
.20F9 A9 01           [2] 00360          lda    #1        ; String length one
.                         00361
.20FB                     00362 NumberNotNegative:
.20FB 85 19           [3] 00363          sta    OutBuf+1
.                         00364
.20FD A9 00           [2] 00365          lda    #>OutBuf+1
.20FF A2 19           [2] 00366          ldx    #<OutBuf+1
.                         00367
.2101 60              [6] 00368          rts

The scorecard:

                          00001 ; Converting $FFFFFFFF
                          00002 ; old $2BC9 11209 cycles for call FormatLongword
                          00003 ; new $0A72 2674 cycles
                          00004 ;
                          00005 ; Converting $00000001
                          00006 ; old $2A96 10902 cycles for call FormatLongword
                          00007 ; new $015B 347 cycles
                          00008 ;
                          00009 ; Converting $00000000
                          00010 ; old $2A98 10904 cycles for call FormatLongword
                          00011 ; new $0075 117 cycles

The addressing modes and the three byte-sized registers worked well for this implementation.

The 6800 version:

.                         00486 * Input:
.                         00487 *       The absolute value of the number to convert is stored
.                         00488 *         starting at OutBuf+2 in little-endian form
.                         00489 *       A = number of bytes in the number
.                         00490 *       Dabble is the label of last byte of the buffer
.                         00491 *       OutBuf+0 = 0 if number is negative, other if negative
.                         00492 *
.                         00493 * Output:
.                         00494 *       X contains the address of the resultant string
.                         00495 *
.                         00496 * Uses:
.                         00497 *       Byt0 = number of bytes to convert
.                         00498 *       Byt1 = number of packed BCD bytes in result
.                         00499 *       Byt2 = bits of number being shifted
.                         00500 *       Ptr0
.                         00501 *       Ptr1
.01F5                     00522 Format
.01F5 97 10           [4] 00523          staa   Byt0      ; Remember number of bytes to convert
.                         00524
.01F7 8B 17           [2] 00525          adda   #<OutBuf+1 ; Point to most significant byte of number
.01F9 97 0B           [4] 00526          staa   Ptr0+1
.01FB 86 00           [2] 00527          ldaa   #>OutBuf+1
.01FD 89 00           [2] 00528          adca   #0
.01FF 97 0A           [4] 00529          staa   Ptr0
.0201 DE 0A           [4] 00530          ldx    Ptr0
.                         00531
.0203 7F 0011         [6] 00532          clr    Byt1      ; Start with no bytes of BCD digits
.                         00533
.0206                     00534 SkipLeadingZero
.0206 E6 00           [5] 00535          ldab   ,X        ; Get next highest byte of the number
.0208 26 0D (0217)    [4] 00536          bne    NotLeadingZero
.                         00537
.020A 09              [4] 00538          dex              ; Skip a leading zero
.                         00539
.020B 7A 0010         [6] 00540          dec    Byt0
.020E 26 F6 (0206)    [4] 00541          bne    SkipLeadingZero ; Check the next one if more
.                         00542
.0210 20 32 (0244)    [4] 00543          bra    BCD2ASCII ; All bytes are zero
.                         00544
.0212                     00545 ByteLoop
.0212 DE 0A           [4] 00546          ldx    Ptr0      ; Point back to the number to convert
.0214 09              [4] 00547          dex              ; Point to next highest byte
.                         00548
.0215 E6 00           [5] 00549          ldab   ,X        ; Get next highest byte of the number
.                         00550
.0217                     00551 NotLeadingZero
.0217 D7 12           [4] 00552          stab   Byt2      ; Set it for shifting
.                         00553
.0219 DF 0A           [5] 00554          stx    Ptr0      ; Remember the address of the last one shifted
.                         00555
.021B 0D              [2] 00556          sec              ; Shift in extra bit for eight trips through loop
.                         00557
.021C 20 1C (023A)    [4] 00558          bra    EnterBitLoop
.                         00559
.021E                     00560 BitLoop
.021E CE 002B         [3] 00561          ldx    #Dabble   ; Point to the BCD digits
.                         00562
.0221 D6 11           [3] 00563          ldab   Byt1      ; Load count of number of digit bytes
.0223 27 0B (0230)    [4] 00564          beq    SkipDigitLoop ; Skip loop if none
.                         00565
.0225                     00566 DigitLoop
.0225 A6 00           [5] 00567          ldaa   ,X        ; Double pair of BCD digits with carry
.0227 A9 00           [5] 00568          adca   ,X
.0229 19              [2] 00569          daa
.022A A7 00           [6] 00570          staa   ,X
.                         00571
.022C 09              [4] 00572          dex              ; Point to next pair of BCD digits
.                         00573
.022D                     00574 IntoDigitLoop
.022D 5A              [2] 00575          decb             ; More BCD digits?
.022E 26 F5 (0225)    [4] 00576          bne    DigitLoop ; Yes
.                         00577
.0230                     00578 SkipDigitLoop
.0230 24 08 (023A)    [4] 00579          bcc    EnterBitLoop ; Carry out of accumulated digits?
.                         00580
.0232 7C 0011         [6] 00581          inc    Byt1      ; Add a new byte to the result
.0235 86 01           [2] 00582          ldaa   #1
.0237 A7 00           [6] 00583          staa   ,X
.                         00584
.0239 0C              [2] 00585          clc
.                         00586
.023A                     00587 EnterBitLoop
.023A 79 0012         [6] 00588          rol    Byt2      ; Transfer upper bit to carry flag
.023D 26 DF (021E)    [4] 00589          bne    BitLoop   ; Repeat if more
.                         00590
.023F 7A 0010         [6] 00591          dec    Byt0      ; Any more bytes of the number to convert?
.0242 26 CE (0212)    [4] 00592          bne    ByteLoop  ; Repeat if more
.                         00593
.0244                     00594 BCD2ASCII
.0244 96 11           [3] 00595          ldaa   Byt1      ; Any converted digits?
.0246 27 57 (029F)    [4] 00596          beq    NoDigits  ; No
.                         00597
.0248 86 2C           [2] 00598          ldaa   #<Dabble+1 ; Point to most significant BCD pair
.024A 90 11           [3] 00599          suba   Byt1
.024C 97 0B           [4] 00600          staa   Ptr0+1
.024E 86 00           [2] 00601          ldaa   #>Dabble+1
.0250 82 00           [2] 00602          sbca   #0
.0252 97 0A           [4] 00603          staa   Ptr0
.                         00604
.0254 CE 0018         [3] 00605          ldx    #OutBuf+2 ; Point to result buffer
.0257 DF 0C           [5] 00606          stx    Ptr1
.                         00607
.0259 DE 0A           [4] 00608          ldx    Ptr0      ; Get most significant BCD pair
.025B A6 00           [5] 00609          ldaa   ,X
.025D 08              [4] 00610          inx
.025E DF 0A           [5] 00611          stx    Ptr0
.0260 DE 0C           [4] 00612          ldx    Ptr1
.                         00613
.0262 16              [2] 00614          tab              ; Clone it
.                         00615
.0263 85 F0           [2] 00616          bita   #$F0      ; Is the upper one zero?
.0265 27 17 (027E)    [4] 00617          beq    ConvertLowerOnly
.0267 20 0C (0275)    [4] 00618          bra    ConvertBoth
.                         00619
.0269                     00620 ASCIILoop
.0269 DF 0C           [5] 00621          stx    Ptr1
.026B DE 0A           [4] 00622          ldx    Ptr0      ; Get next most significant BCD pair
.026D A6 00           [5] 00623          ldaa   ,X
.026F 08              [4] 00624          inx
.0270 DF 0A           [5] 00625          stx    Ptr0
.0272 DE 0C           [4] 00626          ldx    Ptr1
.                         00627
.0274 16              [2] 00628          tab              ; Clone it
.                         00629
.0275                     00630 ConvertBoth
.0275 44              [2] 00631          lsra             ; Convert upper one to ASCII
.0276 44              [2] 00632          lsra
.0277 44              [2] 00633          lsra
.0278 44              [2] 00634          lsra
.0279 8A 30           [2] 00635          oraa   #'0'
.027B A7 00           [6] 00636          staa   ,X
.027D 08              [4] 00637          inx
.                         00638
.027E                     00639 ConvertLowerOnly
.027E C4 0F           [2] 00640          andb   #$F       ; And then the lower one
.0280 CA 30           [2] 00641          orab   #'0'
.0282 E7 00           [6] 00642          stab   ,X
.0284 08              [4] 00643          inx
.                         00644
.0285 7A 0011         [6] 00645          dec    Byt1      ; More BCD pairs?
.0288 26 DF (0269)    [4] 00646          bne    ASCIILoop
.                         00647
.028A DF 0C           [5] 00648          stx    Ptr1      ; Determine string length
.028C 96 0D           [3] 00649          ldaa   Ptr1+1
.028E 80 18           [2] 00650          suba   #<OutBuf+2
.                         00651
.0290 D6 16           [3] 00652          ldab   OutBuf    ; Is the number negative?
.0292 27 11 (02A5)    [4] 00653          beq    NumberNotNegative
.                         00654
.0294 4C              [2] 00655          inca             ; Increment and store length byte
.0295 97 16           [4] 00656          staa   OutBuf
.                         00657
.0297 C6 2D           [2] 00658          ldab   #'-'      ; Prepend minus sign
.0299 D7 17           [4] 00659          stab   OutBuf+1
.                         00660
.029B CE 0016         [3] 00661          ldx    #OutBuf   ; Point to the result
.                         00662
.029E 39              [5] 00663          rts
.                         00664
.029F                     00665 NoDigits
.029F 86 30           [2] 00666          ldaa   #'0'      ; Emit single '0'
.02A1 97 18           [4] 00667          staa   OutBuf+2
.                         00668
.02A3 86 01           [2] 00669          ldaa   #1        ; String length one
.                         00670
.02A5                     00671 NumberNotNegative
.02A5 97 17           [4] 00672          staa   OutBuf+1
.                         00673
.02A7 CE 0017         [3] 00674          ldx    #OutBuf+1
.                         00675
.02AA 39              [5] 00676          rts

The scorecard:

                          00001 * Converting $FFFFFFFF
                          00002 * old $38F0 14576 cycles for call FormatLongword
                          00003 * new $0FED 4077 cycles
                          00004 *
                          00005 * Converting $00000001
                          00006 * old $37B3 14259 cycles for call FormatLongword
                          00007 * new $01F2 498 cycles
                          00008 *
                          00009 * Converting $00000000
                          00010 * old $37B0 14256 cycles for call FormatLongword
                          00011 * new $00B9 185 cycles

The 6800 did not fare so well with only one index register and relatively slow memory access instructions.

The added capabilities of the 6809 do not appear to be much better.

The 8080 version:

.                         00127 ; Input:
.                         00128 ;       The absolute value of the number to convert is stored
.                         00129 ;         starting at OutBuf+2 in little-endian form
.                         00130 ;       A = number of bytes in the number
.                         00131 ;       B = 0 if number is positive, other if negative
.                         00132 ;       Dabble is the label of last byte of the buffer
.                         00133 ;
.                         00134 ; Output:
.                         00135 ;       HL contains the address of the resultant string
.0173                     00156 Format:
.0173 4F              [5] 00157         mov     C,A             ; Remember number of bytes to convert
.0174 5F              [5] 00158         mov     E,A
.                         00159
.0175 78              [5] 00160         mov     A,B             ; Remember negativity
.0176 32 0333        [13] 00161         sta     OutBuf
.                         00162
.0179 AF              [4] 00163         xra     A               ; Start with no bytes of BCD digits
.017A 47              [5] 00164         mov     B,A
.                         00165
.017B 21 0334        [10] 00166         lxi     H,OutBuf+1      ; Point DE to most significant byte of number
.017E 57              [5] 00167         mov     D,A
.017F 19             [10] 00168         dad     D
.0180 EB              [4] 00169         xchg
.                         00170
.0181                     00171 SkipLeadingZero:
.0181 1A              [7] 00172         ldax    D               ; Get next highest byte of the number
.0182 B7              [4] 00173         ora     A
.0183 C2 018F        [10] 00174         jnz     NotLeadingZero
.                         00175
.0186 1B              [5] 00176         dcx     D               ; Skip a leading zero
.                         00177
.0187 0D              [5] 00178         dcr     C
.0188 C2 0181        [10] 00179         jnz     SkipLeadingZero ; Check the next one if more
.                         00180
.018B C3 01BC        [10] 00181         jmp     BIN2ASCII       ; All bytes are zero
.                         00182
.018E                     00183 ByteLoop:
.018E 1A              [7] 00184         ldax    D               ; Get next highest byte of the number
.                         00185
.018F                     00186 NotLeadingZero:
.018F 1B              [5] 00187         dcx     D               ; Point to next lower byte
.                         00188
.0190 D5             [11] 00189         push    D               ; Stash pointer to next byte
.                         00190
.0191 5F              [5] 00191         mov     E,A             ; Set it for shifting
.                         00192
.0192 37              [4] 00193         stc                     ; Shift in extra bit for eight times thru loop
.                         00194
.0193 C3 01AF        [10] 00195         jmp     EnterBitLoop
.                         00196
.0196                     00197 BitLoop:
.0196 21 0348        [10] 00198         lxi     H,Dabble        ; Point to the BCD digits
.                         00199
.0199 04              [5] 00200         inr     B               ; Check count of number of digit bytes
.019A 05              [5] 00201         dcr     B               ; Test for zero but preserve carry flag
.019B CA 01A8        [10] 00202         jz      SkipDigitLoop   ; Skip loop if none
.                         00203
.019E 50              [5] 00204         mov     D,B
.                         00205
.019F                     00206 DigitLoop:
.019F 7E              [7] 00207         mov     A,M             ; Double pair of BCD digits with carry
.01A0 8E              [7] 00208         adc     M
.01A1 27              [4] 00209         daa
.01A2 77              [7] 00210         mov     M,A
.                         00211
.01A3 2B              [5] 00212         dcx     H               ; Point to next pair of BCD digits
.                         00213
.01A4                     00214 IntoDigitLoop:
.01A4 15              [5] 00215         dcr     D               ; More BCD digits?
.01A5 C2 019F        [10] 00216         jnz     DigitLoop       ; Yes
.                         00217
.01A8                     00218 SkipDigitLoop:
.01A8 D2 01AF        [10] 00219         jnc     EnterBitLoop    ; Carry out of accumulated digits?
.                         00220
.01AB 04              [5] 00221         inr     B               ; Add new byte to the result
.01AC 36 01          [10] 00222         mvi     M,1
.                         00223
.01AE B7              [4] 00224         ora     A               ; Clear carry flag
.                         00225
.01AF                     00226 EnterBitLoop:
.01AF 7B              [5] 00227         mov     A,E
.01B0 17              [4] 00228         ral                     ; Transfer upper bit to carry flag
.01B1 5F              [5] 00229         mov     E,A
.                         00230
.01B2 1C              [5] 00231         inr     E
.01B3 1D              [5] 00232         dcr     E
.01B4 C2 0196        [10] 00233         jnz     BitLoop         ; Repeat if more
.                         00234
.01B7 D1             [10] 00235         pop     D               ; Recover pointer to next byte
.                         00236
.01B8 0D              [5] 00237         dcr     C               ; Any more bytes of the number to convert?
.01B9 C2 018E        [10] 00238         jnz     ByteLoop        ; Repeat if more
.                         00239
.01BC                     00240 BIN2ASCII:
.01BC 21 0335        [10] 00241         lxi     H,OutBuf+2      ; Point to result buffer
.                         00242
.01BF 78              [5] 00243         mov     A,B             ; Any converted digits?
.01C0 B7              [4] 00244         ora     A
.01C1 CA 01F4        [10] 00245         jz      NoDigits        ; No
.                         00246
.01C4 11 0349        [10] 00247         lxi     D,Dabble+1      ; Point to most significant BCD pair
.01C7 7B              [5] 00248         mov     A,E
.01C8 90              [4] 00249         sub     B
.01C9 5F              [5] 00250         mov     E,A
.01CA 7A              [5] 00251         mov     A,D
.01CB DE 00           [7] 00252         sbi     0
.01CD 57              [5] 00253         mov     D,A
.                         00254
.01CE 1A              [7] 00255         ldax    D               ; Get most significant BCD pair
.                         00256
.01CF E6 F0           [7] 00257         ani     0F0h            ; Is the upper one zero?
.01D1 CA 01DF        [10] 00258         jz      ConvertLowerOnly
.                         00259
.01D4                     00260 ASCIILoop:
.01D4 1A              [7] 00261         ldax    D               ; Get next most significant BCD pair
.01D5 0F              [4] 00262         rrc                     ; Convert upper one to ASCII
.01D6 0F              [4] 00263         rrc
.01D7 0F              [4] 00264         rrc
.01D8 0F              [4] 00265         rrc
.01D9 E6 0F           [7] 00266         ani     0Fh
.01DB F6 30           [7] 00267         ori     '0'
.01DD 77              [7] 00268         mov     M,A
.01DE 23              [5] 00269         inx     H
.                         00270
.01DF                     00271 ConvertLowerOnly:
.01DF 1A              [7] 00272         ldax    D               ; And then the lower one
.01E0 13              [5] 00273         inx     D
.01E1 E6 0F           [7] 00274         ani     0Fh
.01E3 F6 30           [7] 00275         ori     '0'
.01E5 77              [7] 00276         mov     M,A
.01E6 23              [5] 00277         inx     H
.                         00278
.01E7 05              [5] 00279         dcr     B
.01E8 C2 01D4        [10] 00280         jnz     ASCIILoop
.                         00281
.01EB 7D              [5] 00282         mov     A,L             ; Store string length
.01EC DE 35           [7] 00283         sbi     OutBuf+2 and 0FFh
.01EE 32 0334        [13] 00284         sta     OutBuf+1
.                         00285
.01F1 C3 01F9        [10] 00286         jmp     FinishFormat
.                         00287
.01F4                     00288 NoDigits:
.01F4 36 30          [10] 00289         mvi     M,'0'           ; Emit single '0'
.                         00290
.01F6 2B              [5] 00291         dcx     H               ; String length one
.01F7 36 01          [10] 00292         mvi     M,1
.                         00293
.01F9                     00294 FinishFormat:
.01F9 21 0334        [10] 00295         lxi     H,OutBuf+1      ; Point to the result
.                         00296
.01FC 3A 0333        [13] 00297         lda     OutBuf          ; Was number negative?
.01FF B7              [4] 00298         ora     A
.0200 CA 0209        [10] 00299         jz      NumberNotNegative       ; No
.                         00300
.0203 7E              [7] 00301         mov     A,M             ; Get and increment length byte
.0204 3C              [5] 00302         inr     A
.                         00303
.0205 36 2D          [10] 00304         mvi     M,'-'           ; Prepend minus sign
.                         00305
.0207 2B              [5] 00306         dcx     H               ; Point to the new result
.                         00307
.0208 77              [7] 00308         mov     M,A             ; Store new length
.                         00309
.0209                     00310 NumberNotNegative:
.0209 C9             [10] 00311         ret

The scorecard:

                          00001 ; Converting $FFFFFFFF
                          00002 ; old $7F53 32595 cycles for call FormatLongword
                          00003 ; new $2359 9049 cycles
                          00004 ; new $1F18 7960 cycles
                          00005 ;
                          00006 ; Converting $00000001
                          00007 ; old $7CEE 31982 cycles for call FormatLongword
                          00008 ; new $055A 1370 cycles
                          00009 ; new $04C5 1221 cycles
                          00010 ;
                          00011 ; Converting $00000000
                          00012 ; old $7D05 32005 cycles for call FormatLongword
                          00013 ; new $01E0 480 cycles
                          00014 ; new $01A2 418 cycles

Two sets of new numbers are reported. One for the initial version and another for after some intense code golf managed to eliminate variables in memory.

I actually wrote the 8080 version first, then moved on to the 6800 and then the 6502. Only then did I invest the effort to optimize this one.

3 Likes

I came across some benchmarks to compare Millfork with other compilers:

The linked list one caught my attention. It creates a list of 3000 nodes then traverses it to add up the values.

My implementions are somewhat like the one provided in C except that I do not use his timing mechanism and the list creation loop counts down instead of up. My intent is not to compare with the compilers but between several different processors.

I am slowly relenting to the prospect that a loop ending with an unconditional branch back to the top often performs better by transforming it to enter in the middle. The result looks unstructured, but it is not very different.

The first one is for the 6800. I have found that it is easiest to adopt 6800 code to other processors instead of the other way around.

The dual accumulators and 16-bit index register with displacement is especially well suited for this challenge.

Though inlining the subroutines provided a substantial improvement, I am showing the code before that final transformation for clarity.

                          00001 * 378834 ($5C7D2) cycles
                          00002 * 300821 ($49715) cycles after inlining subroutines
                          00003
 0000 005F                00004 Free     fdb    Heap
 0002 0000                00005 Root     fdb    0
 0004 0000                00006 New      fdb    0
 0006 0000 0000           00007 Total    fdb    0,0
 000A 0BB7                00008 I        fdb    2999
                          00009
                          00010 *
                          00011 * Allocate a node
                          00012 *
 000C                     00013 Alloc
 000C D6 01           [3] 00014          ldab   Free+1    ; Allocate a node
 000E D7 05           [4] 00015          stab   New+1     ; Return it in New
                          00016
 0010 CB 04           [2] 00017          addb   #4        ; Point to free memory
 0012 D7 01           [4] 00018          stab   Free+1
 0014 D6 00           [3] 00019          ldab   Free
 0016 D7 04           [4] 00020          stab   New
 0018 C9 00           [2] 00021          adcb   #0
 001A D7 00           [4] 00022          stab   Free
                          00023
 001C 39              [5] 00024          rts
                          00025
                          00026 *
                          00027 * Prepend a node to the list
                          00028 *
 001D                     00029 Prepend
 001D 8D ED (000C)    [8] 00030          bsr    Alloc     ; Allocate a new node
                          00031
 001F DE 04           [4] 00032          ldx    New       ; Point to the new node
                          00033
 0021 D6 02           [3] 00034          ldab   Root      ; Point next of new node to Root
 0023 E7 00           [6] 00035          stab   ,X
 0025 D6 03           [3] 00036          ldab   Root+1
 0027 E7 01           [6] 00037          stab   1,X
                          00038
 0029 A7 03           [6] 00039          staa   3,X       ; Store value in node
 002B D6 0A           [3] 00040          ldab   I
 002D E7 02           [6] 00041          stab   2,X
                          00042
 002F DF 02           [5] 00043          stx    Root      ; This is now the first node
                          00044
 0031 39              [5] 00045          rts
                          00046
                          00047 *
                          00048 * Traverse the list and sum the values
                          00049 *
 0032                     00050 Sum
 0032 DE 02           [4] 00051          ldx    Root
 0034 4F              [2] 00052          clra
 0035 5F              [2] 00053          clrb
                          00054
 0036                     00055 SumLoop
 0036 EB 03           [5] 00056          addb   3,X       ; Add value of node to sum
 0038 A9 02           [5] 00057          adca   2,X
 003A 24 08 (0044)    [4] 00058          bcc    NoCarry
                          00059
 003C 7C 0007         [6] 00060          inc    Total+1
                          00061
 003F 26 03 (0044)    [4] 00062          bne    NoCarry
                          00063
 0041 7C 0006         [6] 00064          inc    Total
                          00065
 0044                     00066 NoCarry
 0044 EE 00           [6] 00067          ldx    ,X        ; Point to the next node
 0046 26 EE (0036)    [4] 00068          bne    SumLoop   ; Until end of list
                          00069
 0048 97 08           [4] 00070          staa   Total+2
 004A D7 09           [4] 00071          stab   Total+3
                          00072
 004C 39              [5] 00073          rts
                          00074
                          00075 *
                          00076 *
                          00077 *
 004D                     00078 Main
 004D 96 0B           [3] 00079          ldaa   I+1       ; Get low byte of node value
                          00080
 004F 20 01 (0052)    [4] 00081          bra    IntoLoop
                          00082
 0051                     00083 Loop
 0051 4A              [2] 00084          deca
                          00085
 0052                     00086 IntoLoop
 0052 8D C9 (001D)    [8] 00087          bsr    Prepend
                          00088
 0054 4D              [2] 00089          tsta             ; Low byte of count zero?
 0055 26 FA (0051)    [4] 00090          bne    Loop      ; No, no borrow
                          00091
 0057 7A 000A         [6] 00092          dec    I         ; Borrow from upper byte
 005A 2A F5 (0051)    [4] 00093          bpl    Loop      ; Done if it goes negative
                          00094
 005C 8D D4 (0032)    [8] 00095          bsr    Sum       ; Traverse the list and add them up
                          00096
 005E 3F             [12] 00097          swi
                          00098
 005F                     00099 Heap
                          00100
 004D                     00101          end    Main

The 6502 had trouble keeping up with the others. Am I missing something?

                          00001 ; 519567 ($7ED8F) cycles
                          00002 ; 447558 ($6D446) after inlining subroutines
                          00003
 0000 026F                00004 Free     fdb    Heap
 0002 0000                00005 Root     fdb    0
 0004 0000                00006 New      fdb    0
 0006 0000 0000           00007 Total    fdb    0,0
 000A 0BB7                00008 I        fdb    2999
                          00009
 0200                     00010          org    $200
                          00011
                          00012 ;
                          00013 ; Allocate a node
                          00014 ;
 0200                     00015 Alloc:
 0200 A5 00           [3] 00016          lda    Free      ; Allocate a node
 0202 85 04           [3] 00017          sta    New       ; Return it in New
                          00018
 0204 18              [2] 00019          clc              ; Point to free memory
 0205 69 04           [2] 00020          adc    #4
 0207 85 00           [3] 00021          sta    Free
 0209 A5 01           [3] 00022          lda    Free+1
 020B 85 05           [3] 00023          sta    New+1
 020D 69 00           [2] 00024          adc    #0
 020F 85 01           [3] 00025          sta    Free+1
                          00026
 0211 60              [6] 00027          rts
                          00028
                          00029 ;
                          00030 ; Prepend a node to the list
                          00031 ;
 0212                     00032 Prepend:
 0212 20 0200         [6] 00033          jsr    Alloc     ; Allocate a new node
                          00034
 0215 A0 00           [2] 00035          ldy    #0        ; Point to the new node
                          00036
 0217 A5 02           [3] 00037          lda    Root      ; Point next of new node to Root
 0219 91 04           [6] 00038          sta    (New),Y
 021B C8              [2] 00039          iny
 021C A5 03           [3] 00040          lda    Root+1
 021E 91 04           [6] 00041          sta    (New),Y
 0220 C8              [2] 00042          iny
                          00043
 0221 A5 0A           [3] 00044          lda    I         ; Store value in node
 0223 91 04           [6] 00045          sta    (New),Y
 0225 C8              [2] 00046          iny
 0226 A5 0B           [3] 00047          lda    I+1
 0228 91 04           [6] 00048          sta    (New),Y
                          00049
 022A A5 04           [3] 00050          lda    New       ; This is now the first node
 022C 85 02           [3] 00051          sta    Root
 022E A5 05           [3] 00052          lda    New+1
 0230 85 03           [3] 00053          sta    Root+1
                          00054
 0232 60              [6] 00055          rts
                          00056
                          00057 ;
                          00058 ; Traverse the list and sum the values
                          00059 ;
 0233                     00060 NoCarry:
 0233 A0 00           [2] 00061          ldy    #0        ; Point to the next node
 0235 B1 02         [5/6] 00062          lda    (Root),Y
 0237 AA              [2] 00063          tax
 0238 C8              [2] 00064          iny
 0239 B1 02         [5/6] 00065          lda    (Root),Y
 023B 85 03           [3] 00066          sta    Root+1
 023D 86 02           [3] 00067          stx    Root
                          00068
 023F 05 02           [3] 00069          ora    Root
 0241 F0 1A (025D)  [2/3] 00070          beq    Summed    ; Until end of list
                          00071
 0243                     00072 Sum:
 0243 18              [2] 00073          clc              ; Add value of node to sum
 0244 A0 02           [2] 00074          ldy    #2
 0246 A5 06           [3] 00075          lda    Total
 0248 71 02         [5/6] 00076          adc    (Root),Y
 024A 85 06           [3] 00077          sta    Total
 024C C8              [2] 00078          iny
 024D A5 07           [3] 00079          lda    Total+1
 024F 71 02         [5/6] 00080          adc    (Root),Y
 0251 85 07           [3] 00081          sta    Total+1
 0253 90 DE (0233)  [2/3] 00082          bcc    NoCarry
                          00083
 0255 E6 08           [5] 00084          inc    Total+2
                          00085
 0257 D0 DA (0233)  [2/3] 00086          bne    NoCarry
                          00087
 0259 E6 09           [5] 00088          inc    Total+3
                          00089
 025B D0 D6 (0233)  [2/3] 00090          bne    NoCarry   ; Always branches
                          00091
 025D                     00092 Summed:
 025D 60              [6] 00093          rts
                          00094
                          00095 ;
                          00096 ;
                          00097 ;
 025E                     00098 NoBorrow:
 025E C6 0A           [5] 00099          dec    I
                          00100
 0260                     00101 Main:
 0260 20 0212         [6] 00102          jsr    Prepend
                          00103
 0263 A5 0A           [3] 00104          lda    I         ; Low byte of count zero?
 0265 D0 F7 (025E)  [2/3] 00105          bne    NoBorrow  ; No, no borrow
                          00106
 0267 C6 0B           [5] 00107          dec    I+1       ; Borrow from upper byte
 0269 10 F3 (025E)  [2/3] 00108          bpl    NoBorrow  ; Done if it goes negative
                          00109
 026B 20 0243         [6] 00110          jsr    Sum       ; Traverse the list and add them up
                          00111
 026E 00              [7] 00112          brk
                          00113
 026F                     00114 Heap:
                          00115
 0260                     00116          end    Main

The 8080 was difficult but benefited from abundant registers. Remember that a memory cycle of the 8080 is three machine cycles; it is one on the others, so 8080 machines usually ran at a two to three times faster clock rate.

                          00001 ; 991228 (0F1FFCh) cycles
                          00002 ; 829201 (0CA711h) cycles after inlining subroutines
                          00003
 0100                     00004         org     100h
                          00005
                          00006 ;
                          00007 ; Allocate a node
                          00008 ;
 0100                     00009 Alloc:
 0100 2A 0163        [16] 00010         lhld    Free            ; Allocate a node
                          00011
 0103 11 0004        [10] 00012         lxi     D,4             ; Point to free memory
 0106 EB              [4] 00013         xchg                    ; Keep New in DE
 0107 19             [10] 00014         dad     D
 0108 22 0163        [16] 00015         shld    Free
                          00016
 010B EB              [4] 00017         xchg                    ; Return New in HL
                          00018
 010C C9             [10] 00019         ret
                          00020
                          00021 ;
                          00022 ; Prepend a node to the list
                          00023 ;
 010D                     00024 Prepend:
 010D CD 0100        [17] 00025         call    Alloc           ; Allocate a new node
                          00026
 0110 3A 0165        [13] 00027         lda     Root            ; Point next of new node to Root
 0113 77              [7] 00028         mov     M,A
 0114 3A 0166        [13] 00029         lda     Root+1
 0117 22 0165        [16] 00030         shld    Root            ; This is the new first node
 011A 23              [5] 00031         inx     H
 011B 77              [7] 00032         mov     M,A             ; Finish pointer to previous Root
 011C 23              [5] 00033         inx     H
                          00034
 011D 71              [7] 00035         mov     M,C             ; Store value in node
 011E 23              [5] 00036         inx     H
 011F 70              [7] 00037         mov     M,B
                          00038
 0120 C9             [10] 00039         ret
                          00040
                          00041 ;
                          00042 ; Traverse the list and sum the values
                          00043 ;
 0121                     00044 Sum:
 0121 2A 0165        [16] 00045         lhld    Root
 0124 11 0000        [10] 00046         lxi     D,0
                          00047
 0127                     00048 SumLoop:
 0127 23              [5] 00049         inx     H               ; Add value of node to sum
 0128 23              [5] 00050         inx     H
 0129 4E              [7] 00051         mov     C,M
 012A 23              [5] 00052         inx     H
 012B 46              [7] 00053         mov     B,M
 012C EB              [4] 00054         xchg
 012D 09             [10] 00055         dad     B
 012E EB              [4] 00056         xchg
 012F D2 013B        [10] 00057         jnc     NoCarry
                          00058
 0132 E5             [11] 00059         push    H               ; Carry to high word
 0133 2A 016B        [16] 00060         lhld    Total+2
 0136 23              [5] 00061         inx     H
 0137 22 016B        [16] 00062         shld    Total+2
 013A E1             [10] 00063         pop     H
                          00064
 013B                     00065 NoCarry:
 013B 2B              [5] 00066         dcx     H               ; Point to the next node
 013C 2B              [5] 00067         dcx     H
 013D 7E              [7] 00068         mov     A,M
 013E 2B              [5] 00069         dcx     H
 013F 6E              [7] 00070         mov     L,M
 0140 67              [5] 00071         mov     H,A
                          00072
 0141 B5              [4] 00073         ora     L               ; Until end of list
 0142 C2 0127        [10] 00074         jnz     SumLoop
                          00075
 0145 EB              [4] 00076         xchg
 0146 22 0169        [16] 00077         shld    Total
                          00078
 0149 C9             [10] 00079         ret
                          00080
                          00081 ;
                          00082 ;
                          00083 ;
 014A                     00084 Main:
 014A 2A 016D         [16] 00085         lhld    I               ; Get count
 014D 4D              [5] 00086         mov     C,L
 014E 44              [5] 00087         mov     B,H
                          00088
 014F C3 0153        [10] 00089         jmp     IntoLoop
                          00090
 0152                     00091 NoBorrow:
 0152 0D              [5] 00092         dcr     C
                          00093
 0153                     00094 IntoLoop:
 0153 CD 010D        [17] 00095         call    Prepend
                          00096
 0156 0D              [5] 00097         dcr     C               ; Low byte of count zero?
 0157 0C              [5] 00098         inr     C
 0158 C2 0152        [10] 00099         jnz     NoBorrow        ; No, no borrow
                          00100
 015B 05              [5] 00101         dcr     B               ; Borrow from upper byte
 015C F2 0152        [10] 00102         jp      NoBorrow        ; Done if it goes negative
                          00103
 015F CD 0121        [17] 00104         call    Sum             ; Traverse the list and add them up
                          00105
 0162 76              [7] 00106         hlt
                          00107
 0163 016F                00108 Free    dw      Heap
 0165 0000                00109 Root    dw      0
 0167 0000                00110 New     dw      0
 0169 0000 0000           00111 Total   dw      0,0
 016D 0BB7                00112 I       dw      2999
                          00113
 016F                     00114 Heap
                          00115
 014A                     00116         end     Main

The 6809 ran away with this challenge. The additional registers and flexible addressing modes gave it a major advantage.

                                  00001 * 363754 ($58CEA) cycles original 6800 source code
                                  00002 * 243378 ($3B6B2) cycles
                                  00003 * 171350 ($29D56) cycles after inlining subroutines
                                  00004
 0000 0044                        00005 Free     fdb    Heap
 0002 0000                        00006 Root     fdb    0
 0004 0000                        00007 New      fdb    0
 0006 0000 0000                   00008 Total    fdb    0,0
 000A 0BB7                        00009 I        fdb    2999
                                  00010
                                  00011 *
                                  00012 * Allocate a node
                                  00013 *
 000C                             00014 Alloc
 000C 30 C4                   [4] 00015          leax   ,U        ; Allocate a node
                                  00016
 000E 33 44                   [5] 00017          leau   4,U       ; Point to free memory
                                  00018
 0010 39                      [5] 00019          rts
                                  00020
                                  00021 *
                                  00022 * Prepend a node to the list
                                  00023 *
 0011                             00024 Prepend
 0011 8D F9 (000C)            [7] 00025          bsr    Alloc     ; Allocate a new node
                                  00026
 0013 DC 02                   [5] 00027          ldd    Root      ; Point next of new node to Root
 0015 ED 84                   [5] 00028          std    ,X
                                  00029
 0017 10AF 02                 [7] 00030          sty    2,X       ; Store value in node
                                  00031
 001A 9F 02                   [5] 00032          stx    Root      ; This is now the first node
                                  00033
 001C 39                      [5] 00034          rts
                                  00035
                                  00036 *
                                  00037 * Traverse the list and sum the values
                                  00038 *
 001D                             00039 Sum
 001D 9E 02                   [5] 00040          ldx    Root
 001F CC 0000                 [3] 00041          ldd    #0        ; Clear total
 0022 1F 03                   [6] 00042          tfr    D,U
                                  00043
 0024                             00044 SumLoop
 0024 E3 02                   [7] 00045          addd   2,X       ; Add value of node to sum
 0026 24 02 (002A)            [3] 00046          bcc    NoCarry
                                  00047
 0028 33 41                   [5] 00048          leau   1,U
                                  00049
 002A                             00050 NoCarry
 002A AE 84                   [5] 00051          ldx    ,X        ; Point to the next node
 002C 26 F6 (0024)            [3] 00052          bne    SumLoop   ; Until end of list
                                  00053
 002E DD 08                   [5] 00054          std    Total+2
 0030 DF 06                   [5] 00055          stu    Total
                                  00056
 0032 39                      [5] 00057          rts
                                  00058
                                  00059 *
                                  00060 *
                                  00061 *
 0033                             00062 Main
 0033 109E 0A                 [6] 00063          ldy    I         ; Get node value
 0036 CE 0044                 [3] 00064          ldu    #Heap     ; Point to free memory
                                  00065
 0039                             00066 Loop
 0039 8D D6 (0011)            [7] 00067          bsr    Prepend
                                  00068
 003B 31 3F                   [5] 00069          leay   -1,Y
                                  00070
 003D 26 FA (0039)            [3] 00071          bne    Loop
                                  00072
 003F 8D D0 (0011)            [7] 00073          bsr    Prepend   ; And once for 0 just to make it fair
                                  00074
 0041 8D DA (001D)            [7] 00075          bsr    Sum       ; Traverse the list and add them up
                                  00076
 0043 3F                     [19] 00077          swi
                                  00078
 0044                             00079 Heap
                                  00080
 0033                             00081          end    Main
1 Like

Thinking about the relative strengths of the 6502 always returns focus to the zero page.

Within the zero page, the 6502 indexing modes do behave like the 6800 index with displacement mode, with dramatic results. Taking liberties that the list need not be preserved after traversal, it is handled in parts which do fit in the zero page.

                          00004 ; 328611 ($503A3) cycles
                          00005 ; 256134 ($3E886) cycles after inlining subroutines
                          00006
 0000 08                  00007 Free     fcb    Heap
 0001 4B                  00008 J        fcb    75        ; Inner loop runs 75 times
 0002 0000 0000           00009 Total    fdb    0,0
 0006 0BB7                00010 I        fdb    2999
                          00011
 0008                     00012 Heap:
                          00013
 0200                     00014          org    $200
                          00015
                          00016 ;
                          00017 ; Allocate a node
                          00018 ;
 0200                     00019 Alloc:
 0200 A5 00           [3] 00020          lda    Free      ; Allocate a node
 0202 AA              [2] 00021          tax              ; Return it in X
                          00022
 0203 18              [2] 00023          clc              ; Point to free memory
 0204 69 03           [2] 00024          adc    #3
 0206 85 00           [3] 00025          sta    Free
                          00026
 0208 60              [6] 00027          rts
                          00028
                          00029 ;
                          00030 ; Prepend a node to the list
                          00031 ;
 0209                     00032 Prepend:
 0209 20 0200         [6] 00033          jsr    Alloc     ; Allocate a new node
                          00034
 020C 94 00           [4] 00035          sty    0,X       ; Point next of new node to Root
                          00036
 020E 8A              [2] 00037          txa              ; This is now the first node
 020F A8              [2] 00038          tay
                          00039
 0210 A5 06           [3] 00040          lda    I         ; Store value in node
 0212 95 01           [4] 00041          sta    1,X
 0214 A5 07           [3] 00042          lda    I+1
 0216 95 02           [4] 00043          sta    2,X
                          00044
 0218 60              [6] 00045          rts
                          00046
                          00047 ;
                          00048 ; Traverse the list and sum the values
                          00049 ;
 0219                     00050 NoCarry:
 0219 B9 0000       [4/5] 00051          lda    0,Y       ; Point to the next node
                          00052
 021C F0 1A (0238)  [2/3] 00053          beq    Summed    ; Until end of list
                          00054
 021E A8              [2] 00055          tay
                          00056
 021F                     00057 Sum:
 021F 18              [2] 00058          clc              ; Add value of node to sum
 0220 A5 02           [3] 00059          lda    Total
 0222 79 0001       [4/5] 00060          adc    1,Y
 0225 85 02           [3] 00061          sta    Total
 0227 A5 03           [3] 00062          lda    Total+1
 0229 79 0002       [4/5] 00063          adc    2,Y
 022C 85 03           [3] 00064          sta    Total+1
 022E 90 E9 (0219)  [2/3] 00065          bcc    NoCarry
                          00066
 0230 E6 04           [5] 00067          inc    Total+2
                          00068
 0232 D0 E5 (0219)  [2/3] 00069          bne    NoCarry
                          00070
 0234 E6 05           [5] 00071          inc    Total+3
                          00072
 0236 D0 E1 (0219)  [2/3] 00073          bne    NoCarry   ; Always branches
                          00074
 0238                     00075 Summed:
 0238 60              [6] 00076          rts
                          00077
                          00078 ;
                          00079 ;
                          00080 ;
 0239                     00081 RepeatOuterLoop:
 0239 20 021F         [6] 00082          jsr    Sum       ; Traverse the list and add them up
                          00083
 023C A0 00           [2] 00084          ldy    #0        ; Clear list
                          00085
 023E A9 08           [2] 00086          lda    #Heap     ; Reset the heap
 0240 85 00           [3] 00087          sta    Free
                          00088
 0242 A9 4B           [2] 00089          lda    #75       ; Run inner loop 75 times
 0244 85 01           [3] 00090          sta    J
                          00091
 0246 D0 06 (024E)  [2/3] 00092          bne    InnerLoop ; Always branches
                          00093
 0248                     00094 NoBorrow:
 0248 C6 06           [5] 00095          dec    I
                          00096
 024A C6 01           [5] 00097          dec    J
 024C F0 EB (0239)  [2/3] 00098          beq    RepeatOuterLoop
                          00099
 024E                     00100 Main:
 024E                     00101 InnerLoop:
 024E 20 0209         [6] 00102          jsr    Prepend
                          00103
 0251 A5 06           [3] 00104          lda    I         ; Low byte of count zero?
 0253 D0 F3 (0248)  [2/3] 00105          bne    NoBorrow  ; No, no borrow
                          00106
 0255 C6 07           [5] 00107          dec    I+1       ; Borrow from upper byte
 0257 10 EF (0248)  [2/3] 00108          bpl    NoBorrow  ; Done if it goes negative
                          00109
 0259 20 021F         [6] 00110          jsr    Sum       ; Traverse the list and add them up
                          00111
 025C 00              [7] 00112          brk
                          00113
 024E                     00114          end    Main

What is good for the goose is good for the gander and the gosling. The door is now open for the others to exploit their zero pages in the same manner.

Unfortunately, the 6800 cannot load the index register from a single byte. Its minor gain does not save it from last place.

                          00004 * 366717 ($5987D) cycles
                          00005 * 288067 ($46543) cycles after inlining subroutines
                          00006
 0000 000D                00007 Free     fdb    Heap
 0002 0000                00008 Root     fdb    0
 0004 0000                00009 New      fdb    0
 0006 0000 0000           00010 Total    fdb    0,0
 000A 3C                  00011 J        fcb    60        ; Inner loop runs 60 times
 000B 0BB7                00012 I        fdb    2999
                          00013
 000D                     00014 Heap     rmb    4*60
                          00015
                          00016 *
                          00017 * Allocate a node
                          00018 *
 00FD                     00019 Alloc
 00FD D6 01           [3] 00020          ldab   Free+1    ; Allocate a node
 00FF D7 05           [4] 00021          stab   New+1     ; Return it in New
                          00022
 0101 CB 04           [2] 00023          addb   #4        ; Point to free memory
 0103 D7 01           [4] 00024          stab   Free+1
                          00025
 0105 39              [5] 00026          rts
                          00027
                          00028 *
                          00029 * Prepend a node to the list
                          00030 *
 0106                     00031 Prepend
 0106 8D F5 (00FD)    [8] 00032          bsr    Alloc     ; Allocate a new node
                          00033
 0108 DE 04           [4] 00034          ldx    New       ; Point to the new node
                          00035
 010A D6 03           [3] 00036          ldab   Root+1    ; Point next of new node to Root
 010C E7 01           [6] 00037          stab   1,X
 010E 6F 00           [7] 00038          clr    ,X
                          00039
 0110 A7 03           [6] 00040          staa   3,X       ; Store value in node
 0112 D6 0B           [3] 00041          ldab   I
 0114 E7 02           [6] 00042          stab   2,X
                          00043
 0116 DF 02           [5] 00044          stx    Root      ; This is now the first node
                          00045
 0118 39              [5] 00046          rts
                          00047
                          00048 *
                          00049 * Traverse the list and sum the values
                          00050 *
 0119                     00051 Sum
 0119 DE 02           [4] 00052          ldx    Root
 011B 96 08           [3] 00053          ldaa   Total+2
 011D D6 09           [3] 00054          ldab   Total+3
                          00055
 011F                     00056 SumLoop
 011F EB 03           [5] 00057          addb   3,X       ; Add value of node to sum
 0121 A9 02           [5] 00058          adca   2,X
 0123 24 08 (012D)    [4] 00059          bcc    NoCarry
                          00060
 0125 7C 0007         [6] 00061          inc    Total+1
                          00062
 0128 26 03 (012D)    [4] 00063          bne    NoCarry
                          00064
 012A 7C 0006         [6] 00065          inc    Total
                          00066
 012D                     00067 NoCarry
 012D EE 00           [6] 00068          ldx    ,X        ; Point to the next node
 012F 26 EE (011F)    [4] 00069          bne    SumLoop   ; Until end of list
                          00070
 0131 97 08           [4] 00071          staa   Total+2
 0133 D7 09           [4] 00072          stab   Total+3
                          00073
 0135 39              [5] 00074          rts
                          00075
                          00076 *
                          00077 *
                          00078 *
 0136                     00079 RepeatOuterLoop
 0136 97 0C           [4] 00080          staa   I+1       ; Save low byte of count
                          00081
 0138 8D DF (0119)    [8] 00082          bsr    Sum       ; Traverse the list and add them up
                          00083
 013A DF 02           [5] 00084          stx    Root      ; Clear list
                          00085
 013C 86 0D           [2] 00086          ldaa   #Heap     ; Reset the heap
 013E 97 01           [4] 00087          staa   Free+1
                          00088
 0140 86 3C           [2] 00089          ldaa   #60       ; Run inner loop 60 times
 0142 97 0A           [4] 00090          staa   J
                          00091
 0144                     00092 Main
 0144                     00093 InnerLoop
 0144 96 0C           [3] 00094          ldaa   I+1       ; Get low byte of node value
                          00095
 0146 20 06 (014E)    [4] 00096          bra    IntoLoop
                          00097
 0148                     00098 NoBorrow
 0148 4A              [2] 00099          deca
                          00100
 0149 7A 000A         [6] 00101          dec    J
 014C 27 E8 (0136)    [4] 00102          beq    RepeatOuterLoop
                          00103
 014E                     00104 IntoLoop
 014E 8D B6 (0106)    [8] 00105          bsr    Prepend
                          00106
 0150 4D              [2] 00107          tsta             ; Low byte of count zero?
 0151 26 F5 (0148)    [4] 00108          bne    NoBorrow  ; No, no borrow
                          00109
 0153 7A 000B         [6] 00110          dec    I         ; Borrow from upper byte
 0156 2A F0 (0148)    [4] 00111          bpl    NoBorrow  ; Done if it goes negative
                          00112
 0158 8D BF (0119)    [8] 00113          bsr    Sum       ; Traverse the list and add them up
                          00114
 015A 3F             [12] 00115          swi
                          00116
 0144                     00117          end    Main

Even though the 8080 architecture does not directly advantage the first 256 bytes of its memory space, single-byte addresses do present a huge opportunity in light of the clumsy addressing capabilities. Some tricky coding keeps the 8080 solidly in second place.

                          00004 ; 757274 (0B8E1Ah) cycles
                          00005 ; 594194 (091112h) cycles after inlining subroutines
                          00006
 0000 0000 0000           00007 Total   dw      0,0
 0004 0BB7                00008 I       dw      2999
                          00009
 0006                     00010 Heap    ds      3*75
                          00011
                          00012 ;
                          00013 ; Allocate a node
                          00014 ;
 00E7                     00015 Alloc:
 00E7 6B              [5] 00016         mov     L,E             ; Allocate a node to return in HL
                          00017
 00E8 1C              [5] 00018         inr     E               ; Point to free memory
 00E9 1C              [5] 00019         inr     E
 00EA 1C              [5] 00020         inr     E
                          00021
 00EB C9             [10] 00022         ret
                          00023
                          00024 ;
                          00025 ; Prepend a node to the list
                          00026 ;
 00EC                     00027 Prepend:
 00EC CD 00E7        [17] 00028         call    Alloc           ; Allocate a new node
                          00029
 00EF 72              [7] 00030         mov     M,D             ; Point next of new node to Root
                          00031
 00F0 55              [5] 00032         mov     D,L             ; This is the new first node
                          00033
 00F1 23              [5] 00034         inx     H               ; Store value in node
 00F2 71              [7] 00035         mov     M,C
 00F3 23              [5] 00036         inx     H
 00F4 3A 0005        [13] 00037         lda     I+1
 00F7 77              [7] 00038         mov     M,A
                          00039
 00F8 C9             [10] 00040         ret
                          00041
                          00042 ;
                          00043 ; Traverse the list and sum the values
                          00044 ;
 00F9                     00045 Sum:
 00F9 6A              [5] 00046         mov     L,D             ; Point HL to first node
 00FA EB              [4] 00047         xchg
 00FB 2A 0000        [16] 00048         lhld    Total           ; Load cumulative total
 00FE EB              [4] 00049         xchg
                          00050
 00FF                     00051 SumLoop:
 00FF 23              [5] 00052         inx     H               ; Add value of node to sum
 0100 4E              [7] 00053         mov     C,M
 0101 23              [5] 00054         inx     H
 0102 46              [7] 00055         mov     B,M
 0103 EB              [4] 00056         xchg
 0104 09             [10] 00057         dad     B
 0105 EB              [4] 00058         xchg
 0106 D2 010E        [10] 00059         jnc     NoCarry
                          00060
 0109 7D              [5] 00061         mov     A,L             ; Carry to high high word
 010A 2E 02           [7] 00062         mvi     L,Total+2
 010C 34             [10] 00063         inr     M
 010D 6F              [5] 00064         mov     L,A
                          00065
 010E                     00066 NoCarry:
 010E 2B              [5] 00067         dcx     H               ; Point to the next node
 010F 2B              [5] 00068         dcx     H
 0110 6E              [7] 00069         mov     L,M
                          00070
 0111 7D              [5] 00071         mov     A,L
 0112 B5              [4] 00072         ora     L               ; Until end of list
 0113 C2 00FF        [10] 00073         jnz     SumLoop
                          00074
 0116 EB              [4] 00075         xchg
 0117 22 0000        [16] 00076         shld    Total
 011A EB              [4] 00077         xchg                    ; Leave 0 in HL
                          00078
 011B C9             [10] 00079         ret
                          00080
                          00081 ;
                          00082 ;
                          00083 ;
 011C                     00084 Main:
 011C 26 00           [7] 00085         mvi     H,0
                          00086
 011E 3A 0004        [13] 00087         lda     I
 0121 4F              [5] 00088         mov     C,A
                          00089
 0122 C3 012A        [10] 00090         jmp     IntoOuterLoop
                          00091
 0125                     00092 RepeatOuterLoop:
 0125 C5             [11] 00093         push    B
                          00094
 0126 CD 00F9        [17] 00095         call    Sum             ; Traverse the list and add them up
                          00096
 0129 C1             [10] 00097         pop     B
                          00098
 012A                     00099 IntoOuterLoop:
 012A 06 4B           [7] 00100         mvi     B,75
                          00101
 012C 11 0006        [10] 00102         lxi     D,Heap          ; Reset the heap and clear list
                          00103
 012F C3 0137        [10] 00104         jmp     IntoLoop
                          00105
 0132                     00106 NoBorrow:
 0132 0D              [5] 00107         dcr     C
                          00108
 0133 05              [5] 00109         dcr     B
 0134 CA 0125        [10] 00110         jz      RepeatOuterLoop
                          00111
 0137                     00112 IntoLoop:
 0137 CD 00EC        [17] 00113         call    Prepend
                          00114
 013A 79              [5] 00115         mov     A,C             ; Low byte of count zero?
 013B B1              [4] 00116         ora     C
 013C C2 0132        [10] 00117         jnz     NoBorrow        ; No, no borrow
                          00118
 013F 2E 05           [7] 00119         mvi     L,I+1           ; Borrow from upper byte
 0141 35             [10] 00120         dcr     M
 0142 F2 0132        [10] 00121         jp      NoBorrow        ; Done if it goes negative
                          00122
 0145 CD 00F9        [17] 00123         call    Sum             ; Traverse the list and add them up
                          00124
 0148 76              [7] 00125         hlt
                          00126
 011C                     00127         end     Main
1 Like

A change in register usage narrows the gap but does not alter the standings.

                          00007 * 360323 ($57F83) cycles
                          00008 * 281673 ($44C49) cycles after inlining subroutines
                          00009
 0000 000D                00010 Free     fdb    Heap
 0002 0000                00011 Root     fdb    0
 0004 0000                00012 New      fdb    0
 0006 0000 0000           00013 Total    fdb    0,0
 000A 3C                  00014 J        fcb    60        ; Inner loop runs 60 times
 000B 0BB7                00015 I        fdb    2999
                          00016
 000D                     00017 Heap     rmb    4*60
                          00018
                          00019 *
                          00020 * Allocate a node
                          00021 *
 00FD                     00022 Alloc
 00FD 97 05           [4] 00023          staa   New+1     ; Return it in New
                          00024
 00FF 8B 04           [2] 00025          adda   #4        ; Point to free memory
                          00026
 0101 39              [5] 00027          rts
                          00028
                          00029 *
                          00030 * Prepend a node to the list
                          00031 *
 0102                     00032 Prepend
 0102 8D F9 (00FD)    [8] 00033          bsr    Alloc     ; Allocate a new node
                          00034
 0104 DE 04           [4] 00035          ldx    New       ; Point to the new node
                          00036
 0106 E7 03           [6] 00037          stab   3,X       ; Store value in node
 0108 D6 0B           [3] 00038          ldab   I
 010A E7 02           [6] 00039          stab   2,X
                          00040
 010C D6 03           [3] 00041          ldab   Root+1    ; Point next of new node to Root
 010E E7 01           [6] 00042          stab   1,X
 0110 6F 00           [7] 00043          clr    ,X
                          00044
 0112 DF 02           [5] 00045          stx    Root      ; This is now the first node
                          00046
 0114 39              [5] 00047          rts
                          00048
                          00049 *
                          00050 * Traverse the list and sum the values
                          00051 *
 0115                     00052 Sum
 0115 DE 02           [4] 00053          ldx    Root
 0117 96 08           [3] 00054          ldaa   Total+2
 0119 D6 09           [3] 00055          ldab   Total+3
                          00056
 011B                     00057 SumLoop
 011B EB 03           [5] 00058          addb   3,X       ; Add value of node to sum
 011D A9 02           [5] 00059          adca   2,X
 011F 24 08 (0129)    [4] 00060          bcc    NoCarry
                          00061
 0121 7C 0007         [6] 00062          inc    Total+1
                          00063
 0124 26 03 (0129)    [4] 00064          bne    NoCarry
                          00065
 0126 7C 0006         [6] 00066          inc    Total
                          00067
 0129                     00068 NoCarry
 0129 EE 00           [6] 00069          ldx    ,X        ; Point to the next node
 012B 26 EE (011B)    [4] 00070          bne    SumLoop   ; Until end of list
                          00071
 012D 97 08           [4] 00072          staa   Total+2
 012F D7 09           [4] 00073          stab   Total+3
                          00074
 0131 39              [5] 00075          rts
                          00076
                          00077 *
                          00078 *
                          00079 *
 0132                     00080 RepeatOuterLoop
 0132 8D E1 (0115)    [8] 00081          bsr    Sum       ; Traverse the list and add them up
                          00082
 0134 DF 02           [5] 00083          stx    Root      ; Clear list
                          00084
 0136 86 3C           [2] 00085          ldaa   #60       ; Run inner loop 60 times
 0138 97 0A           [4] 00086          staa   J
                          00087
 013A                     00088 Main
 013A 86 0D           [2] 00089          ldaa   #Heap     ; Reset the heap
                          00090
 013C D6 0C           [3] 00091          ldab   I+1       ; Load low byte of count
                          00092
 013E 20 08 (0148)    [4] 00093          bra    IntoLoop
                          00094
 0140                     00095 NoBorrow
 0140 5A              [2] 00096          decb
 0141 D7 0C           [4] 00097          stab   I+1
                          00098
 0143 7A 000A         [6] 00099          dec    J
 0146 27 EA (0132)    [4] 00100          beq    RepeatOuterLoop
                          00101
 0148                     00102 IntoLoop
 0148 8D B8 (0102)    [8] 00103          bsr    Prepend
                          00104
 014A D6 0C           [3] 00105          ldab   I+1       ; Low byte of count zero?
 014C 26 F2 (0140)    [4] 00106          bne    NoBorrow  ; No, no borrow
                          00107
 014E 7A 000B         [6] 00108          dec    I         ; Borrow from upper byte
 0151 2A ED (0140)    [4] 00109          bpl    NoBorrow  ; Done if it goes negative
                          00110
 0153 8D C0 (0115)    [8] 00111          bsr    Sum       ; Traverse the list and add them up
                          00112
 0155 3F             [12] 00113          swi
                          00114
 013A                     00115          end    Main
1 Like

I have been working on the heap manager for the ARM (Raspberry Pi) lately.

Many of the instructions are three operand:

add <destination>,<source 1>,<source 2>

Sometimes it is useful to leave the source operands unaltered.

The assembler allows shortening that to:

add <destination>,<source>

when one of the source operands is also the destination.

Of course assembly language programming involves the use of a debugger. On the Raspberry Pi, that would be GDB.

GDB commands can be very wordy.

Fortunately, many can be abbreviated nicely:

r for run
c for continue
i r for info register
si for step a single machine instruction
ni for step over a subroutine call

GDB is also nice in that it provides command history and just hitting <Enter> repeats the last command.

1 Like

It is December and time again for the annual Vintage Computing Christmas Challenge:

I have entered in the past, but did not last year because I found the challenge, shall we say, uninspiring.

As usual, I will begin by coding up a solution in 6800 assembly language. Then go to the 6809 to see if that can improve it. Finally create 6502 and 8080 versions for comparison.

I do not like that the judging criteria are:

The shortest size of the following provided sizes is taken for the final “ordering/ranking”:

  • Source code
  • File size
  • Executed Code (net[to] code size; “real” code)

So I will be submitting both commented and uncommented source code…