How better to spend a rainy day than studying the source code of my FORTH interpreter and BASIC compiler for the AVR microcontroller?
Most of you will not know this since Allen is the only other person I am aware of who will admit to programming an AVR in assembly language.
The AVR has what is known as a Harvard architecture: it has separate address spaces for code and data. This is a natural fit, since the code is stored in flash memory while the data lives in conventional RAM.
The memory maps of an AVR look like this:
0 : data address space
*------------------------------*
| Registers and I/O | free RAM |
*------------------------------*
                    |<-- 2K -->|
0 : code address space
*-------------------------------------------*
| Interrupt vectors | program code | unused |
*-------------------------------------------*
|<------------------ 32K ------------------>|
The problem is that the ATmega328P used in an Arduino has only 2 KBytes of RAM.
FORTH, BASIC and Python all have a large amount of data which is not changed as the program runs.
The obvious solution is to put that data in the 32 KBytes of flash memory. The catch is that there are then two different flavors of pointer, data and code, and an address alone does not say which space it refers to.
The GCC tool set used in the Arduino IDE addresses the problem with a somewhat hairy set of compiler directives. The gory details, if you are bored and care to read about them:
http://www.nongnu.org/avr-libc/user-manual/pgmspace.html
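For a taste of what those directives look like in practice, here is a minimal sketch in C. PROGMEM and pgm_read_byte are the real avr-libc names from the page above; the host fallback under #else, the banner string and the copy_from_flash helper are my own assumptions, added only so the same file also compiles and runs on a desktop machine.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>   /* real avr-libc macros when built for an AVR */
#else
/* Host fallback (assumption, for illustration): one flat address space. */
#define PROGMEM
#define pgm_read_byte(p) (*(const unsigned char *)(p))
#endif

/* Stored in flash on an AVR; an ordinary constant on the host. */
static const char banner[] PROGMEM = "READY";

/* Hypothetical helper: copy a NUL-terminated string out of program
   space into a RAM buffer of cap bytes, truncating if necessary.
   Returns the number of characters copied, excluding the terminator. */
static size_t copy_from_flash(char *dst, const char *src, size_t cap)
{
    size_t n = 0;

    assert(cap > 0);
    while (n + 1 < cap) {
        /* Every access to flash data has to go through the macro. */
        char c = (char)pgm_read_byte(src + n);
        if (c == '\0')
            break;
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

The point to notice is that the caller has to know in advance that src lives in flash. Pass a RAM pointer to this function on a real AVR and it will read the wrong memory.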
FORTH, BASIC and Python have no provision for dealing with this problem.
Since I have complete control over memory layout, I have come up with a way to give the illusion of a common pointer. My modified memory map looks like this:
0 : data address space
*------------------------------*
| Registers and I/O | free RAM |
*------------------------------*
                    |<-- 2K -->|
0 : code address space
*----------------------------------------------------------------------------*
| Interrupt vectors | program code | optional padding | static data | unused |
*----------------------------------------------------------------------------*
|<----------------------------------- 32K ---------------------------------->|
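As long as the static data starts beyond the end of RAM, no padding is needed, which is the case for all but very simple programs. The padding exists so that every RAM address is numerically lower than every static data address, which lets a single comparison tell the two apart.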
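The arithmetic behind the optional padding is simple enough to sketch in C. The constants assume the ATmega328P layout (RAM ending just below data address 0x0900, 32 KBytes of flash), and padding_needed is a hypothetical name of mine, not something taken from my assembler source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ATmega328P-style layout. */
#define RAM_END    0x0900u  /* first data address past the end of RAM */
#define FLASH_SIZE 0x8000u  /* 32 KBytes of flash */

/* Hypothetical helper: bytes of padding to insert after the program
   code so that the static data starts at or above RAM_END. */
static uint16_t padding_needed(uint16_t code_end)
{
    assert(code_end <= FLASH_SIZE);
    return (code_end < RAM_END) ? (uint16_t)(RAM_END - code_end) : 0u;
}
```

A program whose code ends at 0x0400 would need 0x0500 bytes of padding, while anything that already reaches 0x0900 needs none.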
Code to access data must distinguish between RAM and flash memory by comparing an address with the end of RAM, somewhat like this:
; PStr: print the string described by the four bytes at Z (R31:R30):
; a 2-byte pointer to the characters followed by a 2-byte length.
; Either part may live in RAM or in flash.
PStr:   mov   R22,R30
        or    R22,R31
        breq  PStr2                 ; Z is null: nothing to print
        ldi   R22,high(SRAM_START+SRAM_SIZE)
        cpi   R30,low(SRAM_START+SRAM_SIZE)
        cpc   R31,R22
        brcc  PStr3                 ; Z at or past end of RAM: read from flash
        ld    R28,Z+                ; Y = pointer to the characters
        ld    R29,Z+
        ld    R24,Z+                ; R25:R24 = length
        ld    R25,Z
        mov   R22,R24
        or    R22,R25
        breq  PStr2                 ; zero length: nothing to print
        ldi   R22,high(SRAM_START+SRAM_SIZE)
        cpi   R28,low(SRAM_START+SRAM_SIZE)
        cpc   R29,R22
        brcc  PStr4                 ; characters are in flash
PStr1:  ld    R22,Y+                ; next character from RAM
        rcall Echo
        sbiw  R24,1
        brne  PStr1
PStr2:  ret
PStr3:  lpm   R28,Z+                ; same four bytes, read from flash
        lpm   R29,Z+
        lpm   R24,Z+
        lpm   R25,Z
        mov   R22,R24
        or    R22,R25
        breq  PStr2                 ; zero length: nothing to print
PStr4:  movw  R30,R28               ; lpm can only address through Z
PStr5:  lpm   R22,Z+                ; next character from flash
        rcall Echo
        sbiw  R24,1
        brne  PStr5
        ret
ld loads from data space whereas lpm loads from program space.
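For anyone who does not read AVR assembly, the same dispatch can be modelled on a desktop machine. This is only a sketch in C under stated assumptions: two arrays stand in for the two address spaces, RAM_END uses the ATmega328P value of 0x0900, and read_byte, read_word, fetch_string, poke and store_string are all hypothetical names of mine. It collects the characters into a buffer rather than calling Echo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_END    0x0900u  /* assumed: first address past the end of RAM */
#define FLASH_SIZE 0x8000u  /* 32 KBytes of flash */

static uint8_t ram[RAM_END];       /* models the data address space */
static uint8_t flash[FLASH_SIZE];  /* models the code address space */

/* Read through a "common pointer": addresses below RAM_END go to
   data space (ld), everything else goes to program space (lpm). */
static uint8_t read_byte(uint16_t addr)
{
    if (addr < RAM_END)
        return ram[addr];
    assert(addr < FLASH_SIZE);
    return flash[addr];
}

/* Little-endian 16-bit read, like two loads through Z+. */
static uint16_t read_word(uint16_t addr)
{
    return (uint16_t)(read_byte(addr) |
                      (read_byte((uint16_t)(addr + 1)) << 8));
}

/* Fetch the string whose {pointer, length} pair is stored at desc.
   Returns the number of characters copied into out. */
static size_t fetch_string(uint16_t desc, char *out, size_t cap)
{
    if (desc == 0)
        return 0;                              /* null, as in PStr */
    uint16_t chars = read_word(desc);
    uint16_t len = read_word((uint16_t)(desc + 2));
    size_t n = 0;
    while (n < len && n + 1 < cap) {
        out[n] = (char)read_byte((uint16_t)(chars + n));
        n++;
    }
    if (cap > 0)
        out[n] = '\0';
    return n;
}

/* Simulation-only helpers: real flash cannot be written like this. */
static void poke(uint16_t addr, uint8_t v)
{
    if (addr < RAM_END) {
        ram[addr] = v;
    } else {
        assert(addr < FLASH_SIZE);
        flash[addr] = v;
    }
}

static void store_string(uint16_t desc, uint16_t chars, const char *s)
{
    uint16_t len = 0;
    while (s[len] != '\0') {
        poke((uint16_t)(chars + len), (uint8_t)s[len]);
        len++;
    }
    poke(desc, (uint8_t)(chars & 0xFF));
    poke((uint16_t)(desc + 1), (uint8_t)(chars >> 8));
    poke((uint16_t)(desc + 2), (uint8_t)(len & 0xFF));
    poke((uint16_t)(desc + 3), (uint8_t)(len >> 8));
}
```

Storing a pair at 0x0204 that points at characters at 0x1000 exercises the mixed case, where the pair is in RAM and the characters are in flash. Note that flash addresses below RAM_END are unreachable through a common pointer, which is why the static data has to sit above that boundary.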