README.md

   1 # Building and running
   2
   3 Create the executable:
   4
   5     $ make main
   6
   7 The `sys.f` file contains code that defines some of the usual words that you
   8 would expect in a Forth distribution. To run this code and then read from
   9 standard input, run:
  10
  11     $ cat sys.f - | ./main
  12
  13 The `example.f` file contains an example that you can run with:
  14
  15     $ cat sys.f example.f | ./main
  16
  17 ## Running with UEFI
  18
  19 We are currently in the process of implementing support for running without
  20 Linux, by instead relying on UEFI. Eventually, this will be the only supported
  21 method of running the interpreter, but currently the UEFI-related code is
  22 isolated in the `uefi/` directory and does not yet contain an implementation of
  23 the main program.
  24
  25 You should have the following dependencies installed (assuming Arch Linux):
  26
  27     $ pacman -S qemu ovmf
  28
  29 To run a UEFI shell inside qemu, cd to `uefi/` and run:
  30
  31     $ make run
  32
  33 # Notes on implementation
  34
  35 This is my summary of the most important parts of
  36 https://raw.githubusercontent.com/nornagon/jonesforth/master/jonesforth.S.
  37
  38 ## Dictionary
  39
  40 In Forth, words are stored in a dictionary. The dictionary is a linked list
  41 whose entries look like this:
  42
  43     +------------------------+--------+---------- - - - - +----------- - - - -
  44     | LINK POINTER           | LENGTH/| NAME              | DEFINITION
  45     |                        | FLAGS  |                   |
  46     +--- (4 bytes) ----------+- byte -+- n bytes  - - - - +----------- - - - -
  47
  48 For example, DOUBLE and QUADRUPLE may be stored like this:
  49
  50       pointer to previous word
  51        ^
  52        |
  53     +--|------+---+---+---+---+---+---+---+---+------------- - - - -
  54     | LINK    | 6 | D | O | U | B | L | E | 0 | (definition ...)
  55     +---------+---+---+---+---+---+---+---+---+------------- - - - -
  56        ^       len                         padding
  57        |
  58     +--|------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - -
  59     | LINK    | 9 | Q | U | A | D | R | U | P | L | E | 0 | 0 | (definition ...)
  60     +---------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - -
  61        ^       len                                     padding
  62        |
  63        |
  64     LATEST
  65
  66 The Forth variable LATEST contains a pointer to the most recently defined word.
  67
  68 ## Threaded code
  69
  70 In a typical Forth interpreter, code is stored in a peculiar way. (This way of
  71 storing code is primarily motivated by space contraints on early systems.)
  72
  73 The definition of a word is stored as a sequence of memory adresses of each of
  74 the words making up that definition. (At the end of a compiled definition, there
  75 is also some extra code that causes execution to continue in the correct way.)
  76
  77 We use a register (ESI) to store a reference to the next index of the
  78 word (inside a definition) that we are executing. Then, in order to execute a
  79 word, we just jump to whatever address is pointed to by ESI. The code for
  80 updating ESI and continuing execution is stored at the end of each subroutine.
  81
  82 Of course, this approach only works if each of the words that we are executing
  83 is defined in assembly, but we also want to be able to execute Forth words!
  84
  85 We get around this problem by adding a "codeword" to the beginning of any
  86 compiled subroutine. This codeword is a pointer to the intrepreter to run the
  87 given function. In order to run such functions, we actually need two jumps when
  88 executing: In order to execute a word, we jump to the address at the location
  89 pointed to by the address in ESI.
  90
  91 ## Definitions
  92
  93 What does the codeword of a Forth word contain? It needs to save the old value
  94 of ESI (so that we can resume execution of whatever outer definition we are
  95 executing at the time) and set the new version of ESI to point to the first word
  96 in the inner definition.
  97
  98 The stack where the values of ESI are stored is called the "return stack". We
  99 will use EBP for the return stack.
 100
 101 As mentioned, whenever we finish executing a Forth word, we will need to
 102 continue execution in the manner described in the previous section. When the
 103 word being executed is itself written in Forth, we need to pop the old value of
 104 ESI that we saved at the beginning of the definition before doing this.
 105
 106 Thus, the actual data for a word in a dictionary will look something like this:
 107
 108       pointer to previous word
 109        ^
 110        |
 111     +--|------+---+---+---+---+---+---+---+---+------------+------------+------------+------------+
 112     | LINK    | 6 | D | O | U | B | L | E | 0 | DOCOL      | DUP        | +          | EXIT       |
 113     +---------+---+---+---+---+---+---+---+---+------------+--|---------+------------+------------+
 114        ^       len                         pad  codeword      |
 115        |                                                      V
 116       LINK in next word                            points to codeword of DUP
 117
 118 Here, DOCOL (the codeword) is address of the simple interpreter described above,
 119 while EXIT a word (implemented in assembly) that takes care of popping ESI and
 120 continuing execution. Note that DOCOL, DUP, + and EXIT are all stored as
 121 addresses which point to codewords.
 122
 123 ## Literals
 124
 125 Literals are handled in a special way. There is a word in Forth, called LIT,
 126 implemented in assembly. When executed, this word looks at the next Forth
 127 instruction (i.e. the value of ESI), and places that on the stack as a literal,
 128 and then manipulates ESI to skip over the literal value.
 129
 130 ## Built-in variables
 131
 132 * **STATE** -- Is the interpreter executing code (0) or compiling a word (non-zero)?
 133 * **LATEST** -- Points to the latest (most recently defined) word in the dictionary.
 134 * **HERE** -- Points to the next free byte of memory.  When compiling, compiled words go here.
 135 * **S0** -- Stores the address of the top of the parameter stack.
 136 * **BASE** -- The current base for printing and reading numbers.
 137
 138 ## Input and lookup
 139
 140 `WORD` reads a word from standard input and pushes a string (in the form of an
 141 address followed by the length of the string) to the stack. (It uses an internal
 142 buffer that is overwritten each time it is called.)
 143
 144 `FIND` takes a word as parsed by `WORD` and looks it up in the dictionary. It
 145 returns the address of the dictionary header of that word if it is found.
 146 Otherwise, it returns 0.
 147
 148 `>CFA` turns a dictionary pointer into a codeword pointer. This is used when
 149 compiling.
 150
 151 ## Compilation
 152
 153 The Forth word INTERPRET runs in a loop, reading in words (with WORD), looking
 154 them up (with FIND), turning them into codeword pointers (with >CFA) and then
 155 deciding what to do with them.
 156
 157 In immediate mode (when STATE is zero), the word is simply executed immediately.
 158
 159 In compilation mode, INTERPRET appends the codeword pointer to user memory
 160 (which is at HERE). However, if a word has the immediate flag set, then it is
 161 run immediately, even in compile mode.
 162
 163 ### Definition of `:` and `;`
 164
 165 The word `:` starts by reading in the new word. Then it creates a new entry for
 166 that word in the dictoinary, updating the contents of `LATEST`, to which it
 167 appends the word `DOCOL`. Then, it switches to compile mode.
 168
 169 The word `;` simply appends `EXIT` to the currently compiling definition and
 170 then switches back to immediate mode.
 171
 172 These words rely on `,` to append words to the currently compiling definition.
 173 This word simply appends some literal value to `HERE` and moves the `HERE`
 174 pointer forward.
 175
 176 # Notes on UEFI
 177
 178 `JONASFORTH` is runs without an operating system, instead using the facilities
 179 provided by UEFI by running as a UEFI application. (Or rather, in the future it
 180 hopefully will. Right now, it uses Linux.) This section contains some notes
 181 about how this functionality is implemented.
 182
 183 ## Packaging and testing the image
 184
 185 * [ ] What should the image look like?
 186 * [ ] How to build the image (which programs, commands, etc.)
 187 * [ ] How do we run the application in QEMU
 188
 189 ## Interfacing with UEFI
 190
 191 From [OSDev Wiki](https://wiki.osdev.org/UEFI#How_to_use_UEFI):
 192
 193 >Traditional operating systems like Windows and Linux have an existing software
 194 >architecture and a large code base to perform system configuration and device
 195 >discovery. With their sophisticated layers of abstraction they don't directly
 196 >benefit from UEFI. As a result, their UEFI bootloaders do little but prepare
 197 >the environment for them to run.
 198 >
 199 >An independent developer may find more value in using UEFI to write
 200 >feature-full UEFI applications, rather than viewing UEFI as a temporary
 201 >start-up environment to be jettisoned during the boot process. Unlike legacy
 202 >bootloaders, which typically interact with BIOS only enough to bring up the OS,
 203 >a UEFI application can implement sophisticated behavior with the help of UEFI.
 204 >In other words, an independent developer shouldn't be in a rush to leave
 205 >"UEFI-land".
 206
 207 For `JONASFORTH`, I have decided to run as a UEFI application, taking advantage
 208 of UEFI's features, including its text I/O features and general graphical device
 209 drivers. Eventually, we would like to add some basic graphical drawing
 210 capabilities to `JONASFORTH`, and it's my impression that this would be possible
 211 using what is provided to us by UEFI.
 212
 213 * [ ] How to register as a UEFI application
 214 * [ ] How to use UEFI provided functions
 215
 216 ## Resources
 217
 218 * https://wiki.osdev.org/UEFI
 219 * https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface