README.md

   1 # Building and running
   2
   3 Create the executable:
   4
   5     $ make main
   6
   7 The `sys.f` file contains code that defines some of the usual words that you
   8 would expect in a Forth distribution. To run this code and then read from
   9 standard input, run:
  10
  11     $ cat sys.f - | ./main
  12
  13 The `example.f` file contains an example that you can run with:
  14
  15     $ cat sys.f example.f | ./main
  16
  17 ## Running with UEFI
  18
  19 We are currently in the process of implementing support for running without
  20 Linux, by instead relying on UEFI. Eventually, this will be the only supported
  21 method of running the interpreter, but currently the UEFI-related code is
  22 isolated in the `uefi/` directory and does not yet contain an implementation of
  23 the main program.
  24
  25 You should have the following dependencies installed (assuming Arch Linux):
  26
  27     $ pacman -S qemu ovmf
  28
  29 To run a UEFI shell inside qemu, cd to `uefi/` and run:
  30
  31     $ make run
  32
  33 ### Running on read hardware
  34
  35 * [ ] This is not supported yet
  36
  37 # Notes on implementation
  38
  39 This is my summary of the most important parts of
  40 https://raw.githubusercontent.com/nornagon/jonesforth/master/jonesforth.S.
  41
  42 ## Dictionary
  43
  44 In Forth, words are stored in a dictionary. The dictionary is a linked list
  45 whose entries look like this:
  46
  47     +------------------------+--------+---------- - - - - +----------- - - - -
  48     | LINK POINTER           | LENGTH/| NAME              | DEFINITION
  49     |                        | FLAGS  |                   |
  50     +--- (4 bytes) ----------+- byte -+- n bytes  - - - - +----------- - - - -
  51
  52 For example, DOUBLE and QUADRUPLE may be stored like this:
  53
  54       pointer to previous word
  55        ^
  56        |
  57     +--|------+---+---+---+---+---+---+---+---+------------- - - - -
  58     | LINK    | 6 | D | O | U | B | L | E | 0 | (definition ...)
  59     +---------+---+---+---+---+---+---+---+---+------------- - - - -
  60        ^       len                         padding
  61        |
  62     +--|------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - -
  63     | LINK    | 9 | Q | U | A | D | R | U | P | L | E | 0 | 0 | (definition ...)
  64     +---------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - -
  65        ^       len                                     padding
  66        |
  67        |
  68     LATEST
  69
  70 The Forth variable LATEST contains a pointer to the most recently defined word.
  71
  72 ## Threaded code
  73
  74 In a typical Forth interpreter, code is stored in a peculiar way. (This way of
  75 storing code is primarily motivated by space contraints on early systems.)
  76
  77 The definition of a word is stored as a sequence of memory adresses of each of
  78 the words making up that definition. (At the end of a compiled definition, there
  79 is also some extra code that causes execution to continue in the correct way.)
  80
  81 We use a register (ESI) to store a reference to the next index of the
  82 word (inside a definition) that we are executing. Then, in order to execute a
  83 word, we just jump to whatever address is pointed to by ESI. The code for
  84 updating ESI and continuing execution is stored at the end of each subroutine.
  85
  86 Of course, this approach only works if each of the words that we are executing
  87 is defined in assembly, but we also want to be able to execute Forth words!
  88
  89 We get around this problem by adding a "codeword" to the beginning of any
  90 compiled subroutine. This codeword is a pointer to the intrepreter to run the
  91 given function. In order to run such functions, we actually need two jumps when
  92 executing: In order to execute a word, we jump to the address at the location
  93 pointed to by the address in ESI.
  94
  95 ## Definitions
  96
  97 What does the codeword of a Forth word contain? It needs to save the old value
  98 of ESI (so that we can resume execution of whatever outer definition we are
  99 executing at the time) and set the new version of ESI to point to the first word
 100 in the inner definition.
 101
 102 The stack where the values of ESI are stored is called the "return stack". We
 103 will use EBP for the return stack.
 104
 105 As mentioned, whenever we finish executing a Forth word, we will need to
 106 continue execution in the manner described in the previous section. When the
 107 word being executed is itself written in Forth, we need to pop the old value of
 108 ESI that we saved at the beginning of the definition before doing this.
 109
 110 Thus, the actual data for a word in a dictionary will look something like this:
 111
 112       pointer to previous word
 113        ^
 114        |
 115     +--|------+---+---+---+---+---+---+---+---+------------+------------+------------+------------+
 116     | LINK    | 6 | D | O | U | B | L | E | 0 | DOCOL      | DUP        | +          | EXIT       |
 117     +---------+---+---+---+---+---+---+---+---+------------+--|---------+------------+------------+
 118        ^       len                         pad  codeword      |
 119        |                                                      V
 120       LINK in next word                            points to codeword of DUP
 121
 122 Here, DOCOL (the codeword) is address of the simple interpreter described above,
 123 while EXIT a word (implemented in assembly) that takes care of popping ESI and
 124 continuing execution. Note that DOCOL, DUP, + and EXIT are all stored as
 125 addresses which point to codewords.
 126
 127 ## Literals
 128
 129 Literals are handled in a special way. There is a word in Forth, called LIT,
 130 implemented in assembly. When executed, this word looks at the next Forth
 131 instruction (i.e. the value of ESI), and places that on the stack as a literal,
 132 and then manipulates ESI to skip over the literal value.
 133
 134 ## Built-in variables
 135
 136 * **STATE** -- Is the interpreter executing code (0) or compiling a word (non-zero)?
 137 * **LATEST** -- Points to the latest (most recently defined) word in the dictionary.
 138 * **HERE** -- Points to the next free byte of memory.  When compiling, compiled words go here.
 139 * **S0** -- Stores the address of the top of the parameter stack.
 140 * **BASE** -- The current base for printing and reading numbers.
 141
 142 ## Input and lookup
 143
 144 `WORD` reads a word from standard input and pushes a string (in the form of an
 145 address followed by the length of the string) to the stack. (It uses an internal
 146 buffer that is overwritten each time it is called.)
 147
 148 `FIND` takes a word as parsed by `WORD` and looks it up in the dictionary. It
 149 returns the address of the dictionary header of that word if it is found.
 150 Otherwise, it returns 0.
 151
 152 `>CFA` turns a dictionary pointer into a codeword pointer. This is used when
 153 compiling.
 154
 155 ## Compilation
 156
 157 The Forth word INTERPRET runs in a loop, reading in words (with WORD), looking
 158 them up (with FIND), turning them into codeword pointers (with >CFA) and then
 159 deciding what to do with them.
 160
 161 In immediate mode (when STATE is zero), the word is simply executed immediately.
 162
 163 In compilation mode, INTERPRET appends the codeword pointer to user memory
 164 (which is at HERE). However, if a word has the immediate flag set, then it is
 165 run immediately, even in compile mode.
 166
 167 ### Definition of `:` and `;`
 168
 169 The word `:` starts by reading in the new word. Then it creates a new entry for
 170 that word in the dictoinary, updating the contents of `LATEST`, to which it
 171 appends the word `DOCOL`. Then, it switches to compile mode.
 172
 173 The word `;` simply appends `EXIT` to the currently compiling definition and
 174 then switches back to immediate mode.
 175
 176 These words rely on `,` to append words to the currently compiling definition.
 177 This word simply appends some literal value to `HERE` and moves the `HERE`
 178 pointer forward.
 179
 180 # Notes on UEFI
 181
 182 `JONASFORTH` is runs without an operating system, instead using the facilities
 183 provided by UEFI by running as a UEFI application. (Or rather, in the future it
 184 hopefully will. Right now, it uses Linux.) This section contains some notes
 185 about how this functionality is implemented.
 186
 187 ## Packaging and testing the image
 188
 189 UEFI expects a UEFI application to be stored in a FAT32 file system on a
 190 GPT-partitioned disk.
 191
 192 Luckily, QEMU has a convenient way of making a subdirectory availabe as a
 193 FAT-formatted disk (see [the relevant section in the QEMU User
 194 Documentation](https://qemu.weilnetz.de/doc/qemu-doc.html#disk_005fimages_005ffat_005fimages)
 195 for more information):
 196
 197     $ qemu-sytem-x86_64 ... -hda fat:/some/directory
 198
 199 We use this to easily test the image in QEMU; see the Makefile for more information.
 200
 201 * [ ] How to build the image for real hardware (what should the image look like,
 202   which programs, commands, etc.)
 203
 204 ## Interfacing with UEFI
 205
 206 From [OSDev Wiki](https://wiki.osdev.org/UEFI#How_to_use_UEFI):
 207
 208 >Traditional operating systems like Windows and Linux have an existing software
 209 >architecture and a large code base to perform system configuration and device
 210 >discovery. With their sophisticated layers of abstraction they don't directly
 211 >benefit from UEFI. As a result, their UEFI bootloaders do little but prepare
 212 >the environment for them to run.
 213 >
 214 >An independent developer may find more value in using UEFI to write
 215 >feature-full UEFI applications, rather than viewing UEFI as a temporary
 216 >start-up environment to be jettisoned during the boot process. Unlike legacy
 217 >bootloaders, which typically interact with BIOS only enough to bring up the OS,
 218 >a UEFI application can implement sophisticated behavior with the help of UEFI.
 219 >In other words, an independent developer shouldn't be in a rush to leave
 220 >"UEFI-land".
 221
 222 For `JONASFORTH`, I have decided to run as a UEFI application, taking advantage
 223 of UEFI's features, including its text I/O features and general graphical device
 224 drivers. Eventually, we would like to add some basic graphical drawing
 225 capabilities to `JONASFORTH`, and it's my impression that this would be possible
 226 using what is provided to us by UEFI.
 227
 228 A UEFI images is basically a windows EXE without symbol tables. There are three
 229 types of UEFI images; we use the EFI application, which has subsystem `10`. It
 230 is an x68-64 image, which has value `0x8664`.
 231
 232 UEFI applications use [Microsoft's 64-bit calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention) for x68-64 functions. See the linked article for a full description. Here is the short version:
 233
 234 * Integer or pointer arguments are given in RCX, RDX, R8 and R9.
 235 * Additional arguments are pushed onto the stack from right to left.
 236 * Integer or pointer values are returned in RAX.
 237 * An integer-sized struct is passed directly; non-integer-sized structs are passed as pointers.
 238 * The caller must allocate 32 bytes of "shadow space" on the stack immediately
 239   before calling the function, regardless of the number of parameters used, and
 240   the caller is responsible for popping the stack afterwards.
 241 * The following registers are volatile (caller-saved): RAX, RCX, RDX, R8, R9, R10, R11
 242 * The following registers are nonvolatile (callee-saved): RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15
 243
 244 When the application is loaded, RCX contains a firmware allocated `EFI_HANDLE`
 245 for the UEFI image, RDX contains a `EFI_SYSTEM_TABLE*` pointer to the EFI system
 246 table and RSP contains the return address. For more infromation about how a UEFI
 247 application is entered, see "4 - EFI System Table" in [the latest UEFI
 248 specification as of March 2020 (PDF)](https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_A_Feb14.pdf).
 249
 250 **Sources:**
 251
 252 * [UEFI applications in detail - OSDev Wiki](https://wiki.osdev.org/UEFI#UEFI_applications_in_detail)
 253 * [Microsoft x64 calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention)
 254 * [UEFI Specifications](https://uefi.org/specifications)
 255
 256 ### UEFI with FASM
 257
 258 We might want to consider using something like this: https://wiki.osdev.org/Uefi.inc)
 259
 260 FASM can generate UEFI application binaries by default. Use the following
 261 template to output a 64-bit UEFI application:
 262
 263     format pe64 dll efi
 264     entry main
 265
 266     section '.text' code executable readable
 267
 268     main:
 269        ;; ...
 270        ret
 271
 272     section '.data' data readable writable
 273
 274     ;; ...
 275
 276 Use `objdump -x` to inspect the assembled application binary.
 277
 278 ### UEFI documentation
 279
 280 * [Latest specification as of March 2020 (PDF)](https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_A_Feb14.pdf)
 281
 282 Notable sections:
 283
 284 * 2\. Overview (14)
 285 * 4\. EFI System Table (89)
 286 * 7\. Services - Boot Services (140)
 287 * 8\. Services - Runtime Services (228)
 288 * 12\. Protocols - Console Support (429)
 289 * 13\. Protocols - Media Access (493)
 290 * Appendix B - Console (2201)
 291 * Appendix D - Status Codes (2211)
 292
 293 ## Resources
 294
 295 * [UEFI - OSDev Wiki](https://wiki.osdev.org/UEFI)
 296 * [Unified Extensible Firmware Interface (Wikipedia)](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface)
 297 * [UEFI Specifications](https://uefi.org/specifications)