X-Git-Url: https://git.rrq.au/?a=blobdiff_plain;f=notes;h=69065509654742c56bc3c3f2a54ea5d7391169cd;hb=1da54a4e8c55400181194a2db2cce4b96e479d61;hp=c23bee740dcee51e2d91f95f1c6691baa0c5d68d;hpb=004f684b103b49f38d28d4282624c09975688b99;p=rrq%2Fjonasforth.git diff --git a/notes b/notes index c23bee7..6906550 100644 --- a/notes +++ b/notes @@ -8,6 +8,9 @@ JONESFORTH: # Notes on implementation +This is my summary of the most important parts of +https://raw.githubusercontent.com/nornagon/jonesforth/master/jonesforth.S. + ## Dictionary In Forth, words are stored in a dictionary. The dictionary is a linked list whose entries look like this: @@ -33,3 +36,64 @@ For example, DOUBLE and QUADRUPLE may be stored like this: LATEST The Forth variable LATEST contains a pointer to the most recently defined word. +## Threaded code + +In a typical Forth interpreter, code is stored in a peculiar way. (This way of +storing code is primarily motivated by space contraints on early systems.) + +The definition of a word is stored as a sequence of memory adresses of each of +the words making up that definition. (At the end of a compiled definition, there +is also some extra code that causes execution to continue in the correct way.) + +We use a register (ESI) to store a reference to the next index of the +word (inside a definition) that we are executing. Then, in order to execute a +word, we just jump to whatever address is pointed to by ESI. The code for +updating ESI and continuing execution is stored at the end of each subroutine. + +Of course, this approach only works if each of the words that we are executing +is defined in assembly, but we also want to be able to execute Forth words! + +We get around this problem by adding a "codeword" to the beginning of any +compiled subroutine. This codeword is a pointer to the intrepreter to run the +given function. In order to run such functions, we actually need two jumps when +executing: In order to execute a word, we jump to the address at the location +pointed to by the address in ESI. + +## Definitions + +What does the codeword of a Forth word contain? It needs to save the old value +of ESI (so that we can resume execution of whatever outer definition we are +executing at the time) and set the new version of ESI to point to the first word +in the inner definition. + +The stack where the values of ESI are stored is called the "return stack". We +will use EBP for the return stack. + +As mentioned, whenever we finish executing a Forth word, we will need to +continue execution in the manner described in the previous section. When the +word being executed is itself written in Forth, we need to pop the old value of +ESI that we saved at the beginning of the definition before doing this. + +Thus, the actual data for a word in a dictionary will look something like this: + + pointer to previous word + ^ + | + +--|------+---+---+---+---+---+---+---+---+------------+------------+------------+------------+ + | LINK | 6 | D | O | U | B | L | E | 0 | DOCOL | DUP | + | EXIT | + +---------+---+---+---+---+---+---+---+---+------------+--|---------+------------+------------+ + ^ len pad codeword | + | V + LINK in next word points to codeword of DUP + +Here, DOCOL (the codeword) is address of the simple interpreter described above, +while EXIT a word (implemented in assembly) that takes care of popping ESI and +continuing execution. Note that DOCOL, DUP, + and EXIT are all stored as +addresses which point to codewords. + +## Literals + +Literals are handled in a special way. There is a word in Forth, called LIT, +implemented in assembly. When executed, this word looks at the next Forth +instruction (i.e. the value of ESI), and places that on the stack as a literal, +and then manipulates ESI to skip over the literal value.