1 /* A sometimes minimal FORTH compiler and tutorial for Linux / i386 systems. -*- asm -*-
2 By Richard W.M. Jones <rich@annexia.org> http://annexia.org/forth
4 gcc -m32 -nostdlib -static -Wl,-Ttext,0 -o jonesforth jonesforth.S
6 INTRODUCTION ----------------------------------------------------------------------
8 FORTH is one of those alien languages which most working programmers regard in the same
9 way as Haskell, LISP, and so on. Something so strange that they'd rather any thoughts
10 of it just go away so they can get on with writing this paying code. But that's wrong
11 and if you care at all about programming then you should at least understand all these
12 languages, even if you will never use them.
14 LISP is the ultimate high-level language, and features from LISP are being added every
15 decade to the more common languages. But FORTH is in some ways the ultimate in low level
16 programming. Out of the box it lacks features like dynamic memory management and even
17 strings. In fact, at its primitive level it lacks even basic concepts like IF-statements
20 Why then would you want to learn FORTH? There are several very good reasons. First
21 and foremost, FORTH is minimal. You really can write a complete FORTH in, say, 2000
22 lines of code. I don't just mean a FORTH program, I mean a complete FORTH operating
23 system, environment and language. You could boot such a FORTH on a bare PC and it would
24 come up with a prompt where you could start doing useful work. The FORTH you have here
25 isn't minimal and uses a Linux process as its 'base PC' (both for the purposes of making
26 it a good tutorial). It's possible to completely understand the system. Who can say they
27 completely understand how Linux works, or gcc?
29 Secondly FORTH has a peculiar bootstrapping property. By that I mean that after writing
30 a little bit of assembly to talk to the hardware and implement a few primitives, all the
31 rest of the language and compiler is written in FORTH itself. Remember I said before
32 that FORTH lacked IF-statements and loops? Well of course it doesn't really because
33 such a lanuage would be useless, but my point was rather that IF-statements and loops are
34 written in FORTH itself.
36 Now of course this is common in other languages as well, and in those languages we call
37 them 'libraries'. For example in C, 'printf' is a library function written in C. But
38 in FORTH this goes way beyond mere libraries. Can you imagine writing C's 'if' in C?
39 And that brings me to my third reason: If you can write 'if' in FORTH, then why restrict
40 yourself to the usual if/while/for/switch constructs? You want a construct that iterates
41 over every other element in a list of numbers? You can add it to the language. What
42 about an operator which pulls in variables directly from a configuration file and makes
43 them available as FORTH variables? Or how about adding Makefile-like dependencies to
44 the language? No problem in FORTH. This concept isn't common in programming languages,
45 but it has a name (in fact two names): "macros" (by which I mean LISP-style macros, not
46 the lame C preprocessor) and "domain specific languages" (DSLs).
48 This tutorial isn't about learning FORTH as the language. I'll point you to some references
49 you should read if you're not familiar with using FORTH. This tutorial is about how to
50 write FORTH. In fact, until you understand how FORTH is written, you'll have only a very
51 superficial understanding of how to use it.
53 So if you're not familiar with FORTH or want to refresh your memory here are some online
56 http://en.wikipedia.org/wiki/Forth_%28programming_language%29
58 http://galileo.phys.virginia.edu/classes/551.jvn.fall01/primer.htm
60 http://wiki.laptop.org/go/Forth_Lessons
62 Here is another "Why FORTH?" essay: http://www.jwdt.com/~paysan/why-forth.html
64 SETTING UP ----------------------------------------------------------------------
66 Let's get a few housekeeping things out of the way. Firstly because I need to draw lots of
67 ASCII-art diagrams to explain concepts, the best way to look at this is using a window which
68 uses a fixed width font and is at least this wide:
70 <------------------------------------------------------------------------------------------------------------------------>
72 ASSEMBLING ----------------------------------------------------------------------
74 If you want to actually run this FORTH, rather than just read it, you will need Linux on an
75 i386. Linux because instead of programming directly to the hardware on a bare PC which I
76 could have done, I went for a simpler tutorial by assuming that the 'hardware' is a Linux
77 process with a few basic system calls (read, write and exit and that's about all). i386
78 is needed because I had to write the assembly for a processor, and i386 is by far the most
79 common. (Of course when I say 'i386', any 32- or 64-bit x86 processor will do. I'm compiling
80 this on a 64 bit AMD Opteron).
82 Again, to assemble this you will need gcc and gas (the GNU assembler). The commands to
83 assemble and run the code (save this file as 'jonesforth.S') are:
85 gcc -m32 -nostdlib -static -Wl,-Ttext,0 -o jonesforth jonesforth.S
88 You will see lots of 'Warning: unterminated string; newline inserted' messages from the
89 assembler. That's just because the GNU assembler doesn't have a good syntax for multi-line
90 strings (or rather it used to, but the developers removed it!) so I've abused the syntax
91 slightly to make things readable. Ignore these warnings.
93 ASSEMBLER ----------------------------------------------------------------------
95 (You can just skip to the next section -- you don't need to be able to read assembler to
96 follow this tutorial).
98 However if you do want to read the assembly code here are a few notes about gas (the GNU assembler):
100 (1) Register names are prefixed with '%', so %eax is the 32 bit i386 accumulator. The registers
101 available on i386 are: %eax, %ebx, %ecx, %edx, %esi, %edi, %ebp and %esp, and most of them
102 have special purposes.
104 (2) Add, mov, etc. take arguments in the form SRC,DEST. So mov %eax,%ecx moves %eax -> %ecx
106 (3) Constants are prefixed with '$', and you mustn't forget it! If you forget it then it
107 causes a read from memory instead, so:
108 mov $2,%eax moves number 2 into %eax
109 mov 2,%eax reads the 32 bit word from address 2 into %eax (ie. most likely a mistake)
111 (4) gas has a funky syntax for local labels, where '1f' (etc.) means label '1:' "forwards"
112 and '1b' (etc.) means label '1:' "backwards".
114 (5) 'ja' is "jump if above", 'jb' for "jump if below", 'je' "jump if equal" etc.
116 (6) gas has a reasonably nice .macro syntax, and I use them a lot to make the code shorter and
119 For more help reading the assembler, do "info gas" at the Linux prompt.
121 Now the tutorial starts in earnest.
123 THE DICTIONARY ----------------------------------------------------------------------
125 In FORTH as you will know, functions are called "words", as just as in other languages they
126 have a name and a definition. Here are two FORTH words:
128 : DOUBLE 2 * ; \ name is "DOUBLE", definition is "2 *"
129 : QUADRUPLE DOUBLE DOUBLE ; \ name is "QUADRUPLE", definition is "DOUBLE DOUBLE"
131 Words, both built-in ones and ones which the programmer defines later, are stored in a dictionary
132 which is just a linked list of dictionary entries.
134 <--- DICTIONARY ENTRY (HEADER) ----------------------->
135 +------------------------+--------+---------- - - - - +----------- - - - -
136 | LINK POINTER | LENGTH/| NAME | DEFINITION
138 +--- (4 bytes) ----------+- byte -+- n bytes - - - - +----------- - - - -
140 I'll come to the definition of the word later. For now just look at the header. The first
141 4 bytes are the link pointer. This points back to the previous word in the dictionary, or, for
142 the first word in the dictionary it is just a NULL pointer. Then comes a length/flags byte.
143 The length of the word can be up to 31 characters (5 bits used) and the top three bits are used
144 for various flags which I'll come to later. This is followed by the name itself, and in this
145 implementation the name is rounded up to a multiple of 4 bytes by padding it with zero bytes.
146 That's just to ensure that the definition starts on a 32 bit boundary.
148 A FORTH variable called LATEST contains a pointer to the most recently defined word, in
149 other words, the head of this linked list.
151 DOUBLE and QUADRUPLE might look like this:
153 pointer to previous word
156 +--|------+---+---+---+---+---+---+---+---+------------- - - - -
157 | LINK | 6 | D | O | U | B | L | E | 0 | (definition ...)
158 +---------+---+---+---+---+---+---+---+---+------------- - - - -
161 +--|------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - -
162 | LINK | 9 | Q | U | A | D | R | U | P | L | E | 0 | 0 | (definition ...)
163 +---------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - -
169 You shoud be able to see from this how you might implement functions to find a word in
170 the dictionary (just walk along the dictionary entries starting at LATEST and matching
171 the names until you either find a match or hit the NULL pointer at the end of the dictionary),
172 and add a word to the dictionary (create a new definition, set its LINK to LATEST, and set
173 LATEST to point to the new word). We'll see precisely these functions implemented in
174 assembly code later on.
176 INDIRECT THREADED CODE ----------------------------------------------------------------------
178 Now we'll get to the really crucial bit in understanding FORTH, so go and get a cup of tea
179 or coffee and settle down. It's fair to say that if you don't understand this section, then you
180 won't "get" how FORTH works, and that would be a failure on my part for not explaining it well.
181 So if after reading this section a few times you don't understand it, please email me
201 /* Macros to deal with the return stack. */
203 lea -4(%ebp),%ebp // push reg on to return stack
208 mov (%ebp),\reg // pop top of return stack to reg
212 /* ELF entry point. */
217 mov %esp,var_S0 // Store the initial data stack pointer.
218 mov $return_stack,%ebp // Initialise the return stack.
220 mov $cold_start,%esi // Initialise interpreter.
221 NEXT // Run interpreter!
224 cold_start: // High-level code without a codeword.
227 /* DOCOL - the interpreter! */
231 PUSHRSP %esi // push %esi on to the return stack
232 addl $4,%eax // %eax points to codeword, so make
233 movl %eax,%esi // %esi point to first data word
236 /*----------------------------------------------------------------------
237 * Fixed sized buffers for everything.
241 /* FORTH return stack. */
242 #define RETURN_STACK_SIZE 8192
244 .space RETURN_STACK_SIZE
247 /* Space for user-defined words. */
248 #define USER_DEFS_SIZE 16384
251 .space USER_DEFS_SIZE
258 /*----------------------------------------------------------------------
259 * Built-in words defined the long way.
262 #define F_HIDDEN 0x20
264 // Store the chain of links.
267 .macro defcode name, namelen, flags=0, label
273 .set link,name_\label
274 .byte \flags+\namelen // flags + length byte
275 .ascii "\name" // the name
279 .int code_\label // codeword
283 code_\label : // assembler code follows
286 .macro defword name, namelen, flags=0, label
292 .set link,name_\label
293 .byte \flags+\namelen // flags + length byte
294 .ascii "\name" // the name
298 .int DOCOL // codeword - the interpreter
299 // list of word pointers follow
302 .macro defvar name, namelen, flags=0, label, initial=0
303 defcode \name,\namelen,\flags,\label
312 // Some easy ones, written in assembly for speed
313 defcode "DROP",4,,DROP
314 pop %eax // drop top of stack
318 pop %eax // duplicate top of stack
323 defcode "SWAP",4,,SWAP
324 pop %eax // swap top of stack
330 defcode "OVER",4,,OVER
331 mov 4(%esp),%eax // get the second element of stack
332 push %eax // and push it on top
344 defcode "-ROT",4,,NROT
354 incl (%esp) // increment top of stack
358 decl (%esp) // decrement top of stack
361 defcode "4+",2,,INCR4
362 addl $4,(%esp) // increment top of stack
365 defcode "4-",2,,DECR4
366 subl $4,(%esp) // decrement top of stack
383 push %eax // ignore overflow
391 push %eax // push quotient
399 push %edx // push remainder
402 defcode "=",1,,EQU // top two words are equal?
412 defcode "<>",2,,NEQU // top two words are not equal?
422 defcode "0=",2,,ZEQU // top of stack equals 0?
441 defcode "INVERT",6,,INVERT
445 // COLD must not return (ie. must not call EXIT).
446 defword "COLD",4,,COLD
447 // XXX reinitialisation of the interpreter
448 .int INTERPRETER // call the interpreter loop (never returns)
449 .int LIT,1,SYSEXIT // hmmm, but in case it does, exit(1).
451 defcode "EXIT",4,,EXIT
452 POPRSP %esi // pop return stack into %esi
456 // %esi points to the next command, but in this case it points to the next
457 // literal 32 bit integer. Get that literal into %eax and increment %esi.
458 // On x86, it's a convenient single byte instruction! (cf. NEXT macro)
460 push %eax // push the literal number on to stack
463 defcode "LITSTRING",9,,LITSTRING
464 lodsl // get the length of the string
465 push %eax // push it on the stack
466 push %esi // push the address of the start of the string
467 addl %eax,%esi // skip past the string
468 addl $3,%esi // but round up to next 4 byte boundary
472 defcode "BRANCH",6,,BRANCH
473 add (%esi),%esi // add the offset to the instruction pointer
476 defcode "0BRANCH",7,,ZBRANCH
478 test %eax,%eax // top of stack is zero?
479 jz code_BRANCH // if so, jump back to the branch function above
480 lodsl // otherwise we need to skip the offset
484 pop %ebx // address to store at
485 pop %eax // data to store there
486 mov %eax,(%ebx) // store it
490 pop %ebx // address to fetch
491 mov (%ebx),%eax // fetch it
492 push %eax // push value onto stack
495 defcode "+!",2,,ADDSTORE
497 pop %eax // the amount to add
498 addl %eax,(%ebx) // add it
501 defcode "-!",2,,SUBSTORE
503 pop %eax // the amount to subtract
504 subl %eax,(%ebx) // add it
507 /* ! and @ (STORE and FETCH) store 32-bit words. It's also useful to be able to read and write bytes.
508 * I don't know whether FORTH has these words, so I invented my own, called !b and @b.
509 * Byte-oriented operations only work on architectures which permit them (i386 is one of those).
510 * UPDATE: writing a byte to the dictionary pointer is called C, in FORTH.
512 defcode "!b",2,,STOREBYTE
513 pop %ebx // address to store at
514 pop %eax // data to store there
515 movb %al,(%ebx) // store it
518 defcode "@b",2,,FETCHBYTE
519 pop %ebx // address to fetch
521 movb (%ebx),%al // fetch it
522 push %eax // push value onto stack
525 // The STATE variable is 0 for execute mode, != 0 for compile mode
526 defvar "STATE",5,,STATE
528 // This points to where compiled words go.
529 defvar "HERE",4,,HERE,user_defs_start
531 // This is the last definition in the dictionary.
532 defvar "LATEST",6,,LATEST,name_SYSEXIT // SYSEXIT must be last in built-in dictionary
534 // _X, _Y and _Z are scratch variables used by standard words.
539 // This stores the top of the data stack.
542 // This stores the top of the return stack.
543 defvar "R0",2,,RZ,return_stack
545 defcode "DSP@",4,,DSPFETCH
550 defcode "DSP!",4,,DSPSTORE
555 pop %eax // pop parameter stack into %eax
556 PUSHRSP %eax // push it on to the return stack
559 defcode "R>",2,,FROMR
560 POPRSP %eax // pop return stack on to %eax
561 push %eax // and push on to parameter stack
564 defcode "RSP@",4,,RSPFETCH
568 defcode "RSP!",4,,RSPSTORE
572 defcode "RDROP",5,,RDROP
573 lea 4(%ebp),%ebp // pop return stack and throw away
576 #include <asm-i386/unistd.h>
580 push %eax // push return value on stack
592 1: // out of input; use read(2) to fetch more input from stdin
593 xor %ebx,%ebx // 1st param: stdin
594 mov $buffer,%ecx // 2nd param: buffer
596 mov $buffend-buffer,%edx // 3rd param: max length
597 mov $__NR_read,%eax // syscall: read
599 test %eax,%eax // If %eax <= 0, then exit.
601 addl %eax,%ecx // buffer+%eax = bufftop
605 2: // error or out of input: exit
607 mov $__NR_exit,%eax // syscall: exit
610 defcode "EMIT",4,,EMIT
615 mov $1,%ebx // 1st param: stdout
617 // write needs the address of the byte to write
619 mov $2f,%ecx // 2nd param: address
621 mov $1,%edx // 3rd param: nbytes = 1
623 mov $__NR_write,%eax // write syscall
628 2: .space 1 // scratch used by EMIT
630 defcode "WORD",4,,WORD
632 push %ecx // push length
633 push %edi // push base address
637 /* Search for first non-blank character. Also skip \ comments. */
639 call _KEY // get next key, returned in %eax
640 cmpb $'\\',%al // start of a comment?
641 je 3f // if so, skip the comment
643 jbe 1b // if so, keep looking
645 /* Search for the end of the word, storing chars as we go. */
646 mov $5f,%edi // pointer to return buffer
648 stosb // add character to return buffer
649 call _KEY // get next key, returned in %al
650 cmpb $' ',%al // is blank?
651 ja 2b // if not, keep looping
653 /* Return the word (well, the static buffer) and length. */
655 mov %edi,%ecx // return length of the word
656 mov $5f,%edi // return address of the word
659 /* Code to skip \ comments to end of the current line. */
662 cmpb $'\n',%al // end of line yet?
667 // A static buffer where WORD returns. Subsequent calls
668 // overwrite this buffer. Maximum word length is 32 chars.
671 defcode "EMITSTRING",10,,EMITSTRING
672 mov $1,%ebx // 1st param: stdout
673 pop %ecx // 2nd param: address of string
674 pop %edx // 3rd param: length of string
676 mov $__NR_write,%eax // write syscall
682 pop %eax // Get the number to print into %eax
683 call _DOT // Easier to do this recursively ...
686 mov $10,%ecx // Base 10
690 xor %edx,%edx // %edx:%eax / %ecx -> quotient %eax, remainder %edx
704 // Parse a number from a string on the stack -- almost the opposite of . (DOT)
705 // Note that there is absolutely no error checking. In particular the length of the
706 // string must be >= 1 bytes.
707 defcode "SNUMBER",7,,SNUMBER
717 imull $10,%eax // %eax *= 10
720 subb $'0',%bl // ASCII -> digit
726 defcode "FIND",4,,FIND
727 pop %edi // %edi = address
728 pop %ecx // %ecx = length
734 push %esi // Save %esi so we can use it in string comparison.
736 // Now we start searching backwards through the dictionary for this word.
737 mov var_LATEST,%edx // LATEST points to name header of the latest word in the dictionary
739 test %edx,%edx // NULL pointer? (end of the linked list)
742 // Compare the length expected and the length of the word.
743 // Note that if the F_HIDDEN flag is set on the word, then by a bit of trickery
744 // this won't pick the word (the length will appear to be wrong).
746 movb 4(%edx),%al // %al = flags+length field
747 andb $(F_HIDDEN|0x1f),%al // %al = name length
748 cmpb %cl,%al // Length is the same?
751 // Compare the strings in detail.
752 push %ecx // Save the length
753 push %edi // Save the address (repe cmpsb will move this pointer)
754 lea 5(%edx),%esi // Dictionary string we are checking against.
755 repe cmpsb // Compare the strings.
758 jne 2f // Not the same.
760 // The strings are the same - return the header pointer in %eax
766 mov (%edx),%edx // Move back through the link field to the previous word
767 jmp 1b // .. and loop.
771 xor %eax,%eax // Return zero to indicate not found.
774 defcode ">CFA",4,,TCFA // DEA -> Codeword address
781 add $4,%edi // Skip link pointer.
782 movb (%edi),%al // Load flags+len into %al.
783 inc %edi // Skip flags+len byte.
784 andb $0x1f,%al // Just the length, not the flags.
785 add %eax,%edi // Skip the name.
786 addl $3,%edi // The codeword is 4-byte aligned.
790 defcode "CHAR",4,,CHAR
791 call _WORD // Returns %ecx = length, %edi = pointer to word.
793 movb (%edi),%al // Get the first character of the word.
794 push %eax // Push it onto the stack.
799 // Get the word and create a dictionary entry header for it.
800 call _WORD // Returns %ecx = length, %edi = pointer to word.
801 mov %edi,%ebx // %ebx = address of the word
803 movl var_HERE,%edi // %edi is the address of the header
804 movl var_LATEST,%eax // Get link pointer
805 stosl // and store it in the header.
807 mov %cl,%al // Get the length.
808 orb $F_HIDDEN,%al // Set the HIDDEN flag on this entry.
809 stosb // Store the length/flags byte.
811 mov %ebx,%esi // %esi = word
812 rep movsb // Copy the word
814 addl $3,%edi // Align to next 4 byte boundary.
817 movl $DOCOL,%eax // The codeword for user-created words is always DOCOL (the interpreter)
820 // Header built, so now update LATEST and HERE.
821 // We'll be compiling words and putting them HERE.
826 // And go into compile mode by setting STATE to 1.
831 pop %eax // Code pointer to store.
835 movl var_HERE,%edi // HERE
837 movl %edi,var_HERE // Update HERE (incremented)
840 defcode "HIDDEN",6,,HIDDEN
844 movl var_LATEST,%edi // LATEST word.
845 addl $4,%edi // Point to name/flags byte.
846 xorb $F_HIDDEN,(%edi) // Toggle the HIDDEN bit.
849 defcode "IMMEDIATE",9,F_IMMED,IMMEDIATE
853 movl var_LATEST,%edi // LATEST word.
854 addl $4,%edi // Point to name/flags byte.
855 xorb $F_IMMED,(%edi) // Toggle the IMMED bit.
858 defcode ";",1,F_IMMED,SEMICOLON
859 movl $EXIT,%eax // EXIT is the final codeword in compiled words.
860 call _COMMA // Store it.
861 call _HIDDEN // Toggle the HIDDEN flag (unhides the new word).
862 xor %eax,%eax // Set STATE to 0 (back to execute mode).
866 /* This definiton of ' (TICK) is strictly cheating - it also only works in compiled code. */
868 lodsl // Get the address of the next word and skip it.
869 pushl %eax // Push it on the stack.
872 /* This interpreter is pretty simple, but remember that in FORTH you can always override
873 * it later with a more powerful one!
875 defword "INTERPRETER",11,,INTERPRETER
876 .int INTERPRET,RDROP,INTERPRETER
878 defcode "INTERPRET",9,,INTERPRET
879 call _WORD // Returns %ecx = length, %edi = pointer to word.
881 // Is it in the dictionary?
883 movl %eax,interpret_is_lit // Not a literal number (not yet anyway ...)
884 call _FIND // Returns %eax = pointer to header or 0 if not found.
885 test %eax,%eax // Found?
888 // In the dictionary. Is it an IMMEDIATE codeword?
889 mov %eax,%edi // %edi = dictionary entry
890 movb 4(%edi),%al // Get name+flags.
891 push %ax // Just save it for now.
892 call _TCFA // Convert dictionary entry (in %edi) to codeword pointer.
894 andb $F_IMMED,%al // Is IMMED flag set?
896 jnz 4f // If IMMED, jump straight to executing.
900 1: // Not in the dictionary (not a word) so assume it's a literal number.
901 incl interpret_is_lit
902 call _SNUMBER // Returns the parsed number in %eax
904 mov $LIT,%eax // The word is LIT
906 2: // Are we compiling or executing?
909 jz 4f // Jump if executing.
911 // Compiling - just append the word to the current dictionary definition.
913 mov interpret_is_lit,%ecx // Was it a literal?
916 mov %ebx,%eax // Yes, so LIT is followed by a number.
920 4: // Executing - run it!
921 mov interpret_is_lit,%ecx // Literal?
922 test %ecx,%ecx // Literal?
925 // Not a literal, execute it now. This never returns, but the codeword will
926 // eventually call NEXT which will reenter the loop in INTERPRETER.
929 5: // Executing a literal, which means push it on the stack.
936 .int 0 // Flag used to record if reading a literal
938 // NB: SYSEXIT must be the last entry in the built-in dictionary.
939 defcode SYSEXIT,7,,SYSEXIT
944 /*----------------------------------------------------------------------
945 * Input buffer & initial input.
950 // XXX gives 'Warning: unterminated string; newline inserted' messages which you can ignore
952 \\ Define some character constants
958 \\ CR prints a carriage return
961 \\ SPACE prints a space
962 : SPACE 'SPACE' EMIT ;
964 \\ Primitive . (DOT) function doesn't follow with a blank, so redefine it to behave like FORTH.
965 \\ Notice how we can trivially redefine existing functions.
968 \\ DUP, DROP are defined in assembly for speed, but this is how you might define them
969 \\ in FORTH. Notice use of the scratch variables _X and _Y.
970 \\ : DUP _X ! _X @ _X @ ;
973 \\ The 2... versions of the standard operators work on pairs of stack entries. They're not used
974 \\ very commonly so not really worth writing in assembler. Here is how they are defined in FORTH.
978 \\ More standard FORTH words.
982 \\ [ and ] allow you to break into immediate mode while compiling a word.
983 : [ IMMEDIATE \\ define [ as an immediate word
984 0 STATE ! \\ go into immediate mode
988 1 STATE ! \\ go back to compile mode
991 \\ LITERAL takes whatever is on the stack and compiles LIT <foo>
993 ' LIT , \\ compile LIT
994 , \\ compile the literal itself (from the stack)
997 \\ condition IF true-part THEN rest
999 \\ condition 0BRANCH OFFSET true-part rest
1000 \\ where OFFSET is the offset of 'rest'
1001 \\ condition IF true-part ELSE false-part THEN
1003 \\ condition 0BRANCH OFFSET true-part BRANCH OFFSET2 false-part rest
1004 \\ where OFFSET if the offset of false-part and OFFSET2 is the offset of rest
1006 \\ IF is an IMMEDIATE word which compiles 0BRANCH followed by a dummy offset, and places
1007 \\ the address of the 0BRANCH on the stack. Later when we see THEN, we pop that address
1008 \\ off the stack, calculate the offset, and back-fill the offset.
1010 ' 0BRANCH , \\ compile 0BRANCH
1011 HERE @ \\ save location of the offset on the stack
1012 0 , \\ compile a dummy offset
1017 HERE @ SWAP - \\ calculate the offset from the address saved on the stack
1018 SWAP ! \\ store the offset in the back-filled location
1022 ' BRANCH , \\ definite branch to just over the false-part
1023 HERE @ \\ save location of the offset on the stack
1024 0 , \\ compile a dummy offset
1025 SWAP \\ now back-fill the original (IF) offset
1026 DUP \\ same as for THEN word above
1031 \\ BEGIN loop-part condition UNTIL
1033 \\ loop-part condition 0BRANCH OFFSET
1034 \\ where OFFSET points back to the loop-part
1035 \\ This is like do { loop-part } while (condition) in the C language
1037 HERE @ \\ save location on the stack
1041 ' 0BRANCH , \\ compile 0BRANCH
1042 HERE @ - \\ calculate the offset from the address saved on the stack
1043 , \\ compile the offset here
1046 \\ BEGIN loop-part AGAIN
1048 \\ loop-part BRANCH OFFSET
1049 \\ where OFFSET points back to the loop-part
1050 \\ In other words, an infinite loop which can only be returned from with EXIT
1052 ' BRANCH , \\ compile BRANCH
1053 HERE @ - \\ calculate the offset back
1054 , \\ compile the offset here
1057 \\ BEGIN condition WHILE loop-part REPEAT
1059 \\ condition 0BRANCH OFFSET2 loop-part BRANCH OFFSET
1060 \\ where OFFSET points back to condition (the beginning) and OFFSET2 points to after the whole piece of code
1061 \\ So this is like a while (condition) { loop-part } loop in the C language
1063 ' 0BRANCH , \\ compile 0BRANCH
1064 HERE @ \\ save location of the offset2 on the stack
1065 0 , \\ compile a dummy offset2
1069 ' BRANCH , \\ compile BRANCH
1070 SWAP \\ get the original offset (from BEGIN)
1071 HERE @ - , \\ and compile it after BRANCH
1073 HERE @ SWAP - \\ calculate the offset2
1074 SWAP ! \\ and back-fill it in the original location
1077 \\ With the looping constructs, we can now write SPACES, which writes n spaces to stdout.
1080 SPACE \\ print a space
1081 1- \\ until we count down to 0
1086 \\ .S prints the contents of the stack. Very useful for debugging.
1088 DSP@ \\ get current stack pointer
1090 DUP @ . \\ print the stack element
1092 DUP S0 @ 4- = \\ stop when we get to the top
1097 \\ DEPTH returns the depth of the stack.
1098 : DEPTH S0 @ DSP@ - ;
1100 \\ .\" is the print string operator in FORTH. Example: .\" Something to print\"
1101 \\ The space after the operator is the ordinary space required between words.
1102 \\ This is tricky to define because it has to do different things depending on whether
1103 \\ we are compiling or in immediate mode. (Thus the word is marked IMMEDIATE so it can
1104 \\ detect this and do different things).
1105 \\ In immediate mode we just keep reading characters and printing them until we get to
1106 \\ the next double quote.
1107 \\ In compile mode we have the problem of where we're going to store the string (remember
1108 \\ that the input buffer where the string comes from may be overwritten by the time we
1109 \\ come round to running the function). We store the string in the compiled function
1111 \\ LITSTRING, string length, string rounded up to 4 bytes, EMITSTRING, ...
1113 STATE @ \\ compiling?
1115 ' LITSTRING , \\ compile LITSTRING
1116 HERE @ \\ save the address of the length word on the stack
1117 0 , \\ dummy length - we don't know what it is yet
1119 KEY \\ get next character of the string
1122 HERE @ !b \\ store the character in the compiled image
1123 1 HERE +! \\ increment HERE pointer by 1 byte
1125 DROP \\ drop the double quote character at the end
1126 DUP \\ get the saved address of the length word
1127 HERE @ SWAP - \\ calculate the length
1128 4- \\ subtract 4 (because we measured from the start of the length word)
1129 SWAP ! \\ and back-fill the length location
1130 HERE @ \\ round up to next multiple of 4 bytes for the remaining code
1134 ' EMITSTRING , \\ compile the final EMITSTRING
1136 \\ In immediate mode, just read characters and print them until we get
1137 \\ to the ending double quote. Much simpler than the above code!
1140 DUP '\"' = IF EXIT THEN
1146 \\ While compiling, [COMPILE] WORD compiles WORD if it would otherwise be IMMEDIATE.
1147 : [COMPILE] IMMEDIATE
1148 WORD \\ get the next word
1149 FIND \\ find it in the dictionary
1150 >CFA \\ get its codeword
1151 , \\ and compile that
1154 \\ RECURSE makes a recursive call to the current word that is being compiled.
1155 \\ Normally while a word is being compiled, it is marked HIDDEN so that references to the
1156 \\ same word within are calls to the previous definition of the word.
1158 LATEST @ >CFA \\ LATEST points to the word being compiled at the moment
1162 \\ ALLOT is used to allocate (static) memory when compiling. It increases HERE by
1163 \\ the amount given on the stack.
1167 \\ Finally print the welcome prompt.