X-Git-Url: https://git.rrq.au/?a=blobdiff_plain;ds=sidebyside;f=jonesforth.f;h=025d9b0ca27b43cd2d94c4bc77db71befcfcc938;hb=83c6612bf07228f70d25136c850f660721379632;hp=b05b64cba2d0b2012b8f07aca2f5011285e8e8ed;hpb=a2e824fd9d9b81d143db5e670bf8fcad4a78bb4d;p=rrq%2Fjonesforth.git diff --git a/jonesforth.f b/jonesforth.f index b05b64c..025d9b0 100644 --- a/jonesforth.f +++ b/jonesforth.f @@ -2,7 +2,7 @@ \ A sometimes minimal FORTH compiler and tutorial for Linux / i386 systems. -*- asm -*- \ By Richard W.M. Jones http://annexia.org/forth \ This is PUBLIC DOMAIN (see public domain release statement below). -\ $Id: jonesforth.f,v 1.5 2007-09-26 22:20:52 rich Exp $ +\ $Id: jonesforth.f,v 1.13 2007-10-07 11:07:15 rich Exp $ \ \ The first part of this tutorial is in jonesforth.S. Get if from http://annexia.org/forth \ @@ -51,19 +51,14 @@ : MOD /MOD DROP ; \ Define some character constants -: '\n' 10 ; -: 'SPACE' 32 ; +: '\n' 10 ; +: BL 32 ; \ BL (BLank) is a standard FORTH word for space. \ CR prints a carriage return : CR '\n' EMIT ; \ SPACE prints a space -: SPACE 'SPACE' EMIT ; - -\ DUP, DROP are defined in assembly for speed, but this is how you might define them -\ in FORTH. Notice use of the scratch variables _X and _Y. -\ : DUP _X ! _X @ _X @ ; -\ : DROP _X ! ; +: SPACE BL EMIT ; \ The 2... versions of the standard operators work on pairs of stack entries. They're not used \ very commonly so not really worth writing in assembler. Here is how they are defined in FORTH. @@ -100,6 +95,7 @@ ; \ A few more character constants defined the same way as above. +: ';' [ CHAR ; ] LITERAL ; : '(' [ CHAR ( ] LITERAL ; : ')' [ CHAR ) ] LITERAL ; : '"' [ CHAR " ] LITERAL ; @@ -128,6 +124,8 @@ , \ compile it ; +\ CONTROL STRUCTURES ---------------------------------------------------------------------- +\ \ So far we have defined only very simple definitions. Before we can go further, we really need to \ make some control structures, like IF ... THEN and loops. Luckily we can define arbitrary control \ structures directly in FORTH. @@ -211,6 +209,21 @@ SWAP ! \ and back-fill it in the original location ; +\ UNLESS is the same as IF but the test is reversed. +\ +\ Note the use of [COMPILE]: Since IF is IMMEDIATE we don't want it to be executed while UNLESS +\ is compiling, but while UNLESS is running (which happens to be when whatever word using UNLESS is +\ being compiled -- whew!). So we use [COMPILE] to reverse the effect of marking IF as immediate. +\ This trick is generally used when we want to write our own control words without having to +\ implement them all in terms of the primitives 0BRANCH and BRANCH, but instead reusing simpler +\ control words like (in this instance) IF. +: UNLESS IMMEDIATE + ' NOT , \ compile NOT (to reverse the test) + [COMPILE] IF \ continue by calling the normal IF +; + +\ COMMENTS ---------------------------------------------------------------------- +\ \ FORTH allows ( ... ) as comments within function definitions. This works by having an IMMEDIATE \ word called ( which just drops input characters until it hits the corresponding ). : ( IMMEDIATE @@ -232,6 +245,8 @@ ( From now on we can use ( ... ) for comments. + STACK NOTATION ---------------------------------------------------------------------- + In FORTH style we can also use ( ... -- ... ) to show the effects that a word has on the parameter stack. For example: @@ -241,6 +256,16 @@ ( -- ) means the word has no effect on the stack ) +( Some more complicated stack examples, showing the stack notation. ) +: NIP ( x y -- y ) SWAP DROP ; +: TUCK ( x y -- y x y ) DUP ROT ; +: PICK ( x_u ... x_1 x_0 u -- x_u ... x_1 x_0 x_u ) + 1+ ( add one because of 'u' on the stack ) + 4 * ( multiply by the word size ) + DSP@ + ( add to the stack pointer ) + @ ( and fetch ) +; + ( With the looping constructs, we can now write SPACES, which writes n spaces to stdout. ) : SPACES ( n -- ) BEGIN @@ -257,6 +282,8 @@ : HEX ( -- ) 16 BASE ! ; ( + PRINTING NUMBERS ---------------------------------------------------------------------- + The standard FORTH word . (DOT) is very important. It takes the number at the top of the stack and prints it out. However first I'm going to implement some lower-level FORTH words: @@ -286,10 +313,8 @@ ( This is the underlying recursive definition of U. ) : U. ( u -- ) BASE @ /MOD ( width rem quot ) - DUP 0<> IF ( if quotient <> 0 then ) + ?DUP IF ( if quotient <> 0 then ) RECURSE ( print the quotient ) - ELSE - DROP ( drop the zero quotient ) THEN ( print the remainder ) @@ -322,10 +347,9 @@ ( This word returns the width (in characters) of an unsigned number in the current base ) : UWIDTH ( u -- width ) BASE @ / ( rem quot ) - DUP 0<> IF ( if quotient <> 0 then ) + ?DUP IF ( if quotient <> 0 then ) RECURSE 1+ ( return 1+recursive call ) ELSE - DROP ( drop the zero quotient ) 1 ( return 1 ) THEN ; @@ -384,7 +408,7 @@ : U. U. SPACE ; ( ? fetches the integer at an address and prints it. ) -: ? @ . ; +: ? ( addr -- ) @ . ; ( c a b WITHIN returns true if a <= c and c < b ) : WITHIN @@ -421,8 +445,10 @@ : ALIGN HERE @ ALIGNED HERE ! ; ( + STRINGS ---------------------------------------------------------------------- + S" string" is used in FORTH to define strings. It leaves the address of the string and - its length on the stack, with the address at the top. The space following S" is the normal + its length on the stack, (length at the top of stack). The space following S" is the normal space between FORTH words and is not a part of the string. This is tricky to define because it has to do different things depending on whether @@ -438,7 +464,13 @@ case we put the string at HERE (but we _don't_ change HERE). This is meant as a temporary location, likely to be overwritten soon after. ) -: S" IMMEDIATE ( -- len addr ) +( C, appends a byte to the current compiled word. ) +: C, + HERE @ C! ( store the character in the compiled image ) + 1 HERE +! ( increment HERE pointer by 1 byte ) +; + +: S" IMMEDIATE ( -- addr len ) STATE @ IF ( compiling? ) ' LITSTRING , ( compile LITSTRING ) HERE @ ( save the address of the length word on the stack ) @@ -447,8 +479,7 @@ KEY ( get next character of the string ) DUP '"' <> WHILE - HERE @ C! ( store the character in the compiled image ) - 1 HERE +! ( increment HERE pointer by 1 byte ) + C, ( copy character ) REPEAT DROP ( drop the double quote character at the end ) DUP ( get the saved address of the length word ) @@ -468,6 +499,7 @@ DROP ( drop the final " character ) HERE @ - ( calculate the length ) HERE @ ( push the start address ) + SWAP ( addr len ) THEN ; @@ -479,8 +511,8 @@ In immediate mode we just keep reading characters and printing them until we get to the next double quote. - In compile mode we use S" to store the string, then add EMITSTRING afterwards: - LITSTRING EMITSTRING + In compile mode we use S" to store the string, then add TELL afterwards: + LITSTRING TELL It may be interesting to note the use of [COMPILE] to turn the call to the immediate word S" into compilation of that word. It compiles it into the definition of .", @@ -490,7 +522,7 @@ : ." IMMEDIATE ( -- ) STATE @ IF ( compiling? ) [COMPILE] S" ( read the string, and compile LITSTRING, etc. ) - ' EMITSTRING , ( compile the final EMITSTRING ) + ' TELL , ( compile the final TELL ) ELSE ( In immediate mode, just read characters and print them until we get to the ending double quote. ) @@ -506,6 +538,8 @@ ; ( + CONSTANTS AND VARIABLES ---------------------------------------------------------------------- + In FORTH, global constants and variables are defined like this: 10 CONSTANT TEN when TEN is executed, it leaves the integer 10 on the stack @@ -533,7 +567,8 @@ The trick is to define a new word for the variable itself (eg. if the variable was called 'VAR' then we would define a new word called VAR). This is easy to do because we exposed dictionary entry creation through the CREATE word (part of the definition of : above). - A call to CREATE TEN leaves the dictionary entry: + A call to WORD [TEN] CREATE (where [TEN] means that "TEN" is the next word in the input) + leaves the dictionary entry: +--- HERE | @@ -559,7 +594,8 @@ assembler part which returns the value of the assembler symbol of the same name. ) : CONSTANT - CREATE ( make the dictionary entry (the name follows CONSTANT) ) + WORD ( get the name (the name follows CONSTANT) ) + CREATE ( make the dictionary entry ) DOCOL , ( append DOCOL (the codeword field of this word) ) ' LIT , ( append the codeword LIT ) , ( append the value on the top of the stack ) @@ -568,10 +604,10 @@ ( VARIABLE is a little bit harder because we need somewhere to put the variable. There is - nothing particularly special about the 'user definitions area' (the area of memory pointed - to by HERE where we have previously just stored new word definitions). We can slice off - bits of this memory area to store anything we want, so one possible definition of - VARIABLE might create this: + nothing particularly special about the user memory (the area of memory pointed to by HERE + where we have previously just stored new word definitions). We can slice off bits of this + memory area to store anything we want, so one possible definition of VARIABLE might create + this: +--------------------------------------------------------------+ | | @@ -584,7 +620,7 @@ where is the place to store the variable, and points back to it. To make this more general let's define a couple of words which we can use to allocate - arbitrary memory from the user definitions area. + arbitrary memory from the user memory. First ALLOT, where n ALLOT allocates n bytes of memory. (Note when calling this that it's a very good idea to make sure that n is a multiple of 4, or at least that next time @@ -608,7 +644,7 @@ ) : VARIABLE 1 CELLS ALLOT ( allocate 1 cell of memory, push the pointer to this memory ) - CREATE ( make the dictionary entry (the name follows VARIABLE) ) + WORD CREATE ( make the dictionary entry (the name follows VARIABLE) ) DOCOL , ( append DOCOL (the codeword field of this word) ) ' LIT , ( append the codeword LIT ) , ( append the pointer to the new memory ) @@ -616,6 +652,8 @@ ; ( + VALUES ---------------------------------------------------------------------- + VALUEs are like VARIABLEs but with a simpler syntax. You would generally use them when you want a variable which is read often, and written infrequently. @@ -664,7 +702,7 @@ way cannot be inlined). ) : VALUE ( n -- ) - CREATE ( make the dictionary entry (the name follows VALUE) ) + WORD CREATE ( make the dictionary entry (the name follows VALUE) ) DOCOL , ( append DOCOL ) ' LIT , ( append the codeword LIT ) , ( append the initial value ) @@ -701,6 +739,8 @@ ; ( + PRINTING THE DICTIONARY ---------------------------------------------------------------------- + ID. takes an address of a dictionary entry and prints the word's name. For example: LATEST @ ID. would print the name of the last word that was defined. @@ -746,19 +786,20 @@ : WORDS LATEST @ ( start at LATEST dictionary entry ) BEGIN - DUP 0<> ( while link pointer is not null ) + ?DUP ( while link pointer is not null ) WHILE DUP ?HIDDEN NOT IF ( ignore hidden words ) DUP ID. ( but if not hidden, print the word ) + SPACE THEN - SPACE @ ( dereference the link pointer - go to previous word ) REPEAT - DROP CR ; ( + FORGET ---------------------------------------------------------------------- + So far we have only allocated words and memory. FORTH provides a rather primitive method to deallocate. @@ -783,7 +824,12 @@ ; ( + DUMP ---------------------------------------------------------------------- + DUMP is used to dump out the contents of memory, in the 'traditional' hexdump format. + + Notice that the parameters to DUMP (address, length) are compatible with string words + such as WORD and S". ) : DUMP ( addr len -- ) BASE @ ROT ( save the current BASE at the bottom of the stack ) @@ -792,7 +838,7 @@ BEGIN DUP 0> ( while len > 0 ) WHILE - OVER 8 .R ( print the address ) + OVER 8 U.R ( print the address ) SPACE ( print up to 16 words on this line ) @@ -838,6 +884,772 @@ BASE ! ( restore saved BASE ) ; -( Finally print the welcome prompt. ) -." JONESFORTH VERSION " VERSION . CR -." OK " +( + CASE ---------------------------------------------------------------------- + + CASE...ENDCASE is how we do switch statements in FORTH. There is no generally + agreed syntax for this, so I've gone for the syntax mandated by the ISO standard + FORTH (ANS-FORTH). + + ( some value on the stack ) + CASE + test1 OF ... ENDOF + test2 OF ... ENDOF + testn OF ... ENDOF + ... ( default case ) + ENDCASE + + The CASE statement tests the value on the stack by comparing it for equality with + test1, test2, ..., testn and executes the matching piece of code within OF ... ENDOF. + If none of the test values match then the default case is executed. Inside the ... of + the default case, the value is still at the top of stack (it is implicitly DROP-ed + by ENDCASE). When ENDOF is executed it jumps after ENDCASE (ie. there is no "fall-through" + and no need for a break statement like in C). + + The default case may be omitted. In fact the tests may also be omitted so that you + just have a default case, although this is probably not very useful. + + An example (assuming that 'q', etc. are words which push the ASCII value of the letter + on the stack): + + 0 VALUE QUIT + 0 VALUE SLEEP + KEY CASE + 'q' OF 1 TO QUIT ENDOF + 's' OF 1 TO SLEEP ENDOF + ( default case: ) + ." Sorry, I didn't understand key <" DUP EMIT ." >, try again." CR + ENDCASE + + (In some versions of FORTH, more advanced tests are supported, such as ranges, etc. + Other versions of FORTH need you to write OTHERWISE to indicate the default case. + As I said above, this FORTH tries to follow the ANS FORTH standard). + + The implementation of CASE...ENDCASE is somewhat non-trivial. I'm following the + implementations from here: + http://www.uni-giessen.de/faq/archiv/forthfaq.case_endcase/msg00000.html + + The general plan is to compile the code as a series of IF statements: + + CASE (push 0 on the immediate-mode parameter stack) + test1 OF ... ENDOF test1 OVER = IF DROP ... ELSE + test2 OF ... ENDOF test2 OVER = IF DROP ... ELSE + testn OF ... ENDOF testn OVER = IF DROP ... ELSE + ... ( default case ) ... + ENDCASE DROP THEN [THEN [THEN ...]] + + The CASE statement pushes 0 on the immediate-mode parameter stack, and that number + is used to count how many THEN statements we need when we get to ENDCASE so that each + IF has a matching THEN. The counting is done implicitly. If you recall from the + implementation above of IF, each IF pushes a code address on the immediate-mode stack, + and these addresses are non-zero, so by the time we get to ENDCASE the stack contains + some number of non-zeroes, followed by a zero. The number of non-zeroes is how many + times IF has been called, so how many times we need to match it with THEN. + + This code uses [COMPILE] so that we compile calls to IF, ELSE, THEN instead of + actually calling them while we're compiling the words below. + + As is the case with all of our control structures, they only work within word + definitions, not in immediate mode. +) +: CASE IMMEDIATE + 0 ( push 0 to mark the bottom of the stack ) +; + +: OF IMMEDIATE + ' OVER , ( compile OVER ) + ' = , ( compile = ) + [COMPILE] IF ( compile IF ) + ' DROP , ( compile DROP ) +; + +: ENDOF IMMEDIATE + [COMPILE] ELSE ( ENDOF is the same as ELSE ) +; + +: ENDCASE IMMEDIATE + ' DROP , ( compile DROP ) + + ( keep compiling THEN until we get to our zero marker ) + BEGIN + ?DUP + WHILE + [COMPILE] THEN + REPEAT +; + +( + DECOMPILER ---------------------------------------------------------------------- + + CFA> is the opposite of >CFA. It takes a codeword and tries to find the matching + dictionary definition. (In truth, it works with any pointer into a word, not just + the codeword pointer, and this is needed to do stack traces). + + In this FORTH this is not so easy. In fact we have to search through the dictionary + because we don't have a convenient back-pointer (as is often the case in other versions + of FORTH). Because of this search, CFA> should not be used when performance is critical, + so it is only used for debugging tools such as the decompiler and printing stack + traces. + + This word returns 0 if it doesn't find a match. +) +: CFA> + LATEST @ ( start at LATEST dictionary entry ) + BEGIN + ?DUP ( while link pointer is not null ) + WHILE + 2DUP SWAP ( cfa curr curr cfa ) + < IF ( current dictionary entry < cfa? ) + NIP ( leave curr dictionary entry on the stack ) + EXIT + THEN + @ ( follow link pointer back ) + REPEAT + DROP ( restore stack ) + 0 ( sorry, nothing found ) +; + +( + SEE decompiles a FORTH word. + + We search for the dictionary entry of the word, then search again for the next + word (effectively, the end of the compiled word). This results in two pointers: + + +---------+---+---+---+---+------------+------------+------------+------------+ + | LINK | 3 | T | E | N | DOCOL | LIT | 10 | EXIT | + +---------+---+---+---+---+------------+------------+------------+------------+ + ^ ^ + | | + Start of word End of word + + With this information we can have a go at decompiling the word. We need to + recognise "meta-words" like LIT, LITSTRING, BRANCH, etc. and treat those separately. +) +: SEE + WORD FIND ( find the dictionary entry to decompile ) + + ( Now we search again, looking for the next word in the dictionary. This gives us + the length of the word that we will be decompiling. (Well, mostly it does). ) + HERE @ ( address of the end of the last compiled word ) + LATEST @ ( word last curr ) + BEGIN + 2 PICK ( word last curr word ) + OVER ( word last curr word curr ) + <> ( word last curr word<>curr? ) + WHILE ( word last curr ) + NIP ( word curr ) + DUP @ ( word curr prev (which becomes: word last curr) ) + REPEAT + + DROP ( at this point, the stack is: start-of-word end-of-word ) + SWAP ( end-of-word start-of-word ) + + ( begin the definition with : NAME [IMMEDIATE] ) + ':' EMIT SPACE DUP ID. SPACE + DUP ?IMMEDIATE IF ." IMMEDIATE " THEN + + >DFA ( get the data address, ie. points after DOCOL | end-of-word start-of-data ) + + ( now we start decompiling until we hit the end of the word ) + BEGIN ( end start ) + 2DUP > + WHILE + DUP @ ( end start codeword ) + + CASE + ' LIT OF ( is it LIT ? ) + 4 + DUP @ ( get next word which is the integer constant ) + . ( and print it ) + ENDOF + ' LITSTRING OF ( is it LITSTRING ? ) + [ CHAR S ] LITERAL EMIT '"' EMIT SPACE ( print S" ) + 4 + DUP @ ( get the length word ) + SWAP 4 + SWAP ( end start+4 length ) + 2DUP TELL ( print the string ) + '"' EMIT SPACE ( finish the string with a final quote ) + + ALIGNED ( end start+4+len, aligned ) + 4 - ( because we're about to add 4 below ) + ENDOF + ' 0BRANCH OF ( is it 0BRANCH ? ) + ." 0BRANCH ( " + 4 + DUP @ ( print the offset ) + . + ." ) " + ENDOF + ' BRANCH OF ( is it BRANCH ? ) + ." BRANCH ( " + 4 + DUP @ ( print the offset ) + . + ." ) " + ENDOF + ' ' OF ( is it ' (TICK) ? ) + [ CHAR ' ] LITERAL EMIT SPACE + 4 + DUP @ ( get the next codeword ) + CFA> ( and force it to be printed as a dictionary entry ) + ID. SPACE + ENDOF + ' EXIT OF ( is it EXIT? ) + ( We expect the last word to be EXIT, and if it is then we don't print it + because EXIT is normally implied by ;. EXIT can also appear in the middle + of words, and then it needs to be printed. ) + 2DUP ( end start end start ) + 4 + ( end start end start+4 ) + <> IF ( end start | we're not at the end ) + ." EXIT " + THEN + ENDOF + ( default case: ) + DUP ( in the default case we always need to DUP before using ) + CFA> ( look up the codeword to get the dictionary entry ) + ID. SPACE ( and print it ) + ENDCASE + + 4 + ( end start+4 ) + REPEAT + + ';' EMIT CR + + 2DROP ( restore stack ) +; + +( + EXECUTION TOKENS ---------------------------------------------------------------------- + + Standard FORTH defines a concept called an 'execution token' (or 'xt') which is very + similar to a function pointer in C. We map the execution token to a codeword address. + + execution token of DOUBLE is the address of this codeword + | + V + +---------+---+---+---+---+---+---+---+---+------------+------------+------------+------------+ + | LINK | 6 | D | O | U | B | L | E | 0 | DOCOL | DUP | + | EXIT | + +---------+---+---+---+---+---+---+---+---+------------+------------+------------+------------+ + len pad codeword ^ + + There is one assembler primitive for execution tokens, EXECUTE ( xt -- ), which runs them. + + You can make an execution token for an existing word the long way using >CFA, + ie: WORD [foo] FIND >CFA will push the xt for foo onto the stack where foo is the + next word in input. So a very slow way to run DOUBLE might be: + + : DOUBLE DUP + ; + : SLOW WORD FIND >CFA EXECUTE ; + 5 SLOW DOUBLE . CR \ prints 10 + + We also offer a simpler and faster way to get the execution token of any word FOO: + + ['] FOO + + (Exercises for readers: (1) What is the difference between ['] FOO and ' FOO? + (2) What is the relationship between ', ['] and LIT?) + + More useful is to define anonymous words and/or to assign xt's to variables. + + To define an anonymous word (and push its xt on the stack) use :NONAME ... ; as in this + example: + + :NONAME ." anon word was called" CR ; \ pushes xt on the stack + DUP EXECUTE EXECUTE \ executes the anon word twice + + Stack parameters work as expected: + + :NONAME ." called with parameter " . CR ; + DUP + 10 SWAP EXECUTE \ prints 'called with parameter 10' + 20 SWAP EXECUTE \ prints 'called with parameter 20' + + Notice that the above code has a memory leak: the anonymous word is still compiled + into the data segment, so even if you lose track of the xt, the word continues to + occupy memory. A good way to keep track of the xt and thus avoid the memory leak is + to assign it to a CONSTANT, VARIABLE or VALUE: + + 0 VALUE ANON + :NONAME ." anon word was called" CR ; TO ANON + ANON EXECUTE + ANON EXECUTE + + Another use of :NONAME is to create an array of functions which can be called quickly + (think: fast switch statement). This example is adapted from the ANS FORTH standard: + + 10 CELLS ALLOT CONSTANT CMD-TABLE + : SET-CMD CELLS CMD-TABLE + ! ; + : CALL-CMD CELLS CMD-TABLE + @ EXECUTE ; + + :NONAME ." alternate 0 was called" CR ; 0 SET-CMD + :NONAME ." alternate 1 was called" CR ; 1 SET-CMD + \ etc... + :NONAME ." alternate 9 was called" CR ; 9 SET-CMD + + 0 CALL-CMD + 1 CALL-CMD +) + +: :NONAME + 0 0 CREATE ( create a word with no name - we need a dictionary header because ; expects it ) + HERE @ ( current HERE value is the address of the codeword, ie. the xt ) + DOCOL , ( compile DOCOL (the codeword) ) + ] ( go into compile mode ) +; + +: ['] IMMEDIATE + ' LIT , ( compile LIT ) +; + +( + EXCEPTIONS ---------------------------------------------------------------------- + + Amazingly enough, exceptions can be implemented directly in FORTH, in fact rather easily. + + The general usage is as follows: + + : FOO ( n -- ) THROW ; + + : TEST-EXCEPTIONS + 25 ['] FOO CATCH \ execute 25 FOO, catching any exception + ?DUP IF + ." called FOO and it threw exception number: " + . CR + DROP \ we have to drop the argument of FOO (25) + THEN + ; + \ prints: called FOO and it threw exception number: 25 + + CATCH runs an execution token and detects whether it throws any exception or not. The + stack signature of CATCH is rather complicated: + + ( a_n-1 ... a_1 a_0 xt -- r_m-1 ... r_1 r_0 0 ) if xt did NOT throw an exception + ( a_n-1 ... a_1 a_0 xt -- ?_n-1 ... ?_1 ?_0 e ) if xt DID throw exception 'e' + + where a_i and r_i are the (arbitrary number of) argument and return stack contents + before and after xt is EXECUTEd. Notice in particular the case where an exception + is thrown, the stack pointer is restored so that there are n of _something_ on the + stack in the positions where the arguments a_i used to be. We don't really guarantee + what is on the stack -- perhaps the original arguments, and perhaps other nonsense -- + it largely depends on the implementation of the word that was executed. + + THROW, ABORT and a few others throw exceptions. + + Exception numbers are non-zero integers. By convention the positive numbers can be used + for app-specific exceptions and the negative numbers have certain meanings defined in + the ANS FORTH standard. (For example, -1 is the exception thrown by ABORT). + + 0 THROW does nothing. This is the stack signature of THROW: + + ( 0 -- ) + ( * e -- ?_n-1 ... ?_1 ?_0 e ) the stack is restored to the state from the corresponding CATCH + + The implementation hangs on the definitions of CATCH and THROW and the state shared + between them. + + Up to this point, the return stack has consisted merely of a list of return addresses, + with the top of the return stack being the return address where we will resume executing + when the current word EXITs. However CATCH will push a more complicated 'exception stack + frame' on the return stack. The exception stack frame records some things about the + state of execution at the time that CATCH was called. + + When called, THROW walks up the return stack (the process is called 'unwinding') until + it finds the exception stack frame. It then uses the data in the exception stack frame + to restore the state allowing execution to continue after the matching CATCH. (If it + unwinds the stack and doesn't find the exception stack frame then it prints a message + and drops back to the prompt, which is also normal behaviour for so-called 'uncaught + exceptions'). + + This is what the exception stack frame looks like. (As is conventional, the return stack + is shown growing downwards from higher to lower memory addresses). + + +------------------------------+ + | return address from CATCH | Notice this is already on the + | | return stack when CATCH is called. + +------------------------------+ + | original parameter stack | + | pointer | + +------------------------------+ ^ + | exception stack marker | | + | (EXCEPTION-MARKER) | | Direction of stack + +------------------------------+ | unwinding by THROW. + | + | + + The EXCEPTION-MARKER marks the entry as being an exception stack frame rather than an + ordinary return address, and it is this which THROW "notices" as it is unwinding the + stack. (If you want to implement more advanced exceptions such as TRY...WITH then + you'll need to use a different value of marker if you want the old and new exception stack + frame layouts to coexist). + + What happens if the executed word doesn't throw an exception? It will eventually + return and call EXCEPTION-MARKER, so EXCEPTION-MARKER had better do something sensible + without us needing to modify EXIT. This nicely gives us a suitable definition of + EXCEPTION-MARKER, namely a function that just drops the stack frame and itself + returns (thus "returning" from the original CATCH). + + One thing to take from this is that exceptions are a relatively lightweight mechanism + in FORTH. +) + +: EXCEPTION-MARKER + RDROP ( drop the original parameter stack pointer ) + 0 ( there was no exception, this is the normal return path ) +; + +: CATCH ( xt -- exn? ) + DSP@ 4+ >R ( save parameter stack pointer (+4 because of xt) on the return stack ) + ' EXCEPTION-MARKER 4+ ( push the address of the RDROP inside EXCEPTION-MARKER ... ) + >R ( ... on to the return stack so it acts like a return address ) + EXECUTE ( execute the nested function ) +; + +: THROW ( n -- ) + ?DUP IF ( only act if the exception code <> 0 ) + RSP@ ( get return stack pointer ) + BEGIN + DUP R0 4- < ( RSP < R0 ) + WHILE + DUP @ ( get the return stack entry ) + ' EXCEPTION-MARKER 4+ = IF ( found the EXCEPTION-MARKER on the return stack ) + 4+ ( skip the EXCEPTION-MARKER on the return stack ) + RSP! ( restore the return stack pointer ) + + ( Restore the parameter stack. ) + DUP DUP DUP ( reserve some working space so the stack for this word + doesn't coincide with the part of the stack being restored ) + R> ( get the saved parameter stack pointer | n dsp ) + 4- ( reserve space on the stack to store n ) + SWAP OVER ( dsp n dsp ) + ! ( write n on the stack ) + DSP! EXIT ( restore the parameter stack pointer, immediately exit ) + THEN + 4+ + REPEAT + + ( No matching catch - print a message and restart the INTERPRETer. ) + DROP + + CASE + 0 1- OF ( ABORT ) + ." ABORTED" CR + ENDOF + ( default case ) + ." UNCAUGHT THROW " + DUP . CR + ENDCASE + QUIT + THEN +; + +: ABORT ( -- ) + 0 1- THROW +; + +( Print a stack trace by walking up the return stack. ) +: PRINT-STACK-TRACE + RSP@ ( start at caller of this function ) + BEGIN + DUP R0 4- < ( RSP < R0 ) + WHILE + DUP @ ( get the return stack entry ) + CASE + ' EXCEPTION-MARKER 4+ OF ( is it the exception stack frame? ) + ." CATCH ( DSP=" + 4+ DUP @ U. ( print saved stack pointer ) + ." ) " + ENDOF + ( default case ) + DUP + CFA> ( look up the codeword to get the dictionary entry ) + ?DUP IF ( and print it ) + 2DUP ( dea addr dea ) + ID. ( print word from dictionary entry ) + [ CHAR + ] LITERAL EMIT + SWAP >DFA 4+ - . ( print offset ) + THEN + ENDCASE + 4+ ( move up the stack ) + REPEAT + DROP + CR +; + +( + C STRINGS ---------------------------------------------------------------------- + + FORTH strings are represented by a start address and length kept on the stack or in memory. + + Most FORTHs don't handle C strings, but we need them in order to access the process arguments + and environment left on the stack by the Linux kernel, and to make some system calls. + + Operation Input Output FORTH word Notes + ---------------------------------------------------------------------- + + Create FORTH string addr len S" ..." + + Create C string c-addr Z" ..." + + C -> FORTH c-addr addr len DUP STRLEN + + FORTH -> C addr len c-addr CSTRING Allocated in a temporary buffer, so + should be consumed / copied immediately. + FORTH string should not contain NULs. + + For example, DUP STRLEN TELL prints a C string. +) + +( + Z" .." is like S" ..." except that the string is terminated by an ASCII NUL character. + + To make it more like a C string, at runtime Z" just leaves the address of the string + on the stack (not address & length as with S"). To implement this we need to add the + extra NUL to the string and also a DROP instruction afterwards. Apart from that the + implementation just a modified S". +) +: Z" IMMEDIATE + STATE @ IF ( compiling? ) + ' LITSTRING , ( compile LITSTRING ) + HERE @ ( save the address of the length word on the stack ) + 0 , ( dummy length - we don't know what it is yet ) + BEGIN + KEY ( get next character of the string ) + DUP '"' <> + WHILE + HERE @ C! ( store the character in the compiled image ) + 1 HERE +! ( increment HERE pointer by 1 byte ) + REPEAT + 0 HERE @ C! ( add the ASCII NUL byte ) + 1 HERE +! + DROP ( drop the double quote character at the end ) + DUP ( get the saved address of the length word ) + HERE @ SWAP - ( calculate the length ) + 4- ( subtract 4 (because we measured from the start of the length word) ) + SWAP ! ( and back-fill the length location ) + ALIGN ( round up to next multiple of 4 bytes for the remaining code ) + ' DROP , ( compile DROP (to drop the length) ) + ELSE ( immediate mode ) + HERE @ ( get the start address of the temporary space ) + BEGIN + KEY + DUP '"' <> + WHILE + OVER C! ( save next character ) + 1+ ( increment address ) + REPEAT + DROP ( drop the final " character ) + 0 SWAP C! ( store final ASCII NUL ) + HERE @ ( push the start address ) + THEN +; + +: STRLEN ( str -- len ) + DUP ( save start address ) + BEGIN + DUP C@ 0<> ( zero byte found? ) + WHILE + 1+ + REPEAT + + SWAP - ( calculate the length ) +; + +: CSTRING ( addr len -- c-addr ) + SWAP OVER ( len saddr len ) + HERE @ SWAP ( len saddr daddr len ) + CMOVE ( len ) + + HERE @ + ( daddr+len ) + 0 SWAP C! ( store terminating NUL char ) + + HERE @ ( push start address ) +; + +( + THE ENVIRONMENT ---------------------------------------------------------------------- + + Linux makes the process arguments and environment available to us on the stack. + + The top of stack pointer is saved by the early assembler code when we start up in the FORTH + variable S0, and starting at this pointer we can read out the command line arguments and the + environment. + + Starting at S0, S0 itself points to argc (the number of command line arguments). + + S0+4 points to argv[0], S0+8 points to argv[1] etc up to argv[argc-1]. + + argv[argc] is a NULL pointer. + + After that the stack contains environment variables, a set of pointers to strings of the + form NAME=VALUE and on until we get to another NULL pointer. + + The first word that we define, ARGC, pushes the number of command line arguments (note that + as with C argc, this includes the name of the command). +) +: ARGC + S0 @ @ +; + +( + n ARGV gets the nth command line argument. + + For example to print the command name you would do: + 0 ARGV TELL CR +) +: ARGV ( n -- str u ) + 1+ CELLS S0 @ + ( get the address of argv[n] entry ) + @ ( get the address of the string ) + DUP STRLEN ( and get its length / turn it into a FORTH string ) +; + +( + ENVIRON returns the address of the first environment string. The list of strings ends + with a NULL pointer. + + For example to print the first string in the environment you could do: + ENVIRON @ DUP STRLEN TELL +) +: ENVIRON ( -- addr ) + ARGC ( number of command line parameters on the stack to skip ) + 2 + ( skip command line count and NULL pointer after the command line args ) + CELLS ( convert to an offset ) + S0 @ + ( add to base stack address ) +; + +( + SYSTEM CALLS AND FILES ---------------------------------------------------------------------- + + Miscellaneous words related to system calls, and standard access to files. +) + +( BYE exits by calling the Linux exit(2) syscall. ) +: BYE ( -- ) + 0 ( return code (0) ) + SYS_EXIT ( system call number ) + SYSCALL1 +; + +( + UNUSED returns the number of cells remaining in the user memory (data segment). + + For our implementation we will use Linux brk(2) system call to find out the end + of the data segment and subtract HERE from it. +) +: GET-BRK ( -- brkpoint ) + 0 SYS_BRK SYSCALL1 ( call brk(0) ) +; + +: UNUSED ( -- n ) + GET-BRK ( get end of data segment according to the kernel ) + HERE @ ( get current position in data segment ) + - + 4 / ( returns number of cells ) +; + +( + MORECORE increases the data segment by the specified number of (4 byte) cells. + + NB. The number of cells requested should normally be a multiple of 1024. The + reason is that Linux can't extend the data segment by less than a single page + (4096 bytes or 1024 cells). + + This FORTH doesn't automatically increase the size of the data segment "on demand" + (ie. when , (COMMA), ALLOT, CREATE, and so on are used). Instead the programmer + needs to be aware of how much space a large allocation will take, check UNUSED, and + call MORECORE if necessary. A simple programming exercise is to change the + implementation of the data segment so that MORECORE is called automatically if + the program needs more memory. +) +: BRK ( brkpoint -- ) + SYS_BRK SYSCALL1 +; + +: MORECORE ( cells -- ) + CELLS GET-BRK + BRK +; + +( + Standard FORTH provides some simple file access primitives which we model on + top of Linux syscalls. + + The main complication is converting FORTH strings (address & length) into C + strings for the Linux kernel. + + Notice there is no buffering in this implementation. +) + +: R/O ( -- fam ) O_RDONLY ; +: R/W ( -- fam ) O_RDWR ; + +: OPEN-FILE ( addr u fam -- fd 0 (if successful) | c-addr u fam -- fd errno (if there was an error) ) + ROT ( fam addr u ) + CSTRING ( fam cstring ) + SYS_OPEN SYSCALL2 ( open (filename, flags) ) + DUP ( fd fd ) + DUP 0< IF ( errno? ) + NEGATE ( fd errno ) + ELSE + DROP 0 ( fd 0 ) + THEN +; + +: CREATE-FILE ( addr u fam -- fd 0 (if successful) | c-addr u fam -- fd errno (if there was an error) ) + O_CREAT OR + O_TRUNC OR + ROT ( fam addr u ) + CSTRING ( fam cstring ) + 420 ROT ( 0644 fam cstring ) + SYS_OPEN SYSCALL3 ( open (filename, flags|O_TRUNC|O_CREAT, 0644) ) + DUP ( fd fd ) + DUP 0< IF ( errno? ) + NEGATE ( fd errno ) + ELSE + DROP 0 ( fd 0 ) + THEN +; + +: CLOSE-FILE ( fd -- 0 (if successful) | fd -- errno (if there was an error) ) + SYS_CLOSE SYSCALL1 + NEGATE +; + +: READ-FILE ( addr u fd -- u2 0 (if successful) | addr u fd -- 0 0 (if EOF) | addr u fd -- u2 errno (if error) ) + ROT SWAP -ROT ( u addr fd ) + SYS_READ SYSCALL3 + + DUP ( u2 u2 ) + DUP 0< IF ( errno? ) + NEGATE ( u2 errno ) + ELSE + DROP 0 ( u2 0 ) + THEN +; + +( + PERROR prints a message for an errno, similar to C's perror(3) but we don't have the extensive + list of strerror strings available, so all we can do is print the errno. +) +: PERROR ( errno addr u -- ) + TELL + ':' EMIT SPACE + ." ERRNO=" + . CR +; + +( + NOTES ---------------------------------------------------------------------- + + DOES> isn't possible to implement with this FORTH because we don't have a separate + data pointer. +) + +( + WELCOME MESSAGE ---------------------------------------------------------------------- + + Print the version and OK prompt. +) + +: WELCOME + S" TEST-MODE" FIND NOT IF + ." JONESFORTH VERSION " VERSION . CR + UNUSED . ." CELLS REMAINING" CR + ." OK " + THEN +; + +WELCOME +HIDE WELCOME