Sather Runtime Model

Programmer's Interface

In this section, we describe what ANTLR generates after reading your grammar file and how to use that output to parse input. The classes from which your lexer, token, and parser classes are derived are provided as well.

What ANTLR generates

ANTLR generates the following types of files, where MY_PARSER, MY_LEXER, and MY_TREE_PARSER are names of grammar classes specified in the grammar file. You may have an arbitrary number of parsers, lexers, and tree-parsers per grammar file; a separate class file will be generated for each. In addition, token type files will be generated containing the token vocabularies used in the parsers and lexers. One or more token vocabularies may be defined in a grammar file and shared between different grammars. For example, given the grammar file:

    options { language="Sather"; }

    class MY_PARSER extends Parser;
    options { exportVocab=MY; }
    ... rules ...

    class MY_LEXER extends Lexer;
    options { exportVocab=MY; }
    ... rules ...

    class MY_TREE_PARSER extends TreeParser;
    options { exportVocab=MY; }
    ... rules ...

The following files will be generated: a Sather source file for each of the grammar classes MY_PARSER, MY_LEXER, and MY_TREE_PARSER; a token types class for each (for example, MY_PARSER_TOKENTYPES); and a token definition text file for the exported vocabulary MY, suitable for use with the importVocab option described later.
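A vocabulary exported in this way can also be pulled into a grammar living in a different file rather than being redefined there. A minimal sketch (OTHER_PARSER is an illustrative name; the importVocab option is discussed further under "Creating Your Own Lexer" below):

    options { language="Sather"; }

    class OTHER_PARSER extends Parser;
    options { importVocab=MY; }
    ... rules ...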
The programmer uses the classes by referring to them:

1. Create a lexical analyzer attached to an input stream.
2. Create a parser attached to the lexer (passing any user-defined arguments).
3. Invoke one of the parser rule methods to begin parsing.
If your parser generates an AST, then get the AST value, create a tree-parser, and invoke one of the tree-parser rules using the AST:

    lex ::= #MY_LEXER{ANTLR_COMMON_TOKEN}( file_stream );
    parser ::= #MY_PARSER{ANTLR_COMMON_TOKEN,ANTLR_COMMON_AST}( lex, user-defined-args-if-any );
    parser.start-rule;
    -- and, if you are tree parsing the result...
    tree_parser ::= #MY_TREE_PARSER{ANTLR_COMMON_AST};
    tree_parser.start-rule( parser.ast );

The lexer and parser can raise exceptions of type $ANTLR_RECOGNITION_EXCEPTION, which you may catch:

    lexer ::= #CALC_LEXER{ANTLR_COMMON_TOKEN}( file_stream );
    parser ::= #CALC_PARSER{ANTLR_COMMON_TOKEN,ANTLR_COMMON_AST}( lexer );
    -- Parse the input expression
    protect
       parser.expr;
    when $ANTLR_RECOGNITION_EXCEPTION then
       #ERR + exception.str + "\n";
    end;

Multiple Lexers/Parsers With Shared Input State

Occasionally, you will want two parsers or two lexers to share input state; that is, you will want them to pull input from the same source token stream or character stream. ANTLR 2.6.0 factored the input variables, such as line number, guessing state, and input stream, into a separate object so that another lexer or parser could share that state. The following code creates a Java lexer and a Javadoc lexer that pull characters from the same input stream:

    -- create a Java lexer
    main_lexer ::= #JAVA_LEXER{ANTLR_COMMON_TOKEN}( input );
    -- create javadoc lexer
    -- attach to shared input state of java lexer
    doclexer ::= #JAVADOC_LEXER{ANTLR_COMMON_TOKEN}( main_lexer.input_state );

Parsers with shared input state can be created similarly:

    jdocparser ::= #JAVA_DOC_PARSER{ANTLR_COMMON_TOKEN,ANTLR_COMMON_AST}( input_state );
    jdocparser.content; -- go parse the comment

Sharing state is easy, but what happens upon an exception during the execution of the "subparser"? What about syntactic predicate execution? It turns out that invoking a subparser with the same input state is exactly the same as calling another rule in the same parser, as far as error handling and syntactic predicate guessing are concerned. If the parser is guessing before the call to the subparser, the subparser must continue guessing, right? Exceptions thrown inside the subparser must exit the subparser and return to the enclosing error handler or syntactic predicate handler.
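In practice, this means a subparser invocation can be guarded exactly like a rule invocation. A sketch reusing the Javadoc parser from the example above (the error report mirrors the earlier protect example; this is illustrative, not generated code):

    jdocparser ::= #JAVA_DOC_PARSER{ANTLR_COMMON_TOKEN,ANTLR_COMMON_AST}( input_state );
    protect
       jdocparser.content; -- subparser pulls from the shared input state
    when $ANTLR_RECOGNITION_EXCEPTION then
       -- handled as if a rule of the enclosing parser had failed
       #ERR + exception.str + "\n";
    end;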
Parser Implementation

Parser Class

ANTLR generates a parser class (an extension of ANTLR_LLKPARSER) that contains a method for every rule in your grammar. The general format looks like:

    class MY_PARSER{ TOKEN < $ANTLR_TOKEN, AST < $ANTLR_AST{AST} } is

       include ANTLR_LLKPARSER{ TOKEN, AST }
          create -> super_create;
       include MY_PARSER_TOKENTYPES;

       create ( token_buf : ANTLR_TOKEN_BUFFER{TOKEN}, k : INT ) : SAME is
          res : SAME := super_create( token_buf, k );
          res.token_names := sa_token_names;
          return res;
       end;

       create ( token_buf : ANTLR_TOKEN_BUFFER{TOKEN} ) : SAME is
          return #SAME( token_buf, 1 );
       end;

       create ( lexer : $ANTLR_TOKEN_STREAM{TOKEN}, k : INT ) : SAME is
          res : SAME := super_create( lexer, k );
          res.token_names := sa_token_names;
          return res;
       end;

       create ( lexer : $ANTLR_TOKEN_STREAM{TOKEN} ) : SAME is
          res : SAME := #SAME( lexer, 1 );
          return res;
       end;

       create ( state : ANTLR_PARSER_SHARED_INPUT_STATE{TOKEN} ) : SAME is
          res : SAME := super_create( state, 1 );
          res.token_names := sa_token_names;
          return res;
       end;

       ... -- add your own constructors here...

       rule-definitions

    end;

Parser Methods

ANTLR generates recursive-descent parsers; therefore, every rule in the grammar will result in a method that applies the specified grammatical structure to the input token stream. The general form of a parser method looks like:

    rule is
       init-action-if-present
       if ( lookahead-predicts-production-1 ) then
          code-to-match-production-1
       elsif ( lookahead-predicts-production-2 ) then
          code-to-match-production-2
       ...
       elsif ( lookahead-predicts-production-n ) then
          code-to-match-production-n
       else
          -- syntax error
          raise #ANTLR_NO_VIABLE_ALT_EXCEPTION( LT(1) );
       end;
    end;

This code results from a rule of the form:

    rule:   production-1
        |   production-2
        ...
        |   production-n
        ;

If you have specified arguments and a return type for the rule, the method header changes to:

    (* generated from:
     *    rule(user-defined-args) returns return-type : ... ;
     *)
    rule ( user-defined-args ) : return-type is
       ...
    end;

Token types are integers, and we make heavy use of sets and range comparisons to avoid excessively long test expressions.

EBNF Subrules

Subrules are like unlabeled rules; consequently, the code generated for an EBNF subrule mirrors that generated for a rule. The only difference is induced by the EBNF subrule operators that imply optionality or looping.

(...)? optional subrule. The only difference between the code generated for an optional subrule and a rule is that there is no default else-clause to raise an exception; recognition continues, having ignored the optional subrule.

    init-action-if-present
    if ( lookahead-predicts-production-1 ) then
       code-to-match-production-1
    elsif ( lookahead-predicts-production-2 ) then
       code-to-match-production-2
    ...
    elsif ( lookahead-predicts-production-n ) then
       code-to-match-production-n
    end;

Not testing the optional paths of optional blocks has the potential to delay the detection of syntax errors.

(...)* closure subrule. A closure subrule is like an optional looping subrule; therefore, we wrap the code for a simple subrule in a "forever" loop that exits whenever the lookahead is not consistent with any of the alternative productions.

    init-action-if-present
    loop
       if ( lookahead-predicts-production-1 ) then
          code-to-match-production-1
       elsif ( lookahead-predicts-production-2 ) then
          code-to-match-production-2
       ...
       elsif ( lookahead-predicts-production-n ) then
          code-to-match-production-n
       else
          break!;
       end;
    end;

While there is no need to explicitly test the lookahead for consistency with the exit path, the grammar analysis phase computes the lookahead of what follows the block. The lookahead of what follows must be disjoint from the lookahead of each alternative; otherwise, the loop will not know when to terminate. For example, consider the following subrule that is nondeterministic upon token A:

    ( A | B )* A

Upon A, should the loop continue or exit? One must also ask if the loop should even begin. Because you cannot answer these questions with only one symbol of lookahead, the decision is non-LL(1). Not testing the exit paths of closure loops has the potential to delay the detection of syntax errors.

As a special case, a closure subrule with one alternative production results in:

    init-action-if-present
    loop
       while!( lookahead-predicts-production-1 );
       code-to-match-production-1
    end;

This special case results in smaller, faster, and more readable code.

(...)+ positive closure subrule. A positive closure subrule is a loop around a series of production prediction tests, like a closure subrule. However, we must guarantee that at least one iteration of the loop is done before proceeding to the construct beyond the subrule.

    sa_cnt : INT := 0;
    init-action-if-present
    loop
       if ( lookahead-predicts-production-1 ) then
          code-to-match-production-1
       elsif ( lookahead-predicts-production-2 ) then
          code-to-match-production-2
       ...
       elsif ( lookahead-predicts-production-n ) then
          code-to-match-production-n
       elsif ( sa_cnt >= 1 ) then
          -- lookahead predicted nothing and we've
          -- done at least one iteration
          break!;
       else
          raise #ANTLR_NO_VIABLE_ALT_EXCEPTION( LT(1) );
       end;
       sa_cnt := sa_cnt + 1; -- track times through the loop
    end;

While there is no need to explicitly test the lookahead for consistency with the exit path, the grammar analysis phase computes the lookahead of what follows the block. The lookahead of what follows must be disjoint from the lookahead of each alternative; otherwise, the loop will not know when to terminate. For example, consider the following subrule that is nondeterministic upon token A:

    ( A | B )+ A

Upon A, should the loop continue or exit? Because you cannot answer this with only one symbol of lookahead, the decision is non-LL(1). Not testing the exit paths of closure loops has the potential to delay the detection of syntax errors.
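When the decision is LL(1), say ( A | B )+ followed by a token other than A or B, the template above instantiates to roughly the following (a sketch, not actual generated output; real output uses generated variable names):

    sa_cnt : INT := 0;
    loop
       if ( LA(1) = A ) then
          match(A);
       elsif ( LA(1) = B ) then
          match(B);
       elsif ( sa_cnt >= 1 ) then
          break!; -- at least one iteration done; exit the loop
       else
          raise #ANTLR_NO_VIABLE_ALT_EXCEPTION( LT(1) );
       end;
       sa_cnt := sa_cnt + 1;
    end;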
You might ask why we do not have a while loop that tests whether the lookahead is consistent with any of the alternatives (rather than having a series of tests inside the loop with a break). It turns out that we can generate smaller code for a series of tests than one big one. Moreover, the individual tests must be done anyway to distinguish between alternatives, so a while condition would be redundant.

As a special case, if there is only one alternative, the following is generated (the loop body runs once before the lookahead is tested, guaranteeing at least one iteration):

    init-action-if-present
    loop
       code-to-match-production-1
       if ( ~( lookahead-predicts-production-1 ) ) then
          break!;
       end;
    end;

Optimization. When there is a large (where "large" is user-definable) number of strictly LL(1) prediction alternatives, a case-statement can be used rather than a sequence of if-statements. The non-LL(1) cases are handled by generating the usual if-statements in the else clause. For example:

    case ( LA(1) )
    when KEY_WHILE, KEY_IF, KEY_DO then
       statement;
    when KEY_INT, KEY_FLOAT then
       declaration;
    else
       -- do whatever else-clause is appropriate
    end;

This optimization relies on the compiler building a more direct jump (via a jump table or hash table) to the code matching the ith production. This is also more readable and faster than a series of set membership tests.

Production Prediction

LL(1) prediction. Any LL(1) prediction test is a simple set membership test. If the set is a singleton set (a set with only one element), then an integer token type = comparison is done. If the set degree is greater than one, a set is created and the single input token type is tested for membership against that set. For example, consider the following rules:

    a : A | b ;
    b : B | C | D | E | F;

The lookahead that predicts production one is {A} and the lookahead that predicts production two is {B,C,D,E,F}. The following code would be generated by ANTLR for rule a (slightly cleaned up for clarity):

    a is
       if ( LA(1) = A ) then
          match(A);
       elsif ( token_set1.member(LA(1)) ) then
          b;
       end;
    end;

The prediction for the first production can be done with a simple integer comparison, but the second alternative uses a set membership test for speed, though you might not recognize it as a test of LA(1) against {B,C,D,E,F}. The complexity threshold above which set tests are generated is user-definable. We use arrays of BOOLs to hold sets. The various sets needed by ANTLR are created and initialized in the generated parser (or lexer) class.

Approximate LL(k) prediction. An extension of LL(1): basically, we do a series of up to k set tests rather than the single set test of LL(1) prediction. Each decision will use a different amount of lookahead, with LL(1) being the dominant decision type.
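For instance, a rule whose two alternatives both start with A, such as:

    d : A B
      | A C
      ;

is not LL(1) but is LL(2), and its prediction would be rendered as a pair of lookahead tests, one per depth (a sketch of the shape of the decision, not actual generated output; singleton sets become = comparisons as described above):

    d is
       if ( LA(1) = A and LA(2) = B ) then
          match(A); match(B);
       elsif ( LA(1) = A and LA(2) = C ) then
          match(A); match(C);
       else
          raise #ANTLR_NO_VIABLE_ALT_EXCEPTION( LT(1) );
       end;
    end;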
Production Element Recognition

Token references. Token references are translated to:

    match( token-type );

For example, a reference to token KEY_BEGIN results in:

    match( KEY_BEGIN );

where KEY_BEGIN will be an integer constant defined in the MY_PARSER_TOKENTYPES class generated by ANTLR.

String literal references. String literal references are references to automatically generated tokens to which ANTLR automatically assigns a token type (one for each unique string). String references are translated to:

    match( T );

where T is the token type assigned by ANTLR to that token.

Character literal references. Referencing a character literal implies that the current rule is a lexical rule. Single characters, 't', are translated to:

    match('t');

which can be manually inlined with:

    if ( c = 't' ) then
       consume;
    else
       raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( LA(1), file_name, line );
    end;

if the method call proves slow (at the cost of space).

Wildcard references. In lexical rules, the wildcard is translated to:

    consume;

which simply gets the next character of input without doing a test. References to the wildcard in a parser rule result in the same thing, except that the consume call will be with respect to the parser.

Not operator. When operating on a token, ~T is translated to:

    match_not( T );

When operating on a character literal, ~'t' is translated to:

    match_not( 't' );

Range operator. In parser rules, the range operator (T1..T2) is translated to:

    match_range( T1, T2 );

In a lexical rule, the range operator for characters c1..c2 is translated to:

    match_range( c1, c2 );

Labels. Element labels on atom references become TOKEN references in parser rules and CHARs in lexical rules. For example, the parser rule:

    a : id:ID { OUT::create + "id is " + id + '\n'; }
      ;

would be translated to:

    a is
       id : TOKEN := void;
       id := LT(1);
       match(ID);
       OUT::create + "id is " + id + '\n';
    end;

For lexical rules such as:

    ID : w:. { OUT::create + "w is " + w + '\n'; }
       ;

the following code would result:

    ID is
       w : CHAR;
       w := c;
       consume; -- match wildcard (anything)
       OUT::create + "w is " + w + '\n';
    end;

Labels on rule references result in AST references, when generating trees, of the form label_ast.

Rule references. Rule references become method calls. Arguments to rules become arguments to the invoked methods. Return values are assigned with an ordinary Sather assignment. Consider rule reference i=list[1] to rule:

    list[scope:INT] returns INT
       : { return scope+3; }
       ;

The rule reference would be translated to:

    i := list(1);

Semantic actions. Actions are translated verbatim to the output parser or lexer, except for the translations required for AST generation. To add members to a lexer or parser class definition, add the class member definitions enclosed in {} immediately following the class specification, for example:

    class MY_PARSER extends Parser;
    {
       private i : INT;

       create ( lexer : ANTLR_TOKEN_STREAM{TOKEN}, aUsefulArgument : INT ) : SAME is
          i := aUsefulArgument;
       end;
    }
    ... rules ...

ANTLR collects everything inside the {...} and inserts it in the class definition before the rule-method definitions.

Semantic predicates. Validating semantic predicates become runtime checks that raise an exception when the predicate fails; disambiguating semantic predicates are hoisted into the prediction expressions of the associated alternatives.

Lexer Implementation

Lexer Form

The lexers produced by ANTLR 2.x are a lot like the parsers produced by ANTLR 2.x. The only major differences are that (a) scanners use characters instead of tokens, and (b) ANTLR generates a special next_token rule for each scanner, which is a production containing each public lexer rule as an alternative. The lexical grammar class provided by the programmer results in a subclass of ANTLR_CHAR_SCANNER, for example:

    class MY_LEXER{TOKEN} < $ANTLR_TOKEN_STREAM{TOKEN}, $ANTLR_FILE_CURSOR is

       include ANTLR_CHAR_SCANNER{TOKEN}
          create -> private char_scanner_create;
       include MY_LEXER_TOKENTYPES;

       create ( istr : $ISTREAM ) : SAME is ... end;
       create ( bb : ANTLR_BYTE_BUFFER ) : SAME is ... end;
       create ( state : ANTLR_LEXER_SHARED_INPUT_STATE ) : SAME is ... end;

       next_token : TOKEN is
          scanning logic
          ...
       end;

       recursive and other non-inlined lexical methods
       ...

    end;

When an ANTLR-generated parser needs another token from its lexer, it calls the method next_token.
The general form of the next_token method is:

    next_token : TOKEN is
       sa_ttype : INT;
       loop
          protect
             reset_text;
             case ( LA(1) )
             -- a case for each char predicting a lexical rule:
             --    call the lexical rule; token type -> sa_ttype
             else
                raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( LA(1), file_name, line );
             end;
             if ( sa_ttype /= ANTLR_COMMON_TOKEN::SKIP ) then
                return make_token( sa_ttype );
             end;
          when $ANTLR_RECOGNITION_EXCEPTION then
             report_error( exception.str );
          end;
       end;
    end;

For example, the lexical rules:

    class LEX extends Lexer;

    WS    : ('\t' | '\r' | ' ') {sa_ttype := ANTLR_COMMON_TOKEN::SKIP;} ;
    PLUS  : '+';
    MINUS : '-';
    INT   : ( '0'..'9' )+ ;
    ID    : ( 'a'..'z' )+ ;
    UID   : ( 'A'..'Z' )+ ;

would result in something like:

    class LEX{TOKEN} < $ANTLR_TOKEN_STREAM{TOKEN}, $ANTLR_FILE_CURSOR is

       next_token : TOKEN is
          sa_rettoken : TOKEN;
          continue : BOOL := true;
          loop
             sa_ttype : INT := ANTLR_COMMON_TOKEN::INVALID_TYPE;
             reset_text;
             protect -- for char stream error handling
                protect -- for lexical error handling
                   case ( LA(1) )
                   when '\t', '\r', ' ' then
                      mWS( true );
                      sa_rettoken := sa_return_token;
                   when '+' then
                      mPLUS( true );
                      sa_rettoken := sa_return_token;
                   when '-' then
                      mMINUS( true );
                      sa_rettoken := sa_return_token;
                   when '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' then
                      mINT( true );
                      sa_rettoken := sa_return_token;
                   when 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
                        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' then
                      mID( true );
                      sa_rettoken := sa_return_token;
                   when 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
                        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z' then
                      mUID( true );
                      sa_rettoken := sa_return_token;
                   else -- default
                      if ( LA(1) = EOF_CHAR ) then
                         upon_eof;
                         sa_return_token := make_token( ANTLR_COMMON_TOKEN::EOF_TYPE );
                      else
                         raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( LA(1), file_name, line );
                      end; -- if
                   end; -- case
                   if ( ~void(sa_return_token) and continue ) then
                      sa_ttype := sa_return_token.ttype;
                      sa_ttype := test_literals_table( sa_ttype );
                      sa_return_token.ttype := sa_ttype;
                      return sa_return_token;
                   end; -- if
                when $ANTLR_RECOGNITION_EXCEPTION then
                   report_error( exception );
                   consume;
                end; -- protect
             when $ANTLR_CHAR_STREAM_EXCEPTION then
                raise #ANTLR_TOKEN_STREAM_EXCEPTION( exception.message );
             end; -- protect
          end; -- loop
       end; -- next_token

       mWS ( sa_create_token : BOOL ) is
          sa_ttype : INT;
          sa_token : TOKEN;
          sa_begin : INT := text.length;
          sa_ttype := WS;
          sa_save_index : INT;
          case ( LA(1) )
          when '\t' then match('\t');
          when '\r' then match('\r');
          when ' ' then match(' ');
          else
             raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( LA(1), file_name, line );
          end; -- case
          sa_ttype := ANTLR_COMMON_TOKEN::SKIP;
          if ( sa_create_token and void(sa_token) and
               sa_ttype /= ANTLR_COMMON_TOKEN::SKIP ) then
             sa_token := make_token( sa_ttype );
             sa_token.text := text.substring( sa_begin, text.length - sa_begin );
          end; -- if
          sa_return_token := sa_token;
       end; -- rule

       mPLUS ( sa_create_token : BOOL ) is
          sa_ttype : INT;
          sa_token : TOKEN;
          sa_begin : INT := text.length;
          sa_ttype := PLUS;
          sa_save_index : INT;
          match('+');
          if ( sa_create_token and void(sa_token) and
               sa_ttype /= ANTLR_COMMON_TOKEN::SKIP ) then
             sa_token := make_token( sa_ttype );
             sa_token.text := text.substring( sa_begin, text.length - sa_begin );
          end; -- if
          sa_return_token := sa_token;
       end; -- rule

       mMINUS ( sa_create_token : BOOL ) is
          sa_ttype : INT;
          sa_token : TOKEN;
          sa_begin : INT := text.length;
          sa_ttype := MINUS;
          sa_save_index : INT;
          match('-');
          if ( sa_create_token and void(sa_token) and
               sa_ttype /= ANTLR_COMMON_TOKEN::SKIP ) then
             sa_token := make_token( sa_ttype );
             sa_token.text := text.substring( sa_begin, text.length - sa_begin );
          end; -- if
          sa_return_token := sa_token;
       end; -- rule

       mINT ( sa_create_token : BOOL ) is
          sa_ttype : INT;
          sa_token : TOKEN;
          sa_begin : INT := text.length;
          sa_ttype := INT;
          sa_save_index : INT;
          sa0_cnt7 : INT := 0;
          loop
             if ( LA(1) >= '0' and LA(1) <= '9' ) then
                match_range( '0', '9' );
             else
                if ( sa0_cnt7 >= 1 ) then
                   break!;
                else
                   raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( LA(1), file_name, line );
                end; -- if
             end; -- if
             sa0_cnt7 := sa0_cnt7 + 1;
          end; -- loop
          if ( sa_create_token and void(sa_token) and
               sa_ttype /= ANTLR_COMMON_TOKEN::SKIP ) then
             sa_token := make_token( sa_ttype );
             sa_token.text := text.substring( sa_begin, text.length - sa_begin );
          end; -- if
          sa_return_token := sa_token;
       end; -- rule

       mID ( sa_create_token : BOOL ) is
          sa_ttype : INT;
          sa_token : TOKEN;
          sa_begin : INT := text.length;
          sa_ttype := ID;
          sa_save_index : INT;
          sa1_cnt10 : INT := 0;
          loop
             if ( LA(1) >= 'a' and LA(1) <= 'z' ) then
                match_range( 'a', 'z' );
             else
                if ( sa1_cnt10 >= 1 ) then
                   break!;
                else
                   raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( LA(1), file_name, line );
                end; -- if
             end; -- if
             sa1_cnt10 := sa1_cnt10 + 1;
          end; -- loop
          if ( sa_create_token and void(sa_token) and
               sa_ttype /= ANTLR_COMMON_TOKEN::SKIP ) then
             sa_token := make_token( sa_ttype );
             sa_token.text := text.substring( sa_begin, text.length - sa_begin );
          end; -- if
          sa_return_token := sa_token;
       end; -- rule

       mUID ( sa_create_token : BOOL ) is
          sa_ttype : INT;
          sa_token : TOKEN;
          sa_begin : INT := text.length;
          sa_ttype := UID;
          sa_save_index : INT;
          sa2_cnt13 : INT := 0;
          loop
             if ( LA(1) >= 'A' and LA(1) <= 'Z' ) then
                match_range( 'A', 'Z' );
             else
                if ( sa2_cnt13 >= 1 ) then
                   break!;
                else
                   raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( LA(1), file_name, line );
                end; -- if
             end; -- if
             sa2_cnt13 := sa2_cnt13 + 1;
          end; -- loop
          if ( sa_create_token and void(sa_token) and
               sa_ttype /= ANTLR_COMMON_TOKEN::SKIP ) then
             sa_token := make_token( sa_ttype );
             sa_token.text := text.substring( sa_begin, text.length - sa_begin );
          end; -- if
          sa_return_token := sa_token;
       end; -- rule

    end; -- class

ANTLR-generated lexers assume that you will be reading streams of characters. If this is not the case, you must create your own lexer.

Creating Your Own Lexer

To create your own lexer, the Sather class that will be doing the lexing must be a subtype of the abstract class $ANTLR_TOKEN_STREAM, which simply states that you must be able to return a stream of tokens conforming to $ANTLR_TOKEN via next_token:

    abstract class $ANTLR_TOKEN_STREAM{TOKEN < $ANTLR_TOKEN} is
       next_token : TOKEN;
    end;

ANTLR will not generate a lexer if you do not specify a lexical class. Launching a parser with a non-ANTLR-generated lexer is the same as launching a parser with an ANTLR-generated lexer:

    lex ::= #HAND_BUILT_LEXER{MY_TOKEN}(...);
    p ::= #MY_PARSER{MY_TOKEN,ANTLR_COMMON_AST}( lex );
    p.start-rule;

The parser does not care what kind of object you use for scanning as long as it can answer next_token. If you build your own lexer, and the token values are also generated by that lexer, then you should inform the ANTLR-generated parsers about the token type values generated by that lexer. Use the importVocab option in the parsers that use the externally-generated token set, and create a token definition file following the requirements of the importVocab option.
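To make this concrete, here is a minimal sketch of what a hand-built lexer might look like. This is illustrative, not generated code: it is specialized to ANTLR_COMMON_TOKEN for brevity; it assumes an ANTLR_COMMON_TOKEN can be created with # and assigned a ttype and text, mirroring the generated code above; and the WORD token type, the STR-based input, and the string operations used here are inventions of this sketch. A real lexer would take its token types from a token definition file, as just described.

    class WORD_LEXER < $ANTLR_TOKEN_STREAM{ANTLR_COMMON_TOKEN} is

       const WORD : INT := 4; -- illustrative token type; normally imported

       private attr input : STR; -- characters to scan
       private attr pos : INT;   -- current position in input

       create ( s : STR ) : SAME is
          res : SAME := new;
          res.input := s;
          res.pos := 0;
          return res;
       end;

       next_token : ANTLR_COMMON_TOKEN is
          t : ANTLR_COMMON_TOKEN := #ANTLR_COMMON_TOKEN; -- assumed creation routine
          -- skip blanks
          loop while!( pos < input.size and input[pos] = ' ' );
             pos := pos + 1;
          end;
          if ( pos >= input.size ) then
             t.ttype := ANTLR_COMMON_TOKEN::EOF_TYPE;
             return t;
          end;
          -- gather a run of non-blank characters into one WORD token
          start ::= pos;
          loop while!( pos < input.size and input[pos] /= ' ' );
             pos := pos + 1;
          end;
          t.ttype := WORD;
          t.text := input.substring( start, pos - start );
          return t;
       end;

    end;

Such a lexer could then be handed to an ANTLR-generated parser exactly as shown above, since the parser only requires next_token.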
Lexical Rules

Lexical rules are essentially the same as parser rules, except that lexical rules apply a structure to a series of characters rather than a series of tokens. As with parser rules, each lexical rule results in a method in the output lexer class.

Alternative blocks. Consider a simple series of alternatives within a block:

    FORMAT : 'x' | 'f' | 'd';

The lexer would contain the following method:

    mFORMAT : INT is
       if ( c = 'x' ) then
          match('x');
       elsif ( c = 'f' ) then
          match('f');
       elsif ( c = 'd' ) then
          match('d');
       else
          raise #ANTLR_NO_VIABLE_ALT_FOR_CHAR_EXCEPTION( ... );
       end;
       return FORMAT;
    end;

The only real differences between lexical methods and grammar methods are that lookahead prediction expressions do character comparisons rather than LA(i) comparisons, match matches characters instead of tokens, and a return is added to the bottom of the rule.

Inlining. Simple, non-recursive lexical rules may be inlined directly into the next_token method rather than resulting in methods of their own. For example, the common integer rule would be placed directly into the next_token method. That is, the rule:

    INT : ( '0'..'9' )+ ;

would not result in a method in your lexer class. This rule would become part of the resulting lexer, as it would probably be inlined by ANTLR:

    next_token : TOKEN is
       case ( LA(1) )
       -- cases for operators and such here
       -- chars that predict an INT token:
       when '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' then
          loop while!( c >= '0' and c <= '9' );
             match_range( '0', '9' );
          end;
          return make_token( INT );
       else
          -- check harder stuff here, like rules beginning with a..z
       end;

If not inlined, the method for scanning integers would look like:

    mINT : INT is
       loop while!( c >= '0' and c <= '9' );
          match_range( '0', '9' );
       end;
       return INT;
    end;

where token names are converted to method names by prefixing them with the letter m. The next_token method would become:

    next_token : TOKEN is
       case ( c )
       -- cases for operators and such here
       -- chars that predict an INT token:
       when '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' then
          return make_token( mINT );
       else
          -- check harder stuff here, like rules beginning with a..z
       end;

Note that this type of range loop is so common that it should probably be optimized to:

    loop while!( c >= '0' and c <= '9' );
       consume;
    end;

Recursive lexical rules. Lexical rules that are directly or indirectly recursive are not inlined. For example, consider the following rule that matches nested actions:

    ACTION : '{' ( ACTION | ~'}' )* '}' ;

ACTION would result in (assuming a character vocabulary of 'a'..'z', '{', '}'):

    mACTION : INT is
       sa_ttype : INT := ACTION;
       match('{');
       loop
          case ( LA(1) )
          when '{' then
             mACTION;
          when 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
               'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' then
             match_not('}');
          else
             break!;
          end;
       end;
       match('}');
       return sa_ttype;
    end;

Version: $Id: //depot/code/org.antlr/release/antlr-2.7.1/doc/sa-runtime.html#1 $