
Study notes on flex, the lexical analysis tool

Notes I took while reading the official flex documentation. I only recorded the parts I think I may actually use.

  • %top
    A %top block is similar to a ‘%{’ … ‘%}’ block, except that the code in a %top block is relocated to the top of the generated file, before any flex definitions. The %top block is useful when you want certain preprocessor macros to be defined or certain files to be included before the generated code. The single characters ‘{’ and ‘}’ are used to delimit the %top block, as shown in the example below:

     %top{
         /* This code goes at the "top" of the generated file. */
         #include <stdint.h>
         #include <inttypes.h>
     }
    

Multiple %top blocks are allowed, and their order is preserved.
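As a rough sketch of where this helps, the flex input below uses `PRIu64` from `<inttypes.h>` inside `main()`, so the headers are pulled in through a %top block. The rule set and the `total` counter are made up for illustration and are not taken from the manual.

```lex
%top{
    /* Copied to the very top of lex.yy.c, ahead of flex's own definitions. */
    #include <stdint.h>
    #include <inttypes.h>
}

%{
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical accumulator used only in this example. */
static uint64_t total = 0;
%}

%option noyywrap

%%
[0-9]+    { total += strtoull(yytext, NULL, 10); }
.|\n      { /* ignore everything else */ }
%%

int main(void)
{
    yylex();
    printf("sum = %" PRIu64 "\n", total);
    return 0;
}
```

Assuming the file is saved as `sum.l` (an arbitrary name), it can be built with `flex sum.l` followed by compiling `lex.yy.c` with a C compiler.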

  • %pointer, %array
    Note that yytext can be defined in two different ways: either as a character pointer or as a character array. You can control which definition flex uses by including one of the special directives %pointer or %array in the first (definitions) section of your flex input. The default is %pointer, unless you use the -l lex compatibility option, in which case yytext will be an array. The advantage of using %pointer is substantially faster scanning and no buffer overflow when matching very large tokens (unless you run out of dynamic memory). The disadvantage is that you are restricted in how your actions can modify yytext (see Actions), and calls to the unput() function destroy the present contents of yytext, which can be a considerable porting headache when moving between different lex versions.

The advantage of %array is that you can then modify yytext to your heart’s content, and calls to unput() do not destroy yytext (see Actions). Furthermore, existing lex programs sometimes access yytext externally using declarations of the form:

     extern char yytext[];

This definition is erroneous when used with %pointer, but correct for %array.

The %array declaration defines yytext to be an array of YYLMAX characters, which defaults to a fairly large value. You can change the size by simply #define’ing YYLMAX to a different value in the first section of your flex input. As mentioned above, with %pointer yytext grows dynamically to accommodate large tokens. While this means your %pointer scanner can accommodate very large tokens (such as matching entire blocks of comments), bear in mind that each time the scanner must resize yytext it also must rescan the entire token from the beginning, so matching such tokens can prove slow. yytext presently does not dynamically grow if a call to unput() results in too much text being pushed back; instead, a run-time error results.

Also note that you cannot use %array with C++ scanner classes (see Cxx).
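To make the "modify yytext freely" point concrete, here is a small sketch that lengthens yytext inside an action, which is only safe under %array. The rule and the appended ‘!’ are invented for this example; the bounds check against YYLMAX is a precaution added here, not something the manual prescribes.

```lex
%array
%option noyywrap

%{
#include <stdio.h>
%}

%%
[a-z]+    {
              /* Appending to yytext is only safe with %array; stay inside YYLMAX. */
              if (yyleng + 2 <= YYLMAX) {
                  yytext[yyleng] = '!';
                  yytext[yyleng + 1] = '\0';
              }
              fputs(yytext, yyout);
          }
.|\n      ECHO;
%%

int main(void)
{
    return yylex();
}
```

With %pointer the same assignment would overwrite input that has not been scanned yet, which is why the manual restricts how actions may modify yytext in that mode.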

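  • An action consisting solely of a vertical bar (‘|’) means “same as the action for the next rule”. In the illustration below, the first three rules share the ‘ECHO; REJECT;’ action of the fourth. REJECT makes the scanner fall back to the next-best matching rule, so the input ‘abcd’ is written out as ‘abcdabcaba’.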
 %%
 a        |
 ab       |
 abc      |
 abcd     ECHO; REJECT;
 .|\n     /* eat up any unmatched character */
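The example above also involves REJECT. A plainer sketch of the ‘|’ action on its own, assuming a made-up set of keywords that should all be reported the same way:

```lex
%option noyywrap

%{
#include <stdio.h>
%}

%%
"if"       |
"else"     |
"while"    { printf("keyword: %s\n", yytext); }
[a-z]+     { printf("identifier: %s\n", yytext); }
.|\n       { /* skip anything else */ }
%%

int main(void)
{
    return yylex();
}
```

The three keyword rules all run the `printf` attached to `"while"`. A word such as ‘iffy’ still goes to the identifier rule because flex prefers the longest match, while an exact ‘if’ goes to the keyword rule because it is listed first.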
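  • yymore()
    tells the scanner that the next time it matches a rule, the corresponding token should be appended to the current value of yytext rather than replacing it. For example: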
 %%
 mega-    ECHO; yymore();
 kludge   ECHO;

Given the input ‘mega-kludge’, ‘mega-’ is matched first and echoed to the output. Then ‘kludge’ is matched, but the previous ‘mega-’ is still sitting at the beginning of yytext, so the ECHO for the ‘kludge’ rule actually writes ‘mega-kludge’. The total output is therefore ‘mega-mega-kludge’.

  • yyless(n)

yyless(n) returns all but the first n characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match. yytext and yyleng are adjusted appropriately (e.g., yyleng will now be equal to n). For example, on the input ‘foobar’ the following will write out ‘foobarbar’:

%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;

An argument of 0 to yyless() will cause the entire current input string to be scanned again. Unless you’ve changed how the scanner will subsequently process its input (using BEGIN, for example), this will result in an endless loop.
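A more practical use of yyless() is giving back lookahead that a pattern had to consume. The sketch below assumes a hypothetical Pascal-like range syntax such as ‘1..10’, where the integer must not be mistaken for the start of a real number; the token names in the output are invented.

```lex
%option noyywrap

%{
#include <stdio.h>
%}

%%
[0-9]+".."         { yyless(yyleng - 2);   /* push the ".." back for rescanning */
                     printf("int: %s\n", yytext); }
[0-9]+"."[0-9]*    { printf("real: %s\n", yytext); }
[0-9]+             { printf("int: %s\n", yytext); }
".."               { printf("range\n"); }
.|\n               { /* skip anything else */ }
%%

int main(void)
{
    return yylex();
}
```

On the input ‘1..10’ the first rule wins as the longest match, keeps only ‘1’, and leaves ‘..’ to be matched by the range rule on the next call, so the scanner should report an int, a range, and another int. Because the argument is never 0 here, the endless loop described above cannot occur.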

  • yyterminate()

yyterminate() can be used in lieu of a return statement in an action. It terminates the scanner and returns a 0 to the scanner’s caller, indicating “all done”. By default, yyterminate() is also called when an end-of-file is encountered. It is a macro and may be redefined.

  • #define YY_DECL

The output of flex is the file lex.yy.c, which contains the scanning routine yylex(), a number of tables used by it for matching tokens, and a number of auxiliary routines and macros. By default, yylex() is declared as follows:

     int yylex()
         {
         ... various definitions and the actions in here ...
         }

This definition may be changed by defining the YY_DECL macro. For example, you could use:

     #define YY_DECL float lexscan(float a, float b )
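A sketch of how a redefined YY_DECL might be used end to end. The name `next_number` and its out-parameter are hypothetical, chosen only for this example; the return type is kept as int so that the default end-of-file action, which returns 0, still works unchanged.

```lex
%option noyywrap

%{
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical signature: the scanner is renamed and gets an out-parameter. */
#define YY_DECL int next_number(int *value)
%}

%%
[0-9]+    { *value = atoi(yytext); return 1; }
.|\n      { /* skip everything else */ }
%%

int main(void)
{
    int v;

    /* The default end-of-file action returns 0, which ends the loop. */
    while (next_number(&v))
        printf("%d\n", v);
    return 0;
}
```

Since `main()` sits in the user code section, it comes after the generated scanner function in lex.yy.c and needs no separate prototype.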
  • %option noyywrap

When the scanner receives an end-of-file indication from YY_INPUT, it then checks the yywrap() function. If yywrap() returns false (zero), then it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.

If you do not supply your own version of yywrap(), then you must either use %option noyywrap (in which case the scanner behaves as though yywrap() returned 1), or you must link with ‘-lfl’ to obtain the default version of the routine, which always returns 1.
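As an illustration of supplying your own yywrap(), the sketch below counts lines across all files named on the command line. The `files` pointer and the `lines` counter are assumptions made for this example. Note that `%option noyywrap` is deliberately absent, since the scanner must call the user-defined function.

```lex
%{
#include <stdio.h>

/* Hypothetical state for this sketch: remaining file names and a line count. */
static char **files;
static unsigned long lines = 0;
%}

%%
\n    { lines++; }
.     { /* ignore */ }
%%

/* Returns 0 after pointing yyin at the next readable file, 1 when none remain. */
int yywrap(void)
{
    if (yyin != NULL && yyin != stdin)
        fclose(yyin);
    while (*files != NULL) {
        yyin = fopen(*files++, "r");
        if (yyin != NULL)
            return 0;
    }
    return 1;
}

int main(int argc, char **argv)
{
    (void)argc;
    files = argv + 1;
    if (yywrap())          /* opens the first file; nothing to scan if none */
        return 0;
    yylex();
    printf("%lu\n", lines);
    return 0;
}
```

Calling yywrap() once from `main()` is just a convenient way to open the first file with the same code path; it is a design choice of this sketch rather than something flex requires.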

  • End-of-File Rules

The special rule <<EOF>> indicates actions which are to be taken when an end-of-file is encountered and yywrap() returns non-zero (i.e., indicates no further files to process). The action must finish by doing one of the following things:

    • assigning yyin to a new input file (in previous versions of flex, after doing the assignment you had to call the special action YY_NEW_FILE; this is no longer necessary);
    • executing a return statement;
    • executing the special yyterminate() action;
    • or switching to a new buffer using yy_switch_to_buffer().

These rules are useful for catching things like unclosed comments. An example:

     %x quote
     %%

     ...other rules for dealing with quotes...

     <quote><<EOF>>   {
              error( "unterminated quote" );
              yyterminate();
              }
     <<EOF>>  {
              if ( *++filelist )
                  yyin = fopen( *filelist, "r" );
              else
                  yyterminate();
              }
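The manual's example depends on `error()` and `filelist`, which are defined elsewhere. A self-contained sketch of the same idea, assuming C-style comments and an exclusive start condition named `comment` (both chosen for this example):

```lex
%option noyywrap
%x comment

%{
#include <stdio.h>
%}

%%
"/*"                { BEGIN(comment); }
<comment>"*/"       { BEGIN(INITIAL); }
<comment>.|\n       { /* discard the comment text */ }
<comment><<EOF>>    {
                        fprintf(stderr, "unterminated comment\n");
                        yyterminate();
                    }
.|\n                ECHO;
%%

int main(void)
{
    return yylex();
}
```

The scanner copies its input to the output with comments stripped. If the input ends while the `comment` start condition is still active, the <<EOF>> rule reports the problem and finishes with yyterminate(), which is one of the permitted ways to end such an action.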
  • Values Available To the User

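    • char *yytext
      holds the text of the current token.
    • int yyleng
      holds the length of the current token.
    • FILE *yyin
      is the file which flex reads from by default.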
    • void yyrestart( FILE *new_file )
      may be called to point yyin at the new input file. The switch-over to the new file is immediate (any previously buffered-up input is lost). Note that calling yyrestart() with yyin as an argument thus throws away the current input buffer and continues scanning the same input file.
    • FILE *yyout
      is the file to which ECHO actions are done. It can be reassigned by the user.
    • YY_CURRENT_BUFFER
      returns a YY_BUFFER_STATE handle to the current buffer.
    • YY_START
      returns an integer value corresponding to the current start condition. You can subsequently use this value with BEGIN to return to that start condition.
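A sketch of yyrestart() in use, scanning each file named on the command line in turn. The word-counting rules and the `words` counter are invented for illustration.

```lex
%option noyywrap

%{
#include <stdio.h>

/* Hypothetical counter used only in this example. */
static unsigned long words = 0;
%}

%%
[^ \t\n]+    { words++; }
[ \t\n]+     { /* skip whitespace */ }
%%

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        FILE *f = fopen(argv[i], "r");
        if (f == NULL) {
            perror(argv[i]);
            continue;
        }
        yyrestart(f);    /* point yyin at the new file and reset the buffer */
        yylex();         /* returns 0 once this file is exhausted */
        fclose(f);
    }
    printf("%lu\n", words);
    return 0;
}
```

Here the file switching happens in `main()` between calls to yylex(), as an alternative to doing it inside yywrap() or an <<EOF>> rule.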
  • Index of Scanner Options

Even though there are many scanner options, a typical scanner might only specify the following options:

 %option   8bit reentrant bison-bridge
 %option   warn nodefault
 %option   yylineno
 %option   outfile="scanner.c" header-file="scanner.h"

The first line specifies the general type of scanner we want. The second line specifies that we are being careful. The third line asks flex to track line numbers. The last line tells flex what to name the files. (The options can be specified in any order. We just divided them.)

flex also provides a mechanism for controlling options within the scanner specification itself, rather than from the flex command-line. This is done by including %option directives in the first section of the scanner specification. You can specify multiple options with a single %option directive, and multiple directives in the first section of your flex input file.

Most options are given simply as names, optionally preceded by the word ‘no’ (with no intervening whitespace) to negate their meaning. The names are the same as their long-option equivalents (but without the leading ‘--’).
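For a simple non-reentrant scanner, a subset of these options is often enough. The sketch below combines `warn`, `nodefault`, and `yylineno` with the negated form `noyywrap`; the token categories are made up for the example.

```lex
%option noyywrap nodefault warn yylineno

%{
#include <stdio.h>
%}

%%
[A-Za-z_][A-Za-z0-9_]*    { printf("%d: identifier %s\n", yylineno, yytext); }
[0-9]+                    { printf("%d: number %s\n", yylineno, yytext); }
[ \t\n]+                  { /* skip whitespace */ }
.                         { fprintf(stderr, "%d: unexpected '%s'\n", yylineno, yytext); }
%%

int main(void)
{
    return yylex();
}
```

Because `nodefault` suppresses the default rule, the final ‘.’ rule together with the whitespace rule is what guarantees that every input character is matched by some rule.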

  • Performance Considerations

The main design goal of flex is that it generate high-performance scanners. It has been optimized for dealing well with large sets of rules. Aside from the effects on scanner speed of the table compression ‘-C’ options (described in the flex manual), there are a number of options/actions which degrade performance. These are, from most expensive to least:

     REJECT
     arbitrary trailing context

     pattern sets that require backing up
     %option yylineno
     %array

     %option interactive
     %option always-interactive

     ^ beginning-of-line operator
     yymore()

There is one case when %option yylineno can be expensive: when your patterns match long tokens that could possibly contain a newline character. There is no performance penalty for rules that cannot possibly match newlines, since flex does not need to check them for newlines.

In general, you should avoid rules such as [^f]+, which match very long tokens, including newlines, and may possibly match your entire file! A better approach is to separate [^f]+ into two rules:

 %option yylineno
 %%
     [^f\n]+
     \n+

The above scanner does not incur a performance penalty.
