Some notes I took while reading the official flex documentation. I only recorded the parts I think I am likely to use.
%top
A %top block is similar to a ‘%{’ … ‘%}’ block, except that the code in a %top block is relocated to the top of the generated file, before any flex definitions. The %top block is useful when you want certain preprocessor macros to be defined or certain files to be included before the generated code. The single characters ‘{’ and ‘}’ are used to delimit the %top block, as shown in the example below:
%top{
/* This code goes at the "top" of the generated file. */
#include <stdint.h>
#include <inttypes.h>
}
Multiple %top blocks are allowed, and their order is preserved.
%pointer, %array
yytext can be defined either as a character pointer or as a character array. You control which definition flex uses by including one of the special directives %pointer or %array in the first (definitions) section of your flex input. The default is %pointer, unless you use the -l lex compatibility option, in which case yytext will be an array. The advantage of using %pointer is substantially faster scanning and no buffer overflow when matching very large tokens (unless you run out of dynamic memory). The disadvantage is that you are restricted in how your actions can modify yytext (see Actions), and calls to the unput() function destroy the present contents of yytext, which can be a considerable porting headache when moving between different lex versions.
The advantage of %array is that you can then modify yytext to your heart’s content, and calls to unput() do not destroy yytext (see Actions). Furthermore, existing lex programs sometimes access yytext externally using declarations of the form:
extern char yytext[];
This definition is erroneous when used with %pointer, but correct for %array.
The %array declaration defines yytext to be an array of YYLMAX characters, which defaults to a fairly large value. You can change the size by simply #define’ing YYLMAX to a different value in the first section of your flex input. As mentioned above, with %pointer yytext grows dynamically to accommodate large tokens. While this means your %pointer scanner can accommodate very large tokens (such as matching entire blocks of comments), bear in mind that each time the scanner must resize yytext it also must rescan the entire token from the beginning, so matching such tokens can prove slow. yytext presently does not dynamically grow if a call to unput() results in too much text being pushed back; instead, a run-time error results.
Also note that you cannot use %array with C++ scanner classes (see Cxx).
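REJECT directs the scanner to proceed on to the "second best" rule which matched the input (or a prefix of the input). For example, when the following scanner scans the token ‘abcd’, it will write ‘abcdabcaba’ to the output:
%%
a    |
ab   |
abc  |
abcd ECHO; REJECT;
.|\n /* eat up any unmatched character */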
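To see where the repeated output comes from, the rule set above can be modelled in plain C. This is only a sketch of the observable behaviour, not of how flex implements REJECT; the function name simulate_reject and the output buffer are made up for illustration. At each input position every matching pattern echoes its text, longest first, and the catch-all rule finally consumes one character.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the rules
 *     a | ab | abc | abcd    ECHO; REJECT;
 *     .|\n                   (eat one character)
 * Whatever ECHO would print is appended to `out`. */
void simulate_reject(const char *input, char *out, size_t out_size)
{
    static const char *patterns[] = { "abcd", "abc", "ab", "a" }; /* longest first */
    const size_t n_patterns = sizeof patterns / sizeof patterns[0];
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t pos = 0; input[pos] != '\0'; pos++) {
        /* Each matching rule echoes its text; REJECT then hands control
         * to the next-best (shorter) match at the same position. */
        for (size_t i = 0; i < n_patterns; i++) {
            size_t len = strlen(patterns[i]);
            if (strncmp(input + pos, patterns[i], len) == 0 &&
                used + len < out_size) {
                memcpy(out + used, patterns[i], len);
                used += len;
                out[used] = '\0';
            }
        }
        /* After the last REJECT the catch-all rule wins and silently
         * consumes a single character, so the loop advances by one. */
    }
}
```

Calling simulate_reject("abcd", buf, sizeof buf) with a large enough buf leaves "abcdabcaba" in it: four echoes at the first position, then nothing for ‘b’, ‘c’ and ‘d’.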
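yymore() tells the scanner that the next time it matches a rule, the corresponding token should be appended onto the current value of yytext rather than replacing it. For example, given the input ‘mega-kludge’ the following will write ‘mega-mega-kludge’ to the output:
%%
mega-  ECHO; yymore();
kludge ECHO;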
First ‘mega-’ is matched and echoed to the output. Then ‘kludge’ is matched, but the previous ‘mega-’ is still hanging around at the beginning of yytext so the ECHO for the ‘kludge’ rule will actually write ‘mega-kludge’.
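The "hanging around" effect can be reproduced with a small C model. It is a sketch under simplifying assumptions: simulate_yymore, keep_prefix and the fixed 256-byte buffer are invented for illustration, and the real scanner manages yytext differently. The point is only that yymore() makes the next match append to yytext instead of replacing it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the two rules
 *     mega-    ECHO; yymore();
 *     kludge   ECHO;
 * plus the default rule (echo one unmatched character). */
void simulate_yymore(const char *input, char *out, size_t out_size)
{
    char yytext[256];        /* stand-in for the scanner's yytext */
    size_t yyleng = 0;       /* stand-in for yyleng */
    size_t used = 0;         /* bytes already written to out */
    int keep_prefix = 0;     /* set when the last action called yymore() */

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t pos = 0; input[pos] != '\0'; ) {
        size_t len = 1;      /* default rule matches one character */
        int calls_yymore = 0;

        if (strncmp(input + pos, "mega-", 5) == 0) {
            len = 5;
            calls_yymore = 1;
        } else if (strncmp(input + pos, "kludge", 6) == 0) {
            len = 6;
        }

        if (!keep_prefix)
            yyleng = 0;      /* normal case: the new token replaces yytext */
        if (yyleng + len >= sizeof yytext || used + yyleng + len >= out_size)
            break;           /* sketch only: stop instead of growing buffers */

        memcpy(yytext + yyleng, input + pos, len);  /* append the new match */
        yyleng += len;
        memcpy(out + used, yytext, yyleng);         /* ECHO writes all of yytext */
        used += yyleng;
        out[used] = '\0';

        keep_prefix = calls_yymore;
        pos += len;
    }
}
```

With the input "mega-kludge" the model writes "mega-" for the first rule and "mega-kludge" for the second, giving "mega-mega-kludge" in total.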
yyless(n) returns all but the first n characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match. yytext and yyleng are adjusted appropriately (e.g., yyleng will now be equal to n). For example, on the input ‘foobar’ the following will write out ‘foobarbar’:
%%
foobar ECHO; yyless(3);
[a-z]+ ECHO;
An argument of 0 to yyless() will cause the entire current input string to be scanned again. Unless you’ve changed how the scanner will subsequently process its input (using BEGIN, for example), this will result in an endless loop.
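The rescanning that yyless() causes can be modelled in C by advancing the input position by fewer characters than were matched. This is a rough sketch, not flex's implementation: simulate_yyless and append_text are hypothetical names, and rule selection is hard-coded for the two rules of the foobar example (longest match wins, ties go to the earlier rule).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Append len bytes of text to out if they fit (sketch-level bounds check). */
void append_text(char *out, size_t out_size, size_t *used,
                 const char *text, size_t len)
{
    if (*used + len < out_size) {
        memcpy(out + *used, text, len);
        *used += len;
        out[*used] = '\0';
    }
}

/* Hypothetical model of the rules
 *     foobar   ECHO; yyless(3);
 *     [a-z]+   ECHO;
 * plus the default rule (echo one unmatched character). */
void simulate_yyless(const char *input, char *out, size_t out_size)
{
    size_t pos = 0, used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    while (input[pos] != '\0') {
        size_t len = 0;      /* length of the [a-z]+ match at pos */
        while (islower((unsigned char)input[pos + len]))
            len++;

        if (len == 0) {
            append_text(out, out_size, &used, input + pos, 1);
            pos += 1;        /* default rule */
        } else if (len == 6 && strncmp(input + pos, "foobar", 6) == 0) {
            append_text(out, out_size, &used, input + pos, 6);
            pos += 3;        /* yyless(3): keep "foo", push "bar" back */
        } else {
            append_text(out, out_size, &used, input + pos, len);
            pos += len;      /* ordinary [a-z]+ token */
        }
    }
}
```

For the input "foobar" the first rule echoes "foobar" and consumes only three characters, so "bar" is scanned again by the second rule and the result is "foobarbar". Changing `pos += 3` to `pos += 0` reproduces the endless loop described for yyless(0).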
yyterminate() can be used in lieu of a return statement in an action. It terminates the scanner and returns a 0 to the scanner’s caller, indicating “all done”. By default, yyterminate() is also called when an end-of-file is encountered. It is a macro and may be redefined.
The output of flex is the file lex.yy.c, which contains the scanning routine yylex(), a number of tables used by it for matching tokens, and a number of auxiliary routines and macros. By default, yylex() is declared as follows:
int yylex()
{
... various definitions and the actions in here ...
}
This definition may be changed by defining the YY_DECL macro. For example, you could use:
#define YY_DECL float lexscan(float a, float b )
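Mechanically, the generated file uses YY_DECL as the header of the scanning routine, so whatever the macro expands to becomes the function's name, return type and parameter list. The stand-alone sketch below imitates that with an ordinary C function; the body (adding the two arguments) is a placeholder invented for illustration, where a real scanner would have the generated matching code and your actions.

```c
/* In a real flex input this #define would sit in the definitions section. */
#define YY_DECL float lexscan(float a, float b)

/* The macro expands to the full function header, so the scanning routine
 * is now called lexscan() and takes two floats. */
YY_DECL
{
    return a + b;   /* placeholder body, not real scanner code */
}
```

Callers then invoke lexscan(x, y) instead of yylex(), and actions inside the scanner can use a and b like any other parameters.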
When the scanner receives an end-of-file indication from YY_INPUT, it then checks the yywrap() function. If yywrap() returns false (zero), then it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.
If you do not supply your own version of yywrap(), then you must either use %option noyywrap (in which case the scanner behaves as though yywrap() returned 1), or you must link with ‘-lfl’ to obtain the default version of the routine, which always returns 1.
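A user-supplied yywrap() that moves on to the next input file might look roughly like this. It is a sketch under assumptions: next_file is a hypothetical NULL-terminated list of remaining paths, and yyin is declared locally only so the fragment compiles on its own (in a generated scanner flex provides yyin, so that declaration would be dropped).

```c
#include <stdio.h>

/* Stand-in for the scanner's input stream; flex normally defines this. */
static FILE *yyin;

/* Hypothetical: points at the next path to open, NULL-terminated list. */
static const char **next_file;

/* Return 0 after pointing yyin at another file, 1 when nothing is left. */
int yywrap(void)
{
    while (next_file != NULL && *next_file != NULL) {
        FILE *fp = fopen(*next_file++, "r");
        if (fp != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);   /* done with the previous file */
            yyin = fp;
            return 0;           /* more input: scanning continues */
        }
        /* Design choice for this sketch: skip files that fail to open. */
    }
    return 1;                   /* no more input: scanner terminates */
}
```

Silently skipping unreadable files keeps the sketch short; a real program would probably report the error before trying the next path.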
The special rule <<EOF>> indicates actions which are to be taken when an end-of-file is encountered and yywrap() returns non-zero (i.e., indicates no further files to process). The action must finish by doing one of the following things:
assigning yyin to a new input file (in previous versions of flex, after doing the assignment you had to call the special action YY_NEW_FILE. This is no longer necessary.)
executing a return statement;
executing the special yyterminate() action.
or, switching to a new buffer using yy_switch_to_buffer() as shown in the example above.
These rules are useful for catching things like unclosed comments. An example:
%x quote
%%
...other rules for dealing with quotes...
<quote><<EOF>> {
    error( "unterminated quote" );
    yyterminate();
}
<<EOF>> {
    if ( *++filelist )
        yyin = fopen( *filelist, "r" );
    else
        yyterminate();
}
Values Available To the User
Even though there are many scanner options, a typical scanner might only specify the following options:
%option 8bit reentrant bison-bridge
%option warn nodefault
%option yylineno
%option outfile="scanner.c" header-file="scanner.h"
The first line specifies the general type of scanner we want. The second line specifies that we are being careful. The third line asks flex to track line numbers. The last line tells flex what to name the files. (The options can be specified in any order. We just divided them.)
flex also provides a mechanism for controlling options within the scanner specification itself, rather than from the flex command-line. This is done by including %option directives in the first section of the scanner specification. You can specify multiple options with a single %option directive, and multiple directives in the first section of your flex input file.
Most options are given simply as names, optionally preceded by the word ‘no’ (with no intervening whitespace) to negate their meaning. The names are the same as their long-option equivalents (but without the leading ‘--’).
The main design goal of flex is that it generate high-performance scanners. It has been optimized for dealing well with large sets of rules. Aside from the effects on scanner speed of the table compression ‘-C’ options outlined above, there are a number of options/actions which degrade performance. These are, from most expensive to least:
REJECT
arbitrary trailing context
pattern sets that require backing up
%option yylineno
%array
%option interactive
%option always-interactive
^ beginning-of-line operator
yymore()
There is one case when %option yylineno can be expensive: when your patterns match long tokens that could possibly contain a newline character. There is no performance penalty for rules that cannot possibly match newlines, since flex does not need to check them for newlines.
In general, you should avoid rules such as [^f]+, which match very long tokens, including newlines, and may possibly match your entire file! A better approach is to separate [^f]+ into two rules:
%option yylineno
%%
[^f\n]+
\n+
The above scanner does not incur a performance penalty.