When implementing the WIG scanner you may need to take into account that
WIG has multiple so-called lexical scopes. You already encountered
lexical scopes when implementing multi-line comments for JOOS: inside a
comment, a different set of tokens and keywords applies than outside.
WIG has even more such scopes. Consider the following small WIG
fragment:

    service {
      const html Compliment = <html> <body>
        This is a <[fin]> great service, man!
      </body> </html>;

      const html Pledge = <html> <body>
        What is your name?
        <input name=name type="text" size=20>
      </body> </html>;

      string name; // name is an id here, although it is a keyword inside HTML tags;
                   // inside HTML text, it's considered plain text

      session Contribute() {

In this snippet I identified the following lexical scopes:
- WIG syntax: here, words such as service, const and html are
  keywords. "name" is not a keyword.
- HTML syntax:
  - is entered when <html> is scanned and left when </html> is scanned
  - unlike in WIG syntax, service, const etc. are not keywords
  - > and < have a different meaning than in WIG syntax
    (although the scanner may not have to distinguish them)
- HTML tags: here input, name etc. should be keywords so that the
  parser can treat them specially.
- Holes: allow only identifiers, in fact any identifier, including
  those that would be keywords in other scopes, e.g. <[html]> is valid.
- HTML right-hand side values (the value after = in an attribute, as in
  name=name): it may be useful to have another scope here so that e.g.
  name is not recognized as a keyword.
Can you think of other lexical scopes? What about HTML comments? Do
those exist in the benchmarks? If so, can HTML comments be nested?
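You can extend a Flex scanner with lexical scopes using so-called start
conditions. Start conditions are declared in the definitions section
with %s (inclusive) or %x (exclusive). You prefix a regular expression
with <c> to denote that it should only be matched while the scanner is
in start condition c. You switch to start condition c by calling
BEGIN(c) in a rule's action; BEGIN(INITIAL) returns to the default
condition.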
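
As a rough illustration, the scopes listed above could be mapped onto
start conditions as in the following Flex sketch. The condition names
HTML, TAG and HOLE and all T_... token codes are made up for this
example and are not taken from any WIG reference scanner; only a handful
of keywords are covered, and attribute values are not given a scope of
their own.

```lex
%{
/* Sketch only: the token codes and the scope names HTML, TAG and HOLE
   are hypothetical and chosen for illustration. */
#include <stdio.h>

enum {
    T_SERVICE = 258, T_CONST, T_HTML, T_ID,
    T_HTML_OPEN, T_HTML_CLOSE, T_TEXT,
    T_INPUT, T_NAME,
    T_HOLE_OPEN, T_HOLE_CLOSE
};
%}

%option noyywrap

/* Exclusive start conditions: while one of them is active, only rules
   carrying the matching <...> prefix can fire. */
%x HTML TAG HOLE

ID  [a-zA-Z_][a-zA-Z0-9_]*
WS  [ \t\r\n]+

%%

<INITIAL>"service"  { return T_SERVICE; }
<INITIAL>"const"    { return T_CONST; }
<INITIAL>"html"     { return T_HTML; }
<INITIAL>"<html>"   { /* enter the HTML scope */ BEGIN(HTML); return T_HTML_OPEN; }
<INITIAL>"//".*     { /* skip a line comment */ }
<INITIAL>{ID}       { /* "name" is an ordinary identifier here */ return T_ID; }
<INITIAL>{WS}       { /* skip whitespace */ }
<INITIAL>.          { return yytext[0]; }

<HTML>"</html>"     { /* leave the HTML scope */ BEGIN(INITIAL); return T_HTML_CLOSE; }
<HTML>"<["          { BEGIN(HOLE); return T_HOLE_OPEN; }
<HTML>"<"           { BEGIN(TAG); return '<'; }
<HTML>[^<]+         { /* service, const, name: all plain text here */ return T_TEXT; }

<TAG>"input"        { return T_INPUT; }
<TAG>"name"         { return T_NAME; }
<TAG>">"            { BEGIN(HTML); return '>'; }
<TAG>{ID}           { return T_ID; }
<TAG>{WS}           { /* skip whitespace */ }
<TAG>.              { return yytext[0]; }

<HOLE>"]>"          { BEGIN(HTML); return T_HOLE_CLOSE; }
<HOLE>{ID}          { /* any identifier, even "html" */ return T_ID; }
<HOLE>{WS}          { /* skip whitespace */ }
<HOLE>.             { return yytext[0]; }

%%

/* Print each token code with its text so the scope switches are visible. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0)
        printf("%d\t%s\n", tok, yytext);
    return 0;
}
```

Assuming the file is saved as scanner.l, it can be built with
"flex scanner.l && cc lex.yy.c -o scanner" and fed the fragment above on
standard input. Note that in this sketch both occurrences of name in
name=name come out as T_NAME; a further start condition entered on = is
one way to handle right-hand side values differently.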
SableCC supports a similar mechanism using so-called states (see
pages 35 ff).