CHAPTER 25
GENERATING PHRASES IN ANOTHER LANGUAGE
Following are new notations for expressing phrase generation.
They may be used within semantic specifications to generate phrases
in another language.
These notations let you, the programmer, specify translations
~oblivious ~to ~the ~existence ~of ~any ~ambiguity ~at ~all. Ambiguity
is supported implicitly, and delightfully, it's relatively easy to
implement. We will see the implementation of each notation after we
present it. (Section 25.5 shows how ambiguities between domains meet).
Our main example has the syntax grammar's semantics generate phrases
in the datatype language. This is how we require that a specification
make sense syntactically ~and within the domain of datatypes.
The next chapter shows the complete specification of syntax rules
whose semantics generate phrases in the datatype language. There,
we will see an implementation of a programming language that involves
datatypes. That implementation will use the notations presented in
this chapter.
25.1
Notation For Generating Unit-Length Phrases
We now introduce notations for the generation of phrases. We've been
using one of the most important of these notations all along. Recall
that a rule is usually specified via the notation:
<POS(1):V(1)> ... <POS(k):V(k)> ->
<GIVE_POS: f(V(1),...,V(k))>
In general, we allow the righthand side of the rule to be a general
program, as in:
<POS(1):V(1)> <POS(2):V(2)> ... <POS(k):V(k)> ->
a program
---------------- Parentheses in previous paragraph mean subscripting! ---
25.2
The Righthand Side Of A Rule Is Always A STATEMENT
In fact, ~the ~righthand ~side ~of ~a ~rule ~is ~always ~a ~STATEMENT.
We've been living with this since Section 1.3. The notation:
<POS: f(...) >
is actually a STATEMENT!
25.2.1
The Righthand-Side Of A Rule Can Naturally Involve IFs
Because the give-phrase notation "<ID:STATEMENT>" is itself a
STATEMENT, we are able to express rules (from Chapter 3) like:
<EXPR[i]:x> <BOP[j]:y> <EXPR[k]:z> ->
IF i =< j & k < j THEN
<EXPR[j]: f(x,y,z) > FI
This rule uses the IF-THEN notation for STATEMENTs. This is perfectly
valid because the "<EXPR:...>" in the THEN clause is a STATEMENT, and
the righthand side of a rule is expected always to be a STATEMENT.
25.3
Phrase Generation Is A STATEMENT Anywhere, Even Beyond Rules' Righthand
Sides
In general, phrase generation STATEMENTs are supported by the rules:
< ID : STATEMENT > -> STATEMENT
< ID [ EXPR ] : STATEMENT > -> STATEMENT
The ID names the part-of-speech. The given STATEMENT specifies
the semantics. These translate ultimately into a call to GIVE
(Section 25.3.3 and Chapter 12).
Each generates a unit-length phrase whose part-of-speech is the ID and
whose semantics is the STATEMENT. We've seen this on the righthand
sides of rules.
The second rule here is for specifying array parts-of-speech, which
require an index (the EXPR, of type INT) to denote which ~one of the
array of parts-of-speech is intended for the new phrase.
For example, consider the STATEMENT forming the righthand side of the
rule:
<NUMBER:n> -> <EXPR: LOAD( 1, ADDRESS_OF(n) ); >
That STATEMENT:
<EXPR: LOAD( 1, ADDRESS_OF(n) ); >
generates an <EXPR> whose semantics is the STATEMENT within the
"<...>", the:
LOAD( 1, ADDRESS_OF(n) );
25.3.1
Semantics Can Also Be An EXPR
The STATEMENT within the angle-brackets can also be an EXPR, as in:
< ID : EXPR > -> STATEMENT
< ID [ EXPR ] : EXPR > -> STATEMENT
In general, parts-of-speech declared with the dash, e.g.,
POS EXPR : - ;
must be specified using the STATEMENT form inside the "<...>". All
other parts-of-speech, declared like:
POS ID : TEXT ;
where the semantics is a specified datatype, use the EXPR form inside
the "<...>".
The ":STATEMENT" or ":EXPR" may be omitted, as in:
< ID >
In the absence of semantic specification, the default semantics will
do nothing when it might be invoked, and will return 0, FALSE, or NIL
if it is used as a value.
BOX: What notation have we been using all along that
BOX: generates a phrase?
BOX:
BOX: Can this notation be used anywhere that a STATEMENT
BOX: is permissible?
25.3.2
The STATEMENT or EXPR Inside The "<...>" Is Rendered As A Process
Recall from Chapter 4 that we implicitly enclose any semantic
specification within the "//...\\" to render it as a process. This
was done to render all semantics as ~delayed semantics, so that ~no
semantics would be executed during the parsing action.
We implement that delaying transformation here.
When we specify (e.g., on the righthand side of a rule):
< POS : f(x,y,z) >
we actually deliver:
< POS : //[x;y;z;] f( <*x*>, <*y*>, <*z*> ) \\ >
The variables enclosed in the "//[...]", the context variables X, Y,
and Z, come from the lefthand part of a rule, the want-phrase. The
"//[...]" and "\\" are implicit. This delays the invocation of f.
Also, each appearence of those context variables in the body is
enclosed by the process invocation notation "<*...*>" if needed.
This ~undelays the delayed semantics associated with the variables
X, Y, and Z.
25.3.3
The Translation Of The "<...>" Into A Call To GIVE
The STATEMENT:
< ID : STATEMENT >
always translates to:
GIVE( [ LEFT:LEFT POS:ID SEM: //[...] STATEMENT \\ ] );
This is a full call to GIVE, like we saw in Chapter 15.
Beyond the given ID (part-of-speech) and STATEMENT (semantics), the
global variable ~LEFT is read.
LEFT is one of the variables read implicitly by the "<...>" notation.
Once GIVE is called, GIVE reads and writes the variable C as well,
even though we don't pass it in here (see Section 12.3 or 15.2.1).
LEFT and C are thus read implicitly by the STATEMENT:
< ID : STATEMENT >
The following introduces ways to manage these two implicit variables.
BOX: BOX: What does the phrase generation notation "<...>"
BOX: translate into?
BOX:
BOX: Could the righthand side of a rule consist of WHILE
BOX: statements, among other things?
BOX:
BOX: How has an IF statement on the righthand side been
BOX: helpful?
BOX:
BOX: How is Chapter 4's implicit delayed semantics
BOX: implemented?
25.4
The Two Endpoints Of Generated Phrases: The Global Variables LEFT and C
Whenever a phrase is generated, whether it be on the righthand side
of a rule, or as any action, two global variables define the
left and right endpoints for the new phrase.
LEFT, a CHOICE_OF_PHRASES, denotes the left neighbor for a generated
phrase. The variable C, also a CHOICE_OF_PHRASES, denotes the
righthand edge for the generated phrase. Figure 25.1 illustrates this.
The "<...>" notation, the GIVE, reads these two variables implicitly.
Thus, the values in those two variables are important whenever we
see a "<...>".
How are these variables set?
25.4.1
A Rule's Want-Phrase Sets LEFT
LEFT is set upon matching a want-phrase. Recall that a rule:
<POS(1)> <POS(2)> ... <POS(k)> -> some_action
is turned into a program via the following (from the end of Section
12.3.3). P is the PHRASE block passed to GIVE (and the grammar):
P(k):= P ;
IF P(k).POS = POS(k) THEN
FOR P(k-1) $E P(k).LEFT; WITH P(k-1).POS = POS(k-1) ;
!!
...
!!
FOR P(1) $E P(2).LEFT; WITH P(1).POS = POS(1);
DO
~LEFT:= ~P(1).LEFT;
some_action
END
FI
The "some_action" is executed in a context where LEFT points to the
matched phrase's lefthand neighbor.
---------------- Parentheses in previous paragraph around "1", "2"
---------------- "k", and "k-1" mean subscripting! ---------------------
The action is usually:
GIVE( [ LEFT: LEFT POS: the_righthand_part_of_speech
SEM: //...\\ ] );
This is so if the righthand side of the rule is of the form:
< ID : STATEMENT >
This action now executes in the context where LEFT is the lefthand
neighbor of the matched want-phrase. As always, C holds the rightmost
PHRASE block of the matched phrase (Section 12.3.1).
GIVE will put the newly generated phrase's rightmost block also onto
C. Thus, the newly generated unit-length phrase ~shares ~the ~same
~span (LEFT and C) with the matched want-phrase.
BOX: What do the variables LEFT and C signify?
BOX:
BOX: Why does the generated give-phrase (in a context-free
BOX: rule) share the same span with the matched occurence
BOX: of the rule's want-phrase?
25.4.2
Phrases Of Length Greater Than One
The treatment of the global variables LEFT and C that we saw with
general rewrite rules (Section 12.5) and the taking of user input
(Section 12.4), are now rendered implicit with a new notation.
25.4.2.1
Brief Notation For Setting LEFT and C
STATEMENTs may be combined in a way other than sequential execution.
The "-" may be used to combine STATEMENTs, as supported by the rule:
STATEMENT - STATEMENT - ... - STATEMENT -> STATEMENT
The dashes deal with the variables LEFT and C. LEFT and C are set
up especially for each of the individual STATEMENTs.
This dash notation is meant to be used for generating phrases of length
greater than one. For example:
<POS(1)> - <POS(2)>
combines with a dash the two STATEMENTS:
<POS(1)> and <POS(2)>
This forms a phrase of length two.
---------------- Parentheses in previous paragraph mean subscripting! ---
A general rewrite rule is actually written as:
<POS(1)> ... <POS(k)> -> <GIVE_POS(1)> - ... - <GIVE_POS(n)>
where dashes separate the phrase elements on the righthand side of
the rule. This is our official notation for general rewrite rules
(rather than the dashless notation used in Section 1.3).
---------------- Parentheses in previous paragraph around "1" and "k"
---------------- mean subscripting! -----------------------
How is the rule:
STATEMENT - STATEMENT - ... - STATEMENT -> STATEMENT
implemented? It translates (see Section 12.5) to the following.
(Back there, each STATEMENT was a call to GIVE):
HOLDING LEFT; "(Preserve LEFT so that it looks like we never
modified it)"
DO
RIGHTHAND:= C; "Remember present C, for our final
STATEMENT"
C:= NIL; ~STATEMENT(1)
LEFT:= C; C:= NIL; ~STATEMENT(2)
LEFT:= C; C:= NIL; ~STATEMENT(3)
...
LEFT:= C; C:= NIL; ~STATEMENT(n-1)
LEFT:= C; C:= RIGHTHAND; ~STATEMENT(n)
ENDHOLD
Assuming that each of the STATEMENTs is of the form:
< ID : STATEMENT >
then the overall effect is to generate the entire phrase (connected
by dashes) with LEFT as its left neighbor, where the rightmost phrase
block is appended onto C.
---------------- Parentheses in previous paragraph around "1", "2"
---------------- "3", and "n" and "n-1"" mean subscripting! ------------
That is, where we wrote:
< ID : STATEMENT >
to generate a phrase of length one, spanning from LEFT to C, we can now
write:
<ID : STATEMENT> - <ID : STATEMENT> - ... - <ID : STATEMENT>
to generate a phrase of length greater than one, also spanning from
LEFT to C.
NOTE: STATEMENTs connected with dashes bind together ~before
STATEMENTs separated by nothing (our usual way for putting
STATEMENTs together). Thus,
<A> <B> <C> - <D> <E>
groups as:
<A> <B> (<C> - <D>) <E>
Also:
<A> - <B> <C> - <D> - <E> <F> - <G>
groups as:
(<A> - <B>) (<C> - <D> - <E>) (<F> - <G>)
BOX: BOX: How do you combine STATEMENTs so as to generate a
BOX: phrase of length greater than one?
BOX:
BOX: Is the implementation identical to how we generate
BOX: the give-phrase of a general rewrite rule?
25.4.3
Ambiguous Phrase Generation: Phrase Generation Without The Dashes
Figure 25.2(a) shows what is built when you execute the statement:
<X> - <Y> - <Z>
Figure 25.2(b) shows what:
<X> <Y> <Z>
generates. With the dashes, we get a left-to-right phrase.
Without the dashes, the three short phrases become one ambiguous
phrase. They all become members of the same CHOICE_OF_PHRASES.
This is natural because the dashless STATEMENT calls GIVE once per
short phrase, without touching C nor LEFT. GIVE puts all its given
phrases onto the same CHOICE_OF_PHRASES, C.
For another example, figure 25.3 shows what is built upon the execution
of:
<A> <B> <C> - <D> <E>
Figure 25.4 shows:
<A> - <B> <C> - <D> - <E> <F> - <G>
Now suppose the procedures X and Y are defined by:
DEFINE X: <A> ENDDEFN
DEFINE Y: <B> - <C> ENDDEFN
The STATEMENT:
X; Y;
generates an ambiguous phrase consisting of the phrase "<A>" and
the phrase "<B><C>". In contrast:
X; - Y;
generates one phrase, "<A><B><C>". (The semicolons are part of the
atomic STATEMENTs, as:
X;
is the notation for calling procedure X). In general, the normal
~sequential execution of phrase generations "<...>" offers up the
phrases as ambiguous, multiple interpretations sharing the same span
(LEFT and C).
We will see examples of phrase generation from within the semantics of
syntax rules in next chapter.
BOX: What kind of phrase is generated by a sequence of
BOX: STATEMENTs without the dashes?
BOX:
BOX: Can phrase generation be packaged in procedures?
25.5
Two Kinds Of Ambiguity At Once - The Passage Of Ambiguity From One Domain
To The Next
Ambiguities can arise from the syntax grammar and also from the
datatype grammar. Both kinds of ambiguity manifest themselves in the
same way. They result in ambiguous datatype phrases.
25.5.1
Datatype Ambiguities
25.5.1.1
Ambiguities From Coercion
Let's first examine ambiguities that arise from the datatype grammar
alone. For example, the phrase:
1
is turned into the datatype phrase:
~INT
The coercion from INT to REAL:
~INT -> ~REAL
applied to the datatype phrase gives rise to the leftmost two blocks
in figure 25.5. Here we have an ambiguity between INT and REAL: The
"1" is either an INT or REAL.
25.5.1.2
Ambiguities From Polymorphism
Ambiguities may arise from polymorphism as well as coercion. We just
saw one introduced by a coercion.
Here is an example that involves "+"'s polymorphism. Assume we have
only the following rules:
~INT ~INT + -> ~INT
~REAL ~REAL + -> ~REAL
~INT -> ~REAL
The phrase:
1 + 2
which gives rise to the datatype phrase:
~INT ~INT +
can be interpreted as a REAL via ~two means.
Consider first figure 25.6(a). Each of the two INTs is independently
turned into a REAL, and then the rule:
~REAL ~REAL + -> ~REAL
applies to give an overall REAL interpretation. This is called the
~two-coercion interpretation, because a coercion is applied to each
INT.
Another scenario also occurs. "+" is polymorphic because it will
combine INTs or it will combine REALs. Our second scenario, shown in
figure 25.6(b), has "+" combining the INTs to produce an INT. That
~one combined INT is then turned into a REAL. This is the ~one-
coercion interpretation.
Both these scenarios happen at once, giving rise to figure 25.7. The
full-spanning REAL has two meanings as just discussed. (Figure 28.2
may illustrate these two meanings more clearly).
Later on, when we invoke the semantics of the full-spanning REAL,
both meanings will be considered. A process described in Chapter 28
will choose ultimately the one-coercion solution over the two-
coercion solution.
This concludes our examples of ambiguities within the datatype
language alone.
25.5.2
What Happens To Syntactic Ambiguities?
Recall from Section 15.2 that syntactic ambiguities manifest
themselves as ambiguous semantics. For example, the syntactically
ambiguous phrase:
A + B # C
delivers ambiguous semantics, as shown in figure 25.8(a).
Consider the leftmost "+" block in the figure. It is the "+" in "A+B".
As figure 25.8(b) shows, that semantic block generates the reverse
polish phrase:
A B +
where A and B (shown surrounded by circles) denote the ~types of the
variables A and B.
Figure 25.8(b) shows with each semantic block the
reverse polish phrase it generates. For example, the "(A+B)#C" block
generates:
A B + C #
by invoking its sub-block (A+B)'s semantics, which generate the
"A B +". It then invokes its second sub-block (C), which generates
the second part of the phrase, the C. Finally, it appends the "#"
to complete the reverse polish phrase.
Similarly, the other choice, "A+(B#C)", generates the phrase:
A B C # +
~Our ~syntactic ~ambiguity ~gives ~rise ~to ~the ~ambiguous ~datatype
~phrase:
A B + C # ~or A B C # +
Let's go on to see how this ambiguous phrase parses in the datatype
grammar.
25.5.3
Parsing By The Datatype Grammar
Figure 25.9(a) shows the ambiguous phrase if A, B, and C are all of
type REAL. Part (b) shows what is generated as a by-product of the
grammar.
The datatype rules relevant to this example are:
1) ~REAL ~REAL + -> ~REAL
2) ~POINT ~POINT + -> ~POINT
3) ~REAL ~REAL # -> ~POINT
("+" works on POINTs as well as REALs, and "#" combines two REALs to
form a POINT).
The two phrases making up the ambiguous phrase may or may not
individually parse successfully. For example, suppose the types of A,
B, and C are all REALs. The two phrases appear as:
~REAL ~REAL + ~REAL #
and
~REAL ~REAL ~REAL # +
The first phrase reduces to the following, by rule #1:
~REAL ~REAL #
Rule #3 reduces this successfully to a full-spanning:
~POINT
In contrast, our second phrase can be partially parsed, via rule #3,
to:
~REAL ~POINT +
This final form can be reduced no further. No full-spanning type can
be acquired from this phrase with our rules (barring other coercions).
We've just seen how a syntax ambiguity generates multiple datatype
phrases, some of which might die, being unable to be rendered as a
full-spanning type. Neither, either, or both phrases may succeed.
We've seen one phrase succeed and the other fail.
BOX: If A were of type POINT, what phrase(s) would
BOX: survive?
If the types of A, B, and C were as follows:
A is POINT
B and C are REALs
then the ambiguous phrase would be:
~POINT ~REAL + ~REAL #
~or
~POINT ~REAL ~REAL # +
The latter phrase would be the only survivor (as the "REAL REAL #" goes
to POINT).
If A, B, and C were all POINTs, neither phrase would
survive. If A were itself ambiguous, either a REAL or a POINT, while
B and C were still REALs, ~both phrases would survive.
25.5.4
Under Syntax, What Does The Semantic ~OR Block Do?
In Section 15.2 we characterized the action performed by the ~OR block
as being simply unspecified, a function by the name of F. We are now
in a position to choose a definition for F.
We want the ~OR block to generate an ambiguous phrase, the ambiguous
combination of the phrases generated by each of OR's two components.
We've seen already how to generate ambiguous phrases: Just generate
one phrase and then generate the other phrase. This is two
STATEMENTs executed sequentially, with no dash connecting them. Thus,
we define F via:
DEFINE F( A,B: BASIC_PROCESS ):
<*A*>;
<*B*>;
ENDDEFN
F invokes each of its given components, A, and B, so that each
contributes a phrase, where both phrases share the same span (LEFT and
C).
In figure 25.8(b), the OR-block invokes each of its constituents, and
thus generates the ambiguous phrase:
A B + C # ~or A B C # +
Figure 25.9 shows the generated phrase, if A, B, and C are REALs.
Thus, both syntactic and datatype ambiguities come together in one
formalism. Syntactic ambiguities generate ambiguous datatype phrases,
as also does the datatype grammar on its own. All ambiguities
ultimately reside in the datatype grammar.
BOX: How can ambiguities occur by the datatype grammar
BOX: alone?
BOX:
BOX: How does a syntactic ambiguity manifest itself in the
BOX: datatype language?
BOX:
BOX: What does the OR-block do during the execution of
BOX: syntax's semantics?
25.6
Picking Up The Generated Phrases As A CHOICE_OF_PHRASES
We generate phrases via STATEMENTs, as just shown. Once phrases are
generated, C holds all the phrases as a single CHOICE_OF_PHRASES.
We introduce two notations that turn ~any STATEMENT into a
CHOICE_OF_PHRASES. One of them is:
[-> STATEMENT <-] -> EXPR
(a CHOICE_OF_PHRASES)
This basically executes the STATEMENT and then yields the value
(the generated phrases in C).
This notation translates simply to:
HOLDING C:= NIL;
GIVE
DO ~the_STATEMENT
GIVE C
ENDHOLD
The STATEMENT is executed in a context where C is set to NIL initially.
All phrases generated, which wind up on C, is the value delivered by
this EXPR. (If the STATEMENT performs no phrase generation, then this
EXPR yields NIL).
This new EXPR depends on the variable LEFT. Whatever happens to be
in LEFT upon commencement of this EXPR's execution, will be the left
endpoint for all full-spanning phrases.
Our second notation will be the one we use here almost exclusively:
[> STATEMENT <] -> EXPR
This executes the STATEMENT, and yields ~almost all of
C as its result. It yields only the ~full-spanning phrases in C that
are of ~length ~one. Its translation follows:
HOLDING C:= NIL;
LEFT:= NIL;
GIVE
DO ~the_STATEMENT
GIVE C pruned so as to hold full-spanning unit-
length phrases only
ENDHOLD
Example:
The STATEMENT:
<INT>
generates a CHOICE_OF_PHRASES (on C) that includes at least an
<INT>. If the coercion from INT to REAL is a rule in our
datatype grammar, then C will also contain a <REAL>, sharing
the same span as the <INT>.
Let's enclose this STATEMENT within the "[>...<]":
[> <INT> <]
This EXPR is of type CHOICE_OF_PHRASES, and the value of this
EXPR is the CHOICE_OF_PHRASES that arises on C upon generating
the phrase <INT>. That is, this EXPR is a CHOICE_OF_PHRASES
containing both <INT> and <REAL>.
Example:
While the STATEMENT:
<A>
<B>
<C> - <D>
<E>
~generates the ambiguous phrase in figure 25.3, the EXPR:
[-> <A>
<B>
<C> - <D>
<E> <-]
delivers that generated phrase as a CHOICE_OF_PHRASES, with no
mention made of our variable C.
Our more popular operation with "[>" instead of "[->":
[> <A>
<B>
<C> - <D>
<E> <]
delivers only the full-spanning phrases A, B, and E.
(Conceivably, the "<C>-<D>" phrase, with a grammar active,
could produce other full-spanning phrases, which would then
be included).
Example:
Given the rules:
~INT ~INT plus -> ~INT
~REAL ~REAL plus -> ~REAL
~INT -> ~REAL
the CHOICE_OF_PHRASES delivered by:
[-> <INT> - <INT> - <plus> <-]
is shown in figure 25.7. The one delivered by:
[> <INT> - <INT> - <plus> <]
consists of only two phrases, the full-spanning INT and REAL.
BOX: BOX: What notations do we have that collect up the phrases
BOX: generated by the execution of a STATEMENT?
BOX:
BOX: The expression:
BOX:
BOX: [> <INT> <]
BOX:
BOX: represents what CHOICE_OF_PHRASES, assuming we have
BOX: the INT-to-REAL coercion?
BOX:
BOX: Because:
BOX:
BOX: <INT>
BOX:
BOX: is a STATEMENT, what effect does it have?
25.7
The WANT Quantifier
We now address the last notation required in order to present
a language involving two domains, in our case, syntax and
datatypes.
Suppose we have a CHOICE_OF_PHRASES and we want to find, say, a full-
spanning BOOLean. This need may arise in the implementation of the
syntax rule:
while EXPR ; -> QUANTIFIER
For this quantifier to be meaningful, the EXPR must be of type BOOL.
We can invoke that EXPR's semantics so as to generate all possible
types (phrases) that the EXPR could be interpreted as. Then, using
the notation "[>...<]", we obtain a CHOICE_OF_PHRASES that has all the
full-spanning types of that EXPR. We now are interested in that
full-spanning BOOL within the CHOICE_OF_PHRASES.
Suppose X is the CHOICE_OF_PHRASES. The ~new quantifier:
WANT <BOOL:B> FROM X ;
causes an iteration for each full-spanning BOOL in X. On each of
those iterations, it sets B to the semantics of the matched BOOL.
(According to Section 14.3, there can be at most one full-
spanning BOOL, as duplicate phrases are supressed. This
instance of the WANT quantifier thus causes always zero or one
iterations. It causes zero iterations if there is no full-
spanning BOOL, and one iteration otherwise).
This WANT quantifier translates into our more familiar FOR-quantifier:
FOR P $E X; WITH P.POS = BOOL &
-DEFINED( P.LEFT ) ;
EACH_DO b:= P.SEM ; ;
It looks at all phrase blocks (P) in the CHOICE_OF_PHRASES X, and
causes an iteration for each phrase block whose part-of-speech is BOOL,
and which is full-spanning (the "-defined(P.LEFT)"). It also sets the
matched phrase block's semantics into the variable B (the EACH_DO).
(The "P.SEM" here may be transformed to "<*P.SEM*>" if context
demands that).
The WANT quantifier is supported by the following syntax rule:
want TOKEN from EXPR ; -> QUANTIFER
The TOKEN is just like that admitted on the lefthand side of a rule.
The EXPR must be of type CHOICE_OF_PHRASES.
25.7.1
WANT Quantifier Can Involve Array Parts-Of-Speech
Recall that a TOKEN can be of the form:
< ID [ ID ] : ID >
as in:
<EXPR[I]:X> or <TYPES[T1]:X>
That is, for matching array parts-of-speech, the index variable (the ID
enclosed in "[...]") is also set upon each match.
For example, let's declare an array part-of-speech we will be using for
datatype processing:
POS TYPES[10000] : - ;
We will agree that any datatype is an instance of this TYPES array of
parts-of-speech. Therefore, the quantifier:
WANT <TYPES[T]:S> FROM X ;
may cause more than one iteration. Any part-of-speech that is a
member of the TYPES array of parts-of-speech will be matched, and T
will be set to its index, a different index for each match. We will
see examples of this use of the WANT quantifier in the next chapter.
25.7.2
WANT Can Match A Phrase
In general, the WANT quantifier can specify not only one TOKEN to
match, but a left-to-right sequence of TOKENs to be matched:
want TOKEN TOKEN ... TOKEN from EXPR ; -> QUANTIFIER
For example, consider:
WANT <REAL:a> <INT:b> <plus> FROM X ;
This will cause an iteration for each full-spanning occurence of the
phrase:
<REAL> <INT> <plus>
There will be one match if X is figure 25.7.
25.7.3
WANT Can Recognize Non-Full-Spanning Phrases
Finally, the WANT quantifier can be used even to find non-full-spanning
phrases. The left side of the phrase to match may be augmented via
the syntax:
ID --
as in:
WANT L-- <REAL:a> <INT:b> <plus> FROM X ;
Now all occurences of the phrase:
<REAL> <INT> <plus>
emenating from X will be matched. Upon each match, the new variable L
is set to the left neightbor of the matched phrase.
Now that we can capture the left neighbor of a matched phrase, we can
note an equivalence. For example, the quantifier:
WANT <REAL:a> <INT:b> <plus> FROM X ;
can be written equivalently as:
WANT L1-- <plus> FROM X ;
!!
WANT L2-- <INT:b> FROM L1 ;
!!
WANT <plus> FROM L2 ;
Each of these three quantifiers matches one token of the original
match phrase. Each token is not required to be full-spanning, as the
notation:
ID --
occurs in the first two quantifiers. Notice how the next quantifier
reads the left neighbor set by the previous quantifer (e.g., L1 and
L2).
BOX: Of what use is the WANT quantifier?
BOX:
BOX: How do you specify a WANT quantifier that matches
BOX: phrases of lengths greater than one?
BOX:
BOX: How do you get WANT to match non-full-spanning phrases?