9. STRUCTURED SYNTAX DEFINITION
Most programming languages have only predefined constructs like
statements and operators. Seed7, on the other hand, additionally
allows user defined constructs. This chapter introduces the Seed7
Structured Syntax Description (S7SSD) which is used to define the
syntax of new constructs. The syntax of predefined constructs is
also defined with S7SSD.
The syntax descriptions used in manuals of conventional
programming languages have no relationship to the approach used by
the syntax analysis of the corresponding interpreters/compilers.
S7SSD is a simple syntax description that can be used by humans
and compilers/interpreters. Although compiler-compilers follow the
path of machine readable syntax descriptions, they use much more
complicated syntax and semantic descriptions and do not allow
users of the language to define new constructs.
There are different existing notations to specify the syntax of
programming languages. Backus-Naur Form (BNF) and its variants
like Extended Backus-Naur Form (EBNF) are examples of such syntax
specifications. Since it is easier to understand new concepts if
they are compared to well known concepts, EBNF will be used as a
base to explain S7SSD.
9.1 The Extended Backus-Naur Form
As the name says the Extended Backus-Naur Form is an extension
of BNF. The extension allows the definition of repetitions and
optional parts without the use of recursion. EBNF has the
following elements:
- Nonterminal symbols are described with identifiers. An
identifier consist of lower case letters and underline
characters.
- Terminal symbols are quoted strings or names in upper case
characters, which describe unprintable characters (control
characters).
- The concatenation of nonterminal and/or terminal symbols is
described by writing them in sequence.
- With | two alternatives can be separated.
- Expressions of the extended Backus-Naur form can be put within
parentheses ( ... ) .
- If an expression is optional it is enclosed in square
brackets [ ... ] .
- If an expression may be omitted or repeated it is
enclosed in curly braces { ... } .
The syntax of the extended Backus-Naur form can be described in
extended Backus-Naur form:
- syntax_description ::=
-
{ ebnf_statement } .
- ebnf_statement ::=
-
identifier '::=' ebnf_expression '.' .
- ebnf_expression ::=
-
term { '|' term } .
- term ::=
-
factor { factor } .
- factor ::=
-
identifier | string | control_character_description |
'(' ebnf_expression ')' | '[' ebnf_expression ']' |
'{' ebnf_expression '}' .
9.2 The Seed7 Structured Syntax Description
The Seed7 Structured Syntax Description is abbreviated with S7SSD.
The S7SSD can describe most but not all of the syntax of a
programming language. The syntax of identifiers, literals and
comments is not described with S7SSD. S7SSD views a program as
a big typeless expression. The syntax of this expression is
described with prefix-, infix- and postfix-operators. The operators
have a priority and an associativity. Operators can have one or
more operator symbols. The operator symbols of an operator can be
adjacent or they can have parameters in between. The S7SSD of
an infix + is:
$ syntax expr: .(). + .() is -> 7;
This defines the + as left associative infix operator with
priority 7. The + operator is an infix operator because the
operator pattern is:
() + ()
The place of the parameters is specified with (). Any expression
can be used as parameter. The type of the parameters and the type
of the result of + is not specified by the S7SSD. Checks for the
correct type are not done at the syntactic level. This way S7SSD
allows syntax that would not be allowed in a corresponding EBNF
description. S7SSD considers just operator symbols and their
priority and associativity.
9.3 The syntax of a statement
To explain the Seed7 Structured Syntax Description we design a
new statement, the loop-statement. The loop-statement should
be similar to while- and repeat-loops but instead of having
the conditional exit at the beginning or at the end, it should
have a conditional exit in the middle of the loop. This middle
conditional exit should be part of the loop-statement. Note that
the break-statement, which exists in some programming languages,
is a statement on its own and is not part of the loop which it
leaves. Therefore the middle conditional exit should not be
confused with a break-statement. An example of the new loop-statement
is:
loop
ch := getc(inFile);
until ch = '\n' do
stri &:= str(ch);
end loop;
The 'loop' example above reads characters from a file and
concatenates them to a string until the character '\n' is read.
The '\n' ends the loop. Hence it is not added to the string.
An equivalent solution without the usage of the loop-statement
would be:
repeat
ch := getc(inFile);
if ch <> '\n' then
stri &:= str(ch);
end if;
until ch = '\n';
The S7SSD of the loop-statement is:
$ syntax expr: .loop.().until.().do.().end.loop is -> 25;
The details of the S7SSD 'syntax' definition will be explained
later. For now we concentrate at the heart of the S7SSD, the
expression:
.loop.().until.().do.().end.loop
For the purpose of the syntax description we can just remove the
dots, which gives:
loop () until () do () end loop
This are the keywords used in a loop-statement. The symbol ()
acts as placeholder for an expression.
With EBNF the loop-statement can be described as:
- loop_statement ::=
-
'loop'
statement
'until' expression 'do'
statement
'end' 'loop' .
An EBNF description may use many nonterminal symbols such as
'statement' or 'expression'. S7SSD does not distinguish
between different nonterminal symbols. Instead S7SSD only
knows one nonterminal symbol: ()
Therefore S7SSD cannot distinguish between 'statement',
'expression' or something else. At the syntax level any kind of
expression can by substituted for a S7SSD nonterminal symbol ().
With EBNF it is possible to describe constraints such as the type
of an expression. S7SSD relies on semantic checks to verify
such constraints. Given the S7SSD of the loop-statement an
expression like
loop
"X"
until 1+2 do
integer
end loop
would be legal as it contains the required keywords
loop until do end loop
and the expressions
"X" 1+2 integer
at the places of the () symbols. This is exactly what the syntax
definition specifies, but it would be not be considered correct
given the description of the loop-statement at the beginning of
the chapter. To determine which types of expressions are allowed
at the places of the () symbol, a semantic definition of the
loop-statement is necessary. A semantic definition is just a
function definition which uses the keywords and parameters from
the syntax definition. The definition of the 'loop' function
(semantic definition of the loop-statement) is:
const proc: loop
(in proc: statements1)
until (ref func boolean: condition) do
(in proc: statements2)
end loop is func
local
var boolean: exitLoop is FALSE;
begin
repeat
statements1;
if not condition then
statements2;
else
exitLoop := TRUE;
end if;
until exitLoop;
end func;
This definition determines the types of the expressions accepted
between the keywords. Besides that the semantic definition of the
loop-statement is just a normal function definition. Note that
the sequence of keywords and parameters in the header of this
function definition is determined by the corresponding syntax
definition.
The parameters 'statements1', 'condition' and 'statements2' are
call-by-name parameters. A call-by-name parameter is a function
without parameters. Function types such as proc or func boolean
are used as type of formal call-by-name parameters. An expression
with the correct type is allowed as actual call-by-name parameter.
This actual parameter expression is not evaluated when the
function is called. Instead the expression is evaluated every time
the formal call-by-name parameter is used. This way 'statements1',
'condition' and 'statements2' are not executed when the 'loop'
function is called. Inside the body of the 'loop' function the
call-by-name parameters are executed at some places.
The 'loop' function uses a repeat- and an if-statement to
implement the desired behavior. If necessary the call-by-name
parameters are executed several times.
For the 'loop' example with the semantic errors (see above) we
would get an error message like:
*** chkloop.sd7(35):51: Match for {loop "X" until {1 + 2 } do integer end loop } failed
9.4 Priority and associativity
If a syntax construct has parameters before the first symbol or
after the last symbol the priority and the associativity of the
construct are significant. Constructs with stronger priority bind
their parameters earlier than constructs with weaker priority. The
priority is described by a natural number (inclusive 0). The
strongest priority is 0. Weaker priorities are described by larger
numbers. What bind means is can be explained with an example:
=
A = B + C * D / \
A +
* priority 6 / \
+ priority 7 B *
= priority 12 / \
C D
The * operator has the strongest priority (6) of all operators
involved. Therefore the * takes its parameters first. Then
the + (with priority 7) and at last the = (with priority 12)
follows. This leads to the the following interpretation of
the expression:
A = (B + (C * D))
The associativity describes, in which order constructs with equal
priority bind their parameters. For example
A - B - C
can be interpreted in two ways:
(A - B) - C or A - (B - C)
The first interpretation is usually preferred by mathematicians
and is described with the associativity -> . Generally four
associativities are possible:
| Associativity | Symbol |
| Binding from left to right | -> |
| Binding from right to left | <- |
| Neither the left nor the right parameter
are allowed to have the same priority | <-> |
| At the left side there is a binding from
left to right and at the right side there
is a binding from right to left | -><- |
The last two possibilities give no legal interpretation in the
subtraction example. The third kind of associativity ( <-> ) is
used by the equal operator ( = ) of Pascal because there an
expression like
A = B = C
is not legal.
There is a second way to describe the associativity. The
associativity describes, if an operand must have a stronger
priority than the priority of the operator. For example:
- 7
A - B - C / \ / \
/ \ <=7 / \ <7
- priority 7 -> / \ / \
- C 7 0
/ \ / \
/ \ <=7 / \ <7
/ \ / \
A B 0 0
The numbers in the nodes of the right tree show the priority of
each sub expression (sub tree). With < and <= the required
condition for the priority of an operand is described. An
interpretation is legal if all this conditions are met. If there
are more than one legal interpretations or no legal
interpretation the expression is illegal.
Table for the possibilities of associativity:
| associativity | The priority of the |
| left operand must be | right operand must be |
| -> | <= | < |
| <- | < | <= |
| <-> | < | < |
| -><- | <= | <= |
| | than that of the operator |
The parameter before the operator symbol is called left operand.
The parameter after the last symbol of a construct is called
right operand. In case of normal operators the last symbol of a
construct and the operator symbol are identical. If this is not
the case there is a third kind of operand. Between the operator
symbol and the last symbol of a construct are the middle operands.
Middle operands can have any priority.
9.5 The syntax of operators
A syntax definition specifies the way a usage of a statement
or operator must be written. For example a call of the
not operator looks like:
not okay
To describe the syntax of the not operator we write:
$ syntax expr: .not.() is <- 13;
This means that a not expression is constructed with the
symbol not followed by a parameter. The place of the
parameter is marked with the () sign. The syntax description
contains no information about the types of the parameters. At
the syntax level a parameter may be anything. With <- the
associativity of the not operator is specified as right
associative. This means that the right operand is allowed to
have the same priority as the operator symbol. So the
expression
not not okay
is legal and means
not (not okay)
If the associativity of the not operator is specified with
-> instead of <- the 'not not' expression above is not legal.
With 13 the priority of the whole not operator is determined.
As convention priorities from 1 to 20 are used by operators
and priority 25 is used by statements. Arithmetic operators
have priorities from 1 to 11 and comparisons have priority 12.
To define the not operator completely there must be also
a semantic definition which is as follows:
const func boolean: not (in boolean: aBool) is func
result
var boolean: negation is TRUE;
begin
if aBool then
negation := FALSE;
end if;
end func;
In the declaration the not operator is written exactly
in the same way it is written when it is called. The syntax
definition is used at both places: declaration and call.
The syntax and semantic declarations define precisely how
the not operator works.
As next example we try an infix operator like the and operator.
A call of the and operator may look like:
okay and not error
To describe the syntax of the and operator we write:
$ syntax expr: .().and.() is -> 14;
This means that an and expression is constructed with the
symbol and surrounded by parameters. The -> defines the
and operator as left associative. This means that an expression
like
A and B and C
is interpreted as
(A and B) and C
With 14 the priority of the whole and operator is determined.
Since priority 14 is weaker than the priority of the not operator
which is 13 the example expression is evaluated as:
okay and (not error)
Note that the expression
okay and not error
makes no sense if the and operator has priority 12 instead of 14.
S7SSD treats everything as operator description. Operators
have priority and associativity. The priority and associativity
determine in which succession S7SSD syntax rules get applied.
To explain priority and associativity we use the basic arithmetic
operations (+,-,*,/). To describe them with EBNF we can write:
- factor ::=
-
number | name .
- expression_5 ::=
-
factor |
( '+' expression_5 ) |
( '-' expression_5 ) .
- expression_6 ::=
-
expression_5 |
( expression_6 '*' expression_7 ) |
( expression_6 '/' expression_7 ) .
- expression_7 ::=
-
expression_6 |
( expression_7 '+' expression_6 ) |
( expression_7 '-' expression_6 ) .
This describes the following things:
- The operators have different priorities:
- Plus and minus signs are executed first
- Multiplication and division are executed second.
- Addition and subtraction are executed last.
- These priorities are exactly what we expect from an
arithmetic expression.
- Additionally we see that ++2 is allowed and interpreted as
+(+(2)) which means that the plus sign is a right-associative
operator.
- We can also see that a*b*c is allowed and interpreted as
(a*b)*c which means that the multiplication is a
left-associative operator.
All this things can also be described with S7SSD:
$ syntax expr: . + .() is <- 5;
$ syntax expr: . - .() is <- 5;
$ syntax expr: .(). * .() is -> 6;
$ syntax expr: .(). / .() is -> 6;
$ syntax expr: .(). + .() is -> 7;
$ syntax expr: .(). - .() is -> 7;
As we can see S7SSD is shorter as the description with EBNF.
A syntax statement is explained as follows:
- The $ is used to introduce all hard coded statements.
- The keyword 'syntax' introduces a structured syntax
description.
- The result of the recognized expression will have
the type expr. The type expr is used between the syntax
and the semantic analysis. The type expr describes
expressions which are syntactically analyzed but not
semantically analyzed. After the semantic analysis (and
during the runtime) the type expr is not used.
- The colon ':' is used as separator between type and syntax
description.
- A dot expression like '.(). * .()' is introduced (as can
probably be guessed by the name) with a dot. For the purpose
of the syntax description we can just remove the dots in
our mind: '() * ()'
- The symbol 'is' is used in all Seed7 declarations as
separator between the name and the value.
- The associativity is described with one of the symbols
-> (left-associative), <- (right-associative),
<-> (not associative) and -><- (both associativities).
If there are no left or right operands, as it is the case
for the loop-statement, the associativity is irrelevant.
- Finally the priority of the syntax construct is defined with
a integer literal like '6'. The priority '6' is used for the
operators *, /, div, rem, mdiv and mod.
9.6 Syntax of predefined statements
Predefined statements can also be defined with S7SSD.
E.g.: The while-statement. A use of the while-statement is:
while element_index > 0 and okay do
processElement;
write(".");
end while;
To describe the syntax of the while-statement we write:
$ syntax expr: .while.().do.().end.while is -> 25;
This means that the while-statement is an expression with the
symbols 'while', 'do', 'end' and 'while'. With -> the
associativity of the while-statement is specified as left
associative. The associativity has no meaning for the
while-statement since there is no parameter before the first
symbol or after the last symbol. The priority of the whole
while-statement is 25.
The semantic definition of the while-statement is as follows:
const proc: while (ref func boolean: condition) do
(ref proc: statement) end while is func
begin
if condition then
statement;
while condition do
statement;
end while;
end if;
end func;
The syntax definition is used for the declaration and for the call.
This declaration defines precisely how the while-statement
works. It is based on the if-statement and uses recursion
to emulate the repetition of the loop body.
Another example for a syntax description is the
repeat-statement
repeat
processElement;
write(".");
until element_index = 0 or not okay;
which has the following syntax description:
$ syntax expr: .repeat.().until.() is -> 25;
This means that the repeat-statement is an expression with
the symbols 'repeat' and 'until' and a parameter between
'repeat' and 'until' and after 'until'. With 25 the priority
of the whole repeat-statement is determined. With -> the
associativity of the repeat-statement is specified as left
associative. This allows priorities from 0 to 24 for the
parameter after 'until'. Since statements have priority 25
it is not possible to write a statement direct behind 'until'.
A simple if-statement, without
'elsif' part, is the next example.
A usage of this if-statement might be:
if okay then
writeln("okay");
else
writeln("not okay");
end if;
As syntax description we use
$ syntax expr: .if.().then.().end.if is -> 25;
$ syntax expr: .if.().then.().else.().end.if is -> 25;
Note that this description allows if-statements with and
without 'else' parts. As semantic description we use
const proc: if (in boolean: condition) then
(in proc: statement)
end if is func
begin
case condition of
when {TRUE}: statement;
end case;
end func;
const proc: if (in boolean: condition) then
(in proc: statement1)
else
(in proc: statement2)
end if is func
begin
case condition of
when {TRUE}: statement1;
when {FALSE}: statement2;
end case;
end func;
The two forms of the if-statement are based on the
case-statement. A more complex if-statement with 'elsif'
parts can be:
if number < 0 then
write("less");
elsif number = 0 then
write("equal");
else
write("greater");
end if;
How to define the syntax and the semantic for this statement
is described in the next chapter.
9.7 Advanced syntax definitions
If we want to use some special syntax which should be only
allowed at some place we do the following:
- Define the special syntax with S7SSD in a way that does not
contradict with the rest of the syntax definitions.
- Use semantic definitions to make sure that this syntax
construct can only be used at the place desired.
The EBNF of the if-statement with 'elsif' parts is:
- if_statement ::=
-
'if' expression 'then'
statement
{ 'elsif' expression 'then'
statement }
[ 'else'
statement ]
'end' 'if' .
The S7SSD of this if-statement is:
$ syntax expr : .if.().then.().end.if is -> 25;
$ syntax expr : .if.().then.().().end.if is -> 25;
$ syntax expr : .elsif.().then.() is <- 60;
$ syntax expr : .elsif.().then.().() is <- 60;
$ syntax expr : .else.() is <- 60;
Instead of one rule (as EBNF does) the rule is broken into several
S7SSD rules. This is necessary because S7SSD does not support the
[ ] and { } notations. They are not supported for good reasons:
They complicate the parameter lists and they are also not so easy
to implement. On the other hand, the BNF like rules of S7SSD
lead to semantic constructs which are easy to parse and easy to
compile. The broken down S7SSD rules of the if-statement
corresponds to the following EBNF description:
- if_statement ::=
-
'if' expression 'then'
statement
'end' 'if' .
- if_statement ::=
-
'if' expression 'then'
statement
elseif_or_else_part
'end' 'if' .
- elseif_or_else_part ::=
-
'elsif' expression 'then'
statement .
- elseif_or_else_part ::=
-
'elsif' expression 'then'
statement
elseif_or_else_part .
- elseif_or_else_part ::=
-
'else'
statement .
Since S7SSD uses only one nonterminal symbol '()' it is the job
of the semantic level to make sure that only the right nonterminal
symbol can be used. This is done by introducing the type
ELSIF_PROC (which corresponds to the nonterminal symbol
'elseif_or_else_part' of the EBNF) and the type
ELSIF_RESULT (which is the result
of the ELSIF_PROC).
Normally a syntax declaration can be used in many semantic
declarations. E.g.: The syntax of the '+' operator is defined
once and the semantic of the '+' operator is defined for the types
integer, bigInteger, float, complex, ... This possibility
is not needed for the if-statement. For each of the five
S7SSD syntax rules of the if-statement just one
corresponding semantic declaration is done:
const proc: if (in boolean: condition) then
(in proc: statements)
end if is func
begin
case condition of
when {TRUE}: statements;
end case;
end func;
const proc: if (in boolean: condition) then
(in proc: statements)
(in ELSIF_PROC: elsifPart)
end if is func
begin
case condition of
when {TRUE}: statements;
when {FALSE}: elsifPart;
end case;
end func;
const ELSIF_PROC: elsif (in boolean: condition) then
(in proc: statements) is func
begin
case condition of
when {TRUE}: statements;
end case;
end func;
const ELSIF_PROC: elsif (in boolean: condition) then
(in proc: statements)
(in ELSIF_PROC: elsifPart) is func
begin
case condition of
when {TRUE}: statements;
when {FALSE}: elsifPart;
end case;
end func;
const ELSIF_PROC: else
(ref void: voidValue) is ELSIF_EMPTY;
Since no other functions of type 'ELSIF_PROC' are defined only
legal if-statements can be written.
9.8 Comparison of EBNF and S7SSD
In the S7SSD of the loop-statement
$ syntax expr: .loop.().until.().do.().end.loop is -> 25;
are no nonterminal expressions '()' before the first keyword or
after the last keyword. Therefore the associativity does not play
any role. The nonterminal expressions '()' of the loop-statement
are all surrounded by keywords and therefore they can have any
priority. As priority of the 'loop' 25 is chosen just because
most other statements have also priority 25. The assignments
(:= +:= *:= ...) have priority 20 and all operators used in
arithmetic, boolean and string expressions have priorities less
than 20. BTW: The semicolon operator (;) is defined with the
priority 50. Operators with a priority of 0 get their parameters
before operators with priority 1 and so on.
The corresponding EBNF description of the loop-statement would
be:
- expression_25 ::=
-
'loop'
expression_127
'until' expression_127 'do'
expression_127
'end' 'loop' .
We must keep in mind that alternative rules for expression_25 are
also possible and that for every priority level a rule like
- expression_127 ::=
-
expression_126 .
is defined. Additionally the following rules are defined:
- expression_0 ::=
-
token | parentheses_expression |
call_expression | dot_expression .
- token ::=
-
identifier | literal .
- parentheses_expression ::=
-
'(' expression_127 ')' .
- call_expression ::=
-
expression_127 [ '('
[ expression_127 { ',' expression_127 } ]
')' ] .
- dot_expression ::=
-
[ '.' ] call_expression { '.' call_expression } .
The EBNF description can become long if many priority levels
exist, as it is the case in Seed7.
There are some things which are out of the scope of S7SSD.
The syntax of comments, tokens (identifiers and literals) and
expressions (parentheses, function calls and dot expressions)
is hard coded. The hard coded constructs are described in
chapter 10 (Tokens) and chapter 11 (Expressions).
For the reasons mentioned above it is not possible to transform
every EBNF syntax description into S7SSD. Transforming S7SSD
descriptions to EBNF is always possible.
The advantage of S7SSD lies in its simplicity and that a fast
automated syntax recognition algorithm can be easily implemented.
It is exactly the combination of hard coded syntax recognition and
flexible syntax rules that make it successful.
|