10. TOKENS
A program consists of a sequence of tokens which may be delimited by white space. There are two types of tokens:
identifiers
literals
Syntax:
program ::=
{ white_space | token } .
token ::=
identifier | literal .
10.1 White space
There are three types of white space:
spaces
comments
line comments
White space always terminates a preceding token. Some white space is required to separate otherwise adjacent tokens.
Syntax:
white_space ::=
( space | comment | line_comment )
{ space | comment | line_comment } .
10.1.1 Spaces
There are several types of space characters which are ignored except as they separate tokens:
blanks, horizontal tabs, carriage returns and new lines.
Syntax:
space ::=
' ' | TAB | CR | NL .
10.1.2 Comments
A comment starts with the characters (* and ends with the characters *). For example:
(* This is a comment *)
Comments can be nested, so a larger section of a program can be commented out even if it already contains comments. Comments cannot occur within string or character literals.
Syntax:
comment ::=
'(*' { any_character } '*)' .
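The syntax rule above does not express nesting by itself. One way to honor the nesting rule is a depth counter, sketched here in Python. The function name skip_comment is hypothetical, and the sketch makes no attempt to treat string or character literals inside a comment specially.

```python
def skip_comment(source: str, start: int) -> int:
    """Return the index just past the comment that begins at source[start]."""
    if not source.startswith("(*", start):
        raise ValueError("no comment starts at this position")
    depth = 0
    pos = start
    while pos < len(source):
        if source.startswith("(*", pos):
            depth += 1  # a comment (outer or nested) opens
            pos += 2
        elif source.startswith("*)", pos):
            depth -= 1  # the innermost open comment closes
            pos += 2
            if depth == 0:
                return pos
        else:
            pos += 1
    raise ValueError("unterminated comment")
```

The comment ends only when the counter returns to zero, so an inner *) does not end the outer comment.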
10.1.3 Line comments
Line comments are introduced with the character # and are terminated by the end of the line. For example:
# This is a comment
Line comments cannot occur within string, character or numerical literals.
Syntax:
line_comment ::=
'#' { any_character } NL .
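As a small illustration of this rule, a scanner can skip a line comment by searching for the next new line. The Python function below is a sketch with a hypothetical name. Accepting a line comment that ends at the end of the input without a new line is an assumption; the syntax rule itself requires NL.

```python
def skip_line_comment(source: str, start: int) -> int:
    """Return the index just past the line comment that begins at source[start]."""
    if source[start:start + 1] != "#":
        raise ValueError("no line comment starts at this position")
    newline = source.find("\n", start)
    if newline == -1:
        # Assumption: a comment on the last line may lack the final NL.
        return len(source)
    return newline + 1  # the NL belongs to the line comment
```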
10.2 Identifiers
There are three types of identifiers:
name identifiers
special identifiers
parentheses
Identifiers may be written next to each other without white space, with two exceptions: two name identifiers must be separated by white space, and so must two special identifiers.
Syntax:
identifier ::=
name_identifier | special_identifier | parenthesis .
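The adjacency rule can be illustrated with a small Python sketch that splits a piece of source text into identifiers. It is not a complete scanner: comments and literals are left out, so a digit at the start of a token is reported as an error, and the names IDENTIFIER and split_identifiers are hypothetical.

```python
import re

# One alternative per identifier type, plus space characters.
IDENTIFIER = re.compile(
    r"(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<special>[!$%&*+,\-./:;<=>?@\\^`|~]+)"
    r"|(?P<paren>[()\[\]{}])"
    r"|(?P<space>[ \t\r\n]+)"
)


def split_identifiers(source):
    """Return (type, text) pairs for the identifiers in source."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = IDENTIFIER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at index {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With this sketch, a[i]:=b splits into six identifiers without any white space, while x5y is a single name identifier and only x5 y yields two.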
10.2.1 Name identifiers
A name identifier is a sequence of letters, digits and underscores ( _ ). The first character must be a letter or an underscore. Examples of name identifiers are:
NUMBER integer const if UPPER_LIMIT LowerLimit x5 _end
Upper and lower case letters are distinct. Name identifiers may have any length and all characters are significant. A name identifier is terminated by the first character that is not a letter, a digit or an underscore. The terminating character is not part of the name identifier.
Syntax:
name_identifier ::=
( letter | underscore ) { letter | digit | underscore } .
letter ::=
upper_case_letter | lower_case_letter .
upper_case_letter ::=
'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' |
'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' |
'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' .
lower_case_letter ::=
'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' |
'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' |
'u' | 'v' | 'w' | 'x' | 'y' | 'z' .
digit ::=
'0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' .
underscore ::=
'_' .
10.2.2 Special identifiers
A special identifier is a sequence of special characters. Examples of special identifiers are:
+ := <= * -> , &
Here is a list of all special characters:
! $ % & * + , - . / : ; < = > ? @ \ ^ ` | ~
Special identifiers may have any length and all characters are significant. A special identifier is terminated by the first character that is not a special character. The terminating character is not part of the special identifier.
Syntax:
special_identifier ::=
special_character { special_character } .
special_character ::=
'!' | '$' | '%' | '&' | '*' | '+' | ',' | '-' | '.' | '/' |
':' | ';' | '<' | '=' | '>' | '?' | '@' | '\' | '^' | '`' |
'|' | '~' .
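Because a special identifier extends to the first character that is not a special character, adjacent operator symbols merge into one identifier. The following Python sketch makes this visible; the names SPECIAL_CHARACTERS and scan_special_identifier are hypothetical.

```python
# The 22 special characters listed above.
SPECIAL_CHARACTERS = frozenset("!$%&*+,-./:;<=>?@\\^`|~")


def scan_special_identifier(source, start=0):
    """Return the special identifier starting at source[start], or None."""
    end = start
    while end < len(source) and source[end] in SPECIAL_CHARACTERS:
        end += 1
    return source[start:end] or None
```

Scanning `:=-b` yields the single special identifier `:=-`, whereas `:= -b` yields `:=`. This is why white space is needed between two special identifiers.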
10.2.3 Parentheses
A parenthesis is one of the following characters:
( ) [ ] { }
Note that a parenthesis consists of only one character. Except for the character sequence (* (which introduces a comment), a parenthesis is terminated by the next character.
Syntax:
parenthesis ::=
'(' | ')' | '[' | ']' | '{' | '}' .
10.3 Literals
There are three types of literals:
integer literals
character literals
string literals
Syntax:
literal ::=
integer_literal | character_literal | string_literal .
10.3.1 Integer literals
An integer literal is a sequence of digits, which is interpreted as a decimal number. The sequence of digits may be followed by the letter E or e, an optional + sign and a decimal exponent. A based number is written as a sequence of digits followed by the # character and a sequence of extended digits. The decimal number in front of the # character specifies the base of the number that follows it. The base must be between 2 and 36. As extended digits, the letters A or a stand for 10, B or b for 11, and so on up to Z or z, which stand for 35.
Syntax:
integer_literal ::=
decimal_integer [ exponent | based_integer ] .
decimal_integer ::=
digit { digit } .
exponent ::=
( 'E' | 'e' ) [ '+' ] decimal_integer .
based_integer ::=
'#' extended_digit { extended_digit } .
extended_digit ::=
letter | digit .
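The value of an integer literal can be computed as in the following Python sketch. It assumes that an exponent multiplies the decimal number by the corresponding power of ten and that an extended digit must be smaller than the base. The names INTEGER_LITERAL and integer_literal_value are hypothetical.

```python
import re

# decimal_integer [ exponent | based_integer ]
INTEGER_LITERAL = re.compile(
    r"(?P<decimal>[0-9]+)"
    r"(?:[Ee](?P<exponent>\+?[0-9]+)|#(?P<based>[0-9A-Za-z]+))?"
)


def integer_literal_value(text):
    """Return the value of an integer literal given as text."""
    match = INTEGER_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    decimal = int(match.group("decimal"))
    if match.group("exponent") is not None:
        # Assumption: the exponent is a power of ten.
        return decimal * 10 ** int(match.group("exponent"))
    if match.group("based") is None:
        return decimal
    base = decimal  # the number in front of '#' is the base
    if not 2 <= base <= 36:
        raise ValueError("the base must be between 2 and 36")
    value = 0
    for char in match.group("based"):
        digit = int(char, 36)  # '0'..'9' give 0..9, 'A'/'a'..'Z'/'z' give 10..35
        if digit >= base:
            raise ValueError(f"digit {char!r} is not valid in base {base}")
        value = value * base + digit
    return value
```

For example, the sketch maps 1E+2 to 100, 16#FF to 255 and 2#101 to 5, and it rejects 1E-2 because the syntax allows only an optional + sign in the exponent.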
10.3.2 String literals
A string literal is a sequence of characters surrounded by double quotes. For example:
"" " " "\"" "'" "\'" "String" "ch=\" " "\n\n"
In order to represent nonprintable characters and certain printable characters the following escape sequences may be used.
audible alert    BEL      \a    backslash     (\)  \\
backspace        BS       \b    apostrophe    (')  \'
escape           ESC      \e    double quote  (")  \"
formfeed         FF       \f
newline          NL (LF)  \n    control-A          \A
carriage return  CR       \r    ...
horizontal tab   HT       \t    control-Z          \Z
vertical tab     VT       \v
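Additionally there are the following possibilities:
Two backslashes with a sequence of space characters between them are ignored. This allows a string literal to be continued on the next line.
Two backslashes with an integer literal between them stand for the character with that ordinal number.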
Syntax:
string_literal ::=
'"' { string_character } '"' .
string_character ::=
printable_character | escape_sequence .
escape_sequence ::=
'\a' | '\b' | '\e' | '\f' | '\n' | '\r' | '\t' | '\v' |
'\\' | '\''' | '\"' | '\' upper_case_letter |
'\' { space } '\' | '\' integer_literal '\' .
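The escape sequences can be decoded as in the following Python sketch, which works on the characters between the double quotes. Several points are assumptions: \A to \Z are taken to be the control characters 1 to 26, a backslash pair that encloses space characters is taken to contribute nothing to the string, and a backslash pair that encloses an integer literal is taken to denote the character with that ordinal number. Only plain decimal ordinals are handled; exponents and based integers are left out. All names are hypothetical.

```python
import re

SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f", "n": "\n", "r": "\r",
    "t": "\t", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}
# Space characters followed by the closing backslash. At least one space
# character is required here, because \\ is the backslash escape.
CONTINUATION = re.compile(r"[ \t\r\n]+\\")
# A decimal ordinal number followed by the closing backslash.
NUMERIC = re.compile(r"([0-9]+)\\")


def decode_string_body(body):
    """Decode the characters between the double quotes of a string literal."""
    result = []
    pos = 0
    while pos < len(body):
        if body[pos] != "\\":
            result.append(body[pos])
            pos += 1
            continue
        following = body[pos + 1:pos + 2]  # character after the backslash, or ""
        if following in SIMPLE_ESCAPES:
            result.append(SIMPLE_ESCAPES[following])
            pos += 2
        elif "A" <= following <= "Z":
            # Assumption: \A is control-A (1), ..., \Z is control-Z (26).
            result.append(chr(ord(following) - ord("A") + 1))
            pos += 2
        elif (match := CONTINUATION.match(body, pos + 1)) is not None:
            pos = match.end()  # the enclosed space characters are dropped
        elif (match := NUMERIC.match(body, pos + 1)) is not None:
            result.append(chr(int(match.group(1))))
            pos = match.end()
        else:
            raise ValueError(f"illegal escape sequence at index {pos}")
    return "".join(result)
```

The lower case escapes are checked before the upper case letters, so \a (audible alert) and \A (control-A) are kept apart.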
10.3.3 Character literals
A character literal is a character enclosed in single quotes. For example:
'a' ' ' '\n' '!' '\\' '2' '"' '\"' '\''
To represent control characters and certain other characters in character literals, the same escape sequences as for string literals may be used.
Syntax:
character_literal ::=
''' ( printable_character | escape_sequence ) ''' .
escape_sequence ::=
'\a' | '\b' | '\e' | '\f' | '\n' | '\r' | '\t' | '\v' |
'\\' | '\''' | '\"' | '\' upper_case_letter |
'\' { space } '\' | '\' integer_literal '\' .