Seed7 - The extensible programming language

10. TOKENS

A program consists of a sequence of tokens which may be delimited by white space. There are two types of tokens:

identifiers
literals

Syntax:
program ::=
{ white_space | token } .

token ::=
identifier | literal .

10.1 White space

There are three types of white space:

spaces
comments
line comments

White space always terminates a preceding identifier or a preceding integer, bigInteger or float literal. White space is required wherever two adjacent tokens would otherwise be read as one, for example between two name identifiers.

Syntax:
white_space ::=
( space | comment | line_comment )
{ space | comment | line_comment } .

10.1.1 Spaces

There are several types of space characters which are ignored except as they separate tokens:

blanks, horizontal tabs, carriage returns and new lines.

Syntax:
space ::=
' ' | TAB | CR | NL .

10.1.2 Comments

Comments are introduced with the characters (* and are terminated with the characters *) . For example:

(* This is a comment *)

Comments can be nested, so it is possible to comment out larger sections of a program even when these sections contain comments themselves. Comments cannot occur within string or character literals.

Syntax:
comment ::=
'(*' { any_character } '*)' .

any_character ::=
simple_literal_character | apostrophe | '"' | '\' |
control_character .

control_character ::=
NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL |
BS  | TAB | LF  | VT  | FF  | CR  | SO  | SI  |
DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB |
CAN | EM  | SUB | ESC | FS  | GS  | RS  | US  |
DEL .
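Because comments nest, a scanner cannot simply search for the first *). The following Python sketch shows one way to skip a nested comment by counting the depth of (* and *) pairs. It is not taken from the Seed7 implementation: the function name skip_comment is hypothetical, and the sketch assumes it is only called at a token boundary, never inside a string or character literal.

```python
def skip_comment(src: str, pos: int) -> int:
    """Skip a (possibly nested) comment that starts with "(*" at pos.

    Hypothetical helper: returns the index just after the matching "*)".
    """
    if not src.startswith("(*", pos):
        raise ValueError("no comment starts at position %d" % pos)
    depth = 0
    i = pos
    while i < len(src):
        if src.startswith("(*", i):
            depth += 1  # an inner comment opens
            i += 2
        elif src.startswith("*)", i):
            depth -= 1  # the innermost open comment closes
            i += 2
            if depth == 0:
                return i
        else:
            i += 1  # any other character is part of the comment
    raise SyntaxError("unterminated comment")
```

For example, skip_comment("(* a (* b *) c *)x", 0) stops in front of x, because the inner *) only closes the inner comment.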

10.1.3 Line comments

Line comments are introduced with the character # and are terminated with the end of the line. For example:

# This is a comment

Line comments cannot occur within string, character or numerical literals. For example, the # in a based integer literal such as 16#FF does not start a line comment.

Syntax:
line_comment ::=
'#' { any_character } NL .
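The three kinds of white space can be skipped together in one loop, as in the following Python sketch. The name skip_white_space is hypothetical. The sketch assumes it is only called between tokens, so a # inside a literal never reaches it. It also accepts a line comment at the very end of the input without a final newline, which the line_comment rule above does not strictly allow.

```python
SPACES = " \t\r\n"  # blank, TAB, CR and NL, as in the space rule


def skip_white_space(src: str, pos: int) -> int:
    """Return the index of the first non-white-space character at or after pos."""
    i = pos
    while i < len(src):
        if src[i] in SPACES:
            i += 1
        elif src[i] == "#":
            # Line comment: skip up to and including the terminating newline.
            nl = src.find("\n", i)
            i = len(src) if nl == -1 else nl + 1
        elif src.startswith("(*", i):
            # Nested comment: count (* and *) until the depth drops to zero.
            depth = 0
            while True:
                if i >= len(src):
                    raise SyntaxError("unterminated comment")
                if src.startswith("(*", i):
                    depth += 1
                    i += 2
                elif src.startswith("*)", i):
                    depth -= 1
                    i += 2
                    if depth == 0:
                        break
                else:
                    i += 1
        else:
            break  # start of the next token
    return i
```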

10.2 Identifiers

There are three types of identifiers:

name identifiers
special identifiers
parentheses

Identifiers can be written next to each other without separating white space, except that two adjacent name identifiers, or two adjacent special identifiers, must be separated by white space (otherwise they would be read as a single identifier).

Syntax:
identifier ::=
name_identifier | special_identifier | parenthesis .

10.2.1 Name identifiers

A name identifier is a sequence of letters, digits and underscores ( _ ). The first character must be a letter or an underscore. Examples of name identifiers are:

NUMBER  integer  const  if  UPPER_LIMIT  LowerLimit  x5  _end

Upper and lower case letters are distinct. Name identifiers may have any length and all characters are significant. A name identifier is terminated by the first character that is neither a letter, a digit nor an underscore. The terminating character is not part of the name identifier.

Syntax:
name_identifier ::=
( letter | underscore ) { letter | digit | underscore } .

letter ::=
upper_case_letter | lower_case_letter .

upper_case_letter ::=
'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' |
'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' |
'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' .

lower_case_letter ::=
'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' |
'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' |
'u' | 'v' | 'w' | 'x' | 'y' | 'z' .

digit ::=
'0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' .

underscore ::=
'_' .
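The name_identifier rule can be scanned with a simple loop. This Python sketch assumes ASCII letters only, exactly as listed in the letter rule above. The function scan_name is hypothetical; it returns the identifier together with the index of the terminating character, which is not consumed.

```python
import string
from typing import Tuple

LETTERS = string.ascii_letters  # only 'A'..'Z' and 'a'..'z', as in the letter rule
DIGITS = string.digits


def scan_name(src: str, pos: int) -> Tuple[str, int]:
    """Scan a name identifier starting at pos; return it and the end index."""
    if pos >= len(src) or not (src[pos] in LETTERS or src[pos] == "_"):
        raise ValueError("no name identifier at position %d" % pos)
    end = pos + 1
    # Letters, digits and underscores may follow; anything else terminates the name.
    while end < len(src) and (src[end] in LETTERS or src[end] in DIGITS or src[end] == "_"):
        end += 1
    return src[pos:end], end
```

For instance, scanning UPPER_LIMIT+1 stops at the +, which is left for the next token.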

10.2.2 Special identifiers

A special identifier is a sequence of special characters. Examples of special identifiers are:

+  :=  <=  *  ->  ,  &

Here is a list of all special characters:

! $ % & * + , - . / : ; < = > ? @ \ ^ ` | ~

Special identifiers may have any length and all characters are significant. A special identifier is terminated by the first character that is not a special character. The terminating character is not part of the special identifier.

Syntax:
special_identifier ::=
special_character { special_character } .

special_character ::=
'!' | '$' | '%' | '&' | '*' | '+' | ',' | '-' | '.' | '/' |
':' | ';' | '<' | '=' | '>' | '?' | '@' | '\' | '^' | '`' |
'|' | '~' .
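Scanning a special identifier follows the same longest-run principle as name identifiers, only with a different character set. A minimal Python sketch, with the hypothetical function name scan_special:

```python
from typing import Tuple

# The 22 special characters listed above.
SPECIAL_CHARS = set("!$%&*+,-./:;<=>?@\\^`|~")


def scan_special(src: str, pos: int) -> Tuple[str, int]:
    """Scan a special identifier starting at pos; return it and the end index."""
    if pos >= len(src) or src[pos] not in SPECIAL_CHARS:
        raise ValueError("no special identifier at position %d" % pos)
    end = pos + 1
    # The identifier extends over the longest run of special characters.
    while end < len(src) and src[end] in SPECIAL_CHARS:
        end += 1
    return src[pos:end], end
```

Because of this longest-run rule, :=-1 starts with the single special identifier :=- rather than := followed by -.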

10.2.3 Parentheses

A parenthesis is one of the following characters:

( ) [ ] { }

Note that a parenthesis consists of exactly one character. Except for the character sequence (* (which introduces a comment), a parenthesis is always terminated by the next character, so (( is read as two parentheses.

Syntax:
parenthesis ::=
'(' | ')' | '[' | ']' | '{' | '}' .
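To show how the three kinds of identifiers combine, and why white space matters between two special identifiers, here is a hypothetical Python helper split_identifiers. It splits text made only of identifiers and spaces. It deliberately does not handle literals or comments: it stops with an error at (* or at any other character outside these classes, such as a digit.

```python
import string
from typing import List

NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits)
SPECIAL_CHARS = set("!$%&*+,-./:;<=>?@\\^`|~")
PARENTHESES = set("()[]{}")


def split_identifiers(src: str) -> List[str]:
    """Split text made of identifiers and spaces into identifier tokens."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in " \t\r\n":
            i += 1  # spaces only separate tokens
        elif src.startswith("(*", i):
            raise ValueError("comments are not handled by this sketch")
        elif ch in PARENTHESES:
            tokens.append(ch)  # a parenthesis is always a single character
            i += 1
        elif ch in NAME_START or ch in SPECIAL_CHARS:
            # Take the longest run of characters of the same class.
            allowed = NAME_CHARS if ch in NAME_START else SPECIAL_CHARS
            j = i + 1
            while j < len(src) and src[j] in allowed:
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            raise ValueError("character %r is not handled by this sketch" % ch)
    return tokens
```

With this sketch, x:=-y yields x, :=- and y, while x := -y yields x, :=, - and y. Names and special identifiers may touch each other, but two special identifiers need a space between them.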

10.3 Literals

There are five types of literals:

integer literals
bigInteger literals
float literals
character literals
string literals

Syntax:
literal ::=
integer_literal | biginteger_literal | float_literal |
character_literal | string_literal .

10.3.1 Integer literals

An integer literal is a sequence of digits which is taken to be decimal. The sequence of digits may be followed by the letter E or e, an optional + sign and a decimal exponent. Based numbers can be specified when the sequence of digits is followed by the # character and a sequence of extended digits. The decimal number in front of the # character specifies the base of the number which follows the # character. The base must be a number between 2 and 36. As extended digits, the letters A or a stand for 10, B or b for 11, and so on up to Z or z, which stands for 35.

Syntax:
integer_literal ::=
decimal_integer [ exponent | based_integer ] .

decimal_integer ::=
digit { digit } .

exponent ::=
( 'E' | 'e' ) [ '+' ] decimal_integer .

based_integer ::=
'#' extended_digit { extended_digit } .

extended_digit ::=
letter | digit .
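The rules above determine the value of an integer literal. This Python sketch computes it. The function name integer_literal_value is hypothetical, and the sketch makes two assumptions: every extended digit must be smaller than the base, and no range check is done on the result. The extended digits are evaluated by hand rather than with Python's int(text, base), because int also accepts prefixes such as 0x and embedded underscores, which the grammar does not allow.

```python
import re
import string

EXT_DIGITS = string.digits + string.ascii_uppercase  # a digit's value is its index
# integer_literal ::= decimal_integer [ exponent | based_integer ] .
INTEGER_RE = re.compile(r"([0-9]+)(?:[Ee]\+?([0-9]+)|#([0-9A-Za-z]+))?")


def integer_literal_value(text: str) -> int:
    """Return the value denoted by an integer literal such as 1E6 or 16#ff."""
    m = INTEGER_RE.fullmatch(text)
    if m is None:
        raise ValueError("not an integer literal: %r" % text)
    digits, exponent, based = m.groups()
    if based is not None:
        base = int(digits)
        if not 2 <= base <= 36:
            raise ValueError("base must be between 2 and 36")
        value = 0
        for c in based.upper():  # A/a = 10 ... Z/z = 35
            d = EXT_DIGITS.index(c)
            if d >= base:
                raise ValueError("digit %r is not allowed in base %d" % (c, base))
            value = value * base + d
        return value
    value = int(digits)
    if exponent is not None:
        value *= 10 ** int(exponent)  # only non-negative exponents are allowed
    return value
```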

10.3.2 BigInteger literals

A bigInteger literal is a sequence of digits followed by the underscore character. The sequence of digits is taken to be decimal. Based numbers can be specified when a sequence of digits is followed by the # character, a sequence of extended digits and the underscore character. The decimal number in front of the # character specifies the base of the number which follows the # character. The base must be a number between 2 and 36. As extended digits, the letters A or a stand for 10, B or b for 11, and so on up to Z or z, which stands for 35.

Syntax:
biginteger_literal ::=
decimal_integer [ based_integer ] '_' .
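Python integers have arbitrary precision, which makes them convenient for sketching the value of a bigInteger literal. In this sketch (hypothetical function biginteger_literal_value), the trailing underscore is required and, following the grammar above, the exponent form is not accepted. As with integer literals, the sketch assumes every extended digit must be smaller than the base.

```python
import re
import string

EXT_DIGITS = string.digits + string.ascii_uppercase
# biginteger_literal ::= decimal_integer [ based_integer ] '_' .
BIGINT_RE = re.compile(r"([0-9]+)(?:#([0-9A-Za-z]+))?_")


def biginteger_literal_value(text: str) -> int:
    """Return the value of a bigInteger literal such as 123_ or 16#ff_."""
    m = BIGINT_RE.fullmatch(text)
    if m is None:
        raise ValueError("not a bigInteger literal: %r" % text)
    digits, based = m.groups()
    if based is None:
        return int(digits)  # decimal; no size limit
    base = int(digits)
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    value = 0
    for c in based.upper():
        d = EXT_DIGITS.index(c)
        if d >= base:
            raise ValueError("digit %r is not allowed in base %d" % (c, base))
        value = value * base + d
    return value
```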

10.3.3 Float literals

A float literal consists of two sequences of decimal digits separated by a decimal point. This basic float literal may be followed by the letter E or e, an optional + or - sign and a decimal exponent.

Syntax:
float_literal ::=
decimal_integer '.' decimal_integer [ float_exponent ] .

float_exponent ::=
( 'E' | 'e' ) [ '+' | '-' ] decimal_integer .
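The float_literal rule maps directly onto a regular expression. In the following Python sketch, FLOAT_RE and float_literal_value are hypothetical names. The conversion uses Python's float(), which is an IEEE double. That is an assumption of the sketch, not a statement about Seed7's float type. Digits are required on both sides of the decimal point, so forms such as .5 or 1. are rejected.

```python
import re

# float_literal ::= decimal_integer '.' decimal_integer [ float_exponent ] .
FLOAT_RE = re.compile(r"[0-9]+\.[0-9]+(?:[Ee][+-]?[0-9]+)?")


def is_float_literal(text: str) -> bool:
    """Check whether text matches the float_literal rule exactly."""
    return FLOAT_RE.fullmatch(text) is not None


def float_literal_value(text: str) -> float:
    """Convert a float literal; Python's float() accepts this syntax directly."""
    if not is_float_literal(text):
        raise ValueError("not a float literal: %r" % text)
    return float(text)
```

Note that 1e5 is not a float literal under this rule. It is an integer literal with an exponent.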

10.3.4 String literals

A string literal is a sequence of UTF-8 encoded Unicode characters surrounded by double quotes. For example:

""   " "   "\""   "'"   "\'"   "String"   "ch=\" "   "\n\n"

To represent non-printable characters and certain printable characters, the following escape sequences can be used:

audible alert      BEL        \a
backspace          BS         \b
escape             ESC        \e
formfeed           FF         \f
newline            NL (LF)    \n
carriage return    CR         \r
horizontal tab     HT         \t
vertical tab       VT         \v
backslash          (\)        \\
apostrophe         (')        \'
double quote       (")        \"
control-A                     \A
...
control-Z                     \Z

Additionally, the following forms are possible:

  • Two backslashes with a sequence of blanks, horizontal tabs, carriage returns and new lines between them are completely ignored. The ignored characters are not part of the string. This can be used to continue a string in the following line. Note that in this case the leading spaces in the new line are not part of the string either, because they lie between the two backslashes.
  • A backslash followed by an integer literal and a semicolon is interpreted as the character with the specified ordinal number. Note that the integer literal is interpreted as decimal unless it is written as a based integer.

Strings are implemented with a length field and UTF-32 encoding. They are not '\0;'-terminated and can therefore also contain binary data.

Syntax:
string_literal ::=
'"' { string_literal_element } '"' .

string_literal_element ::=
simple_literal_character | escape_sequence | apostrophe .

simple_literal_character ::=
letter | digit | parenthesis | special_literal_character |
utf8_encoded_character .

special_literal_character ::=
' ' | '!' | '#' | '$' | '%' | '&' | '*' | '+' | ',' | '-' |
'.' | '/' | ':' | ';' | '<' | '=' | '>' | '?' | '@' | '^' |
'_' | '`' | '|' | '~' .

escape_sequence ::=
'\a' | '\b' | '\e' | '\f' | '\n' | '\r' | '\t' | '\v' |
'\\' | '\''' | '\"' | '\' upper_case_letter |
'\' { space } '\' | '\' integer_literal ';' .

apostrophe ::=
''' .
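The escape rules above can be combined into a decoder. The following Python sketch uses a hypothetical function, decode_string_literal, which turns the text of a complete string literal into the characters it denotes. It makes several assumptions. It rejects raw control characters, following the grammar. It requires at least one white space character between the backslashes of a continuation, because \\ alone denotes a backslash. It relies on Python's chr, so ordinals above 0x10FFFF are rejected. It does not model Seed7's internal UTF-32 representation.

```python
import re
import string

# Escapes that stand for one fixed character (see the table above).
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f", "n": "\n", "r": "\r",
    "t": "\t", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}
SPACES = " \t\r\n"  # blank, TAB, CR, NL
EXT_DIGITS = string.digits + string.ascii_uppercase
# An integer literal (decimal, with exponent, or based) followed by ';'.
ORDINAL_RE = re.compile(r"([0-9]+)(?:[Ee]\+?([0-9]+)|#([0-9A-Za-z]+))?;")


def _integer_value(digits, exponent, based):
    """Value of an integer literal, given the groups matched by ORDINAL_RE."""
    if based is None:
        return int(digits) * 10 ** int(exponent or "0")
    base = int(digits)
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    value = 0
    for c in based.upper():
        d = EXT_DIGITS.index(c)
        if d >= base:
            raise ValueError("digit %r is not allowed in base %d" % (c, base))
        value = value * base + d
    return value


def decode_string_literal(lit: str) -> str:
    """Return the characters denoted by a complete string literal (quotes included)."""
    if len(lit) < 2 or lit[0] != '"' or lit[-1] != '"':
        raise ValueError("a string literal must be enclosed in double quotes")
    body, out, i = lit[1:-1], [], 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped double quote inside a string literal")
        if ch != "\\":
            # The grammar has no raw control characters in string literals.
            if ord(ch) < 32 or ord(ch) == 127:
                raise ValueError("control characters must be written as escapes")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("incomplete escape sequence")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif "A" <= nxt <= "Z":
            # \A is control-A (ordinal 1) ... \Z is control-Z (ordinal 26).
            out.append(chr(ord(nxt) - ord("A") + 1))
            i += 2
        elif nxt in SPACES:
            # Backslash, white space, backslash: the whole sequence is dropped.
            j = i + 1
            while j < len(body) and body[j] in SPACES:
                j += 1
            if j >= len(body) or body[j] != "\\":
                raise ValueError("ignored white space must end with a backslash")
            i = j + 1
        else:
            # Backslash, integer literal, semicolon: character with that ordinal.
            m = ORDINAL_RE.match(body, i + 1)
            if m is None:
                raise ValueError("unknown escape sequence \\%s" % nxt)
            out.append(chr(_integer_value(*m.groups())))
            i = m.end()
    return "".join(out)
```

For example, "\65;\16#42;" decodes to AB. A literal whose first line ends with a backslash and whose next line starts with indentation followed by a backslash continues without any of the skipped white space.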

10.3.5 Character literals

A character literal is a UTF-8 encoded Unicode character enclosed in apostrophes. For example:

'a'   ' '   '\n'   '!'   '\\'   '2'   '"'   '\"'   '\''   '\8;'

To represent control characters and certain other characters in character literals, the same escape sequences as for string literals can be used.

Syntax:
character_literal ::=
apostrophe char_literal_element apostrophe .

char_literal_element ::=
simple_literal_character | escape_sequence | apostrophe | '"' .
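A character literal can be decoded in the same way as a string literal, restricted to exactly one element. The following Python sketch uses a hypothetical function, decode_char_literal. It accepts the apostrophe and the double quote unescaped, as the char_literal_element rule allows. It does not accept the backslash-space-backslash continuation, which would denote no character at all. It reuses the same assumptions as the string sketch, including Python's chr for ordinals.

```python
import re
import string

SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f", "n": "\n", "r": "\r",
    "t": "\t", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}
EXT_DIGITS = string.digits + string.ascii_uppercase
ORDINAL_RE = re.compile(r"([0-9]+)(?:[Ee]\+?([0-9]+)|#([0-9A-Za-z]+))?;")


def _integer_value(digits, exponent, based):
    """Value of an integer literal, given the groups matched by ORDINAL_RE."""
    if based is None:
        return int(digits) * 10 ** int(exponent or "0")
    base = int(digits)
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    value = 0
    for c in based.upper():
        d = EXT_DIGITS.index(c)
        if d >= base:
            raise ValueError("digit %r is not allowed in base %d" % (c, base))
        value = value * base + d
    return value


def decode_char_literal(lit: str) -> str:
    """Return the character denoted by a complete character literal."""
    if len(lit) < 3 or lit[0] != "'" or lit[-1] != "'":
        raise ValueError("a character literal must be enclosed in apostrophes")
    body = lit[1:-1]
    if len(body) == 1:
        # A single character, including ' and ", stands for itself.
        if body == "\\" or ord(body) < 32 or ord(body) == 127:
            raise ValueError("invalid character literal")
        return body
    if body[0] != "\\":
        raise ValueError("a character literal contains exactly one character")
    if len(body) == 2 and body[1] in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[body[1]]
    if len(body) == 2 and "A" <= body[1] <= "Z":
        return chr(ord(body[1]) - ord("A") + 1)  # control-A .. control-Z
    m = ORDINAL_RE.fullmatch(body, 1)  # backslash, integer literal, semicolon
    if m is None:
        raise ValueError("invalid escape sequence in character literal")
    return chr(_integer_value(*m.groups()))
```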

