CS 441 Pascal Lexeme Rules
(corrected after Homework 1 was complete)
1. An Identifier (or Reserved Word)
a. Starts with an alpha (a-z or A-Z)
b. The remainder may be alpha or digit (0-9) or underscore
c. Delimited by any character that is not an alpha, digit, or underscore
2. A recognized Symbol
+ - * / = < > [ ] . , : ; ^ ( )
<> <= >= := .. (two-character symbols)
3. An Integer Literal
a. 1 or more digits with no decimal point
b. Delimited by any character not a digit
4. A Real Literal
a. 1 or more digits, followed by a decimal point, followed by one or more digits.
b. Delimited by any character not a digit.
c. Error: "following digit expected" when the character after the decimal point is not a digit. Ex: "3." or "3. 0"
5. A String Literal
a. Begins with a single quote and contains all characters up until the next single quote.
b. Inside a character literal, two consecutive single quotes are interpreted as one single quote. Ex: 'She''s a nice person' is interpreted as She's a nice person
c. Error: "end of character literal expected" when an end of line/file is encountered before the ending single quote.
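Rules 1 through 5 can be sketched as a handful of small C++ scanning functions. This is only an illustration under stated assumptions: the names Kind, Lexeme, scanIdentifier, scanSymbol, scanNumber, and scanString are hypothetical and not part of the assignment, the whole source is assumed to be held in a std::string, and each function assumes the caller has already checked that the character at index i can start that kind of lexeme.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical names for illustration; the handout does not prescribe them.
enum class Kind { Identifier, Symbol, Integer, Real, String, Error };

struct Lexeme {
    Kind kind;
    std::string text;   // lexeme text, or the error message when kind == Error
    std::size_t next;   // index of the first character after the lexeme
};

static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Rule 1: s[i] is an alpha; the rest may be alpha, digit, or underscore.
Lexeme scanIdentifier(const std::string& s, std::size_t i) {
    std::size_t start = i;
    while (i < s.size() && (isAlpha(s[i]) || isDigit(s[i]) || s[i] == '_')) ++i;
    return {Kind::Identifier, s.substr(start, i - start), i};
}

// Rule 2: s[i] starts a recognized symbol; prefer a two-character match.
Lexeme scanSymbol(const std::string& s, std::size_t i) {
    static const char* const two[] = {"<>", "<=", ">=", ":=", ".."};
    for (const char* t : two)
        if (s.compare(i, 2, t) == 0) return {Kind::Symbol, t, i + 2};
    return {Kind::Symbol, std::string(1, s[i]), i + 1};
}

// Rules 3 and 4: digits, optionally followed by '.' and one or more digits.
Lexeme scanNumber(const std::string& s, std::size_t i) {
    std::size_t start = i;
    while (i < s.size() && isDigit(s[i])) ++i;
    if (i >= s.size() || s[i] != '.')
        return {Kind::Integer, s.substr(start, i - start), i};
    ++i;  // consume the decimal point
    if (i >= s.size() || !isDigit(s[i]))
        return {Kind::Error, "following digit expected", i};
    while (i < s.size() && isDigit(s[i])) ++i;
    return {Kind::Real, s.substr(start, i - start), i};
}

// Rule 5: s[i] is the opening quote; '' inside the literal means one quote.
Lexeme scanString(const std::string& s, std::size_t i) {
    std::string value;
    ++i;  // skip the opening quote
    while (i < s.size() && s[i] != '\n') {
        if (s[i] != '\'') { value += s[i++]; continue; }
        if (i + 1 < s.size() && s[i + 1] == '\'') { value += '\''; i += 2; continue; }
        return {Kind::String, value, i + 1};  // closing quote found
    }
    return {Kind::Error, "end of character literal expected", i};
}
```

For example, calling scanNumber on the text 3.14) would yield a Real lexeme with text 3.14, while calling it on 3. 0 would yield the "following digit expected" error. Returning the index of the next unread character lets a driver loop chain these calls without any shared state.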
Comments:
o begin with { and end with }
o are syntactically treated as a blank (whitespace).
o may span multiple lines (may contain end of line characters).
o Error: "End of comment expected" when the closing } is not found and end of file is encountered. The column/line numbers should refer to the { that begins the comment.
Whitespace: consists of any number of consecutive characters from the following list: space, tab, carriage-return, end-of-line.
Whitespace always delimits lexemes except within a character literal or comment.
Unrecognized Characters: any character not included in one of the above categories.
o Error: "Unrecognized character". Line and column numbers should be adjusted to refer to the unrecognized character.
End of File character: indicates there are no more characters to process.
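The whitespace and comment rules above can be sketched as a single skipping routine that also keeps line and column numbers in step. This is a minimal sketch under assumptions: Cursor, SkipResult, advance, and skipBlanks are hypothetical names, the source is assumed to be held in a std::string, lines and columns are counted from 1, and an unterminated comment is reported at the position of the character that opened it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical cursor type; the handout does not prescribe one.
struct Cursor {
    std::size_t pos = 0;  // index into the source text
    int line = 1;         // 1-based line number
    int col = 1;          // 1-based column number
};

struct SkipResult {
    bool ok;      // false when a comment was never closed
    int errLine;  // position of the comment's opening character when !ok
    int errCol;
};

// Advance one character, keeping line/column in step with pos.
static void advance(const std::string& s, Cursor& c) {
    if (s[c.pos] == '\n') { ++c.line; c.col = 1; }
    else { ++c.col; }
    ++c.pos;
}

// Skip any run of whitespace and { ... } comments before the next lexeme.
SkipResult skipBlanks(const std::string& s, Cursor& c) {
    while (c.pos < s.size()) {
        char ch = s[c.pos];
        if (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n') {
            advance(s, c);
        } else if (ch == '{') {
            int startLine = c.line, startCol = c.col;  // remembered for the error
            while (c.pos < s.size() && s[c.pos] != '}') advance(s, c);
            if (c.pos >= s.size())
                return {false, startLine, startCol};   // "End of comment expected"
            advance(s, c);  // consume the closing '}'
        } else {
            break;  // first character of a real lexeme
        }
    }
    return {true, 0, 0};
}
```

Because a comment is treated as a blank, handling it in the same loop as whitespace means the rest of the scanner never sees comment text at all.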
Character Categories for a Pascal Compiler
Category       Characters in this Category / Description
CC_EOL         \n (newline, linefeed) or \r (carriage return).
CC_EOF         End of File: not a true character. At end of file, ifstream's f.get() returns -1 (EOF) instead: int c = f.get(); if (c == EOF) { /* is eof */ }
CC_WHITESPC    \t (tab) or ' ' (space)
CC_ALPHA       use isalpha(): a..z, A..Z
CC_DIGIT       use isdigit(): 0..9
CC_PERIOD      .
CC_LEFTBRC     {
CC_RIGHTBRC    }
CC_STAR        *
CC_COLON       :
CC_LESSTHAN    <
CC_GREATTHAN   >
CC_EQUAL       =
CC_QUOTE       ' (single quote)
CC_UNDERSC     _
CC_SYMBOL      + - / [ ] , ; ^ ( ) (unambiguous single-character Pascal symbols)
CC_OTHER       c >= ' ' && c < '~' (ASCII codes 32..125), not including all of the above. May be used in comments and char literals.
CC_INVALID     c < ' ' || c >= '~' (we'll use '~' to easily test for invalid characters). Except: \n \r \t
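One way to turn the category table into code is a single classification function. The sketch below is an illustration rather than the required implementation: the function name categorize is hypothetical, the argument is assumed to be the int returned by the stream's get() so that end of file stays distinguishable from real characters, and treating '-' as a CC_SYMBOL member is an assumption based on its appearance in the recognized symbol list.

```cpp
#include <cctype>
#include <cstdio>   // EOF
#include <cstring>  // std::strchr

enum CharCategory {
    CC_EOL, CC_EOF, CC_WHITESPC, CC_ALPHA, CC_DIGIT, CC_PERIOD,
    CC_LEFTBRC, CC_RIGHTBRC, CC_STAR, CC_COLON, CC_LESSTHAN, CC_GREATTHAN,
    CC_EQUAL, CC_QUOTE, CC_UNDERSC, CC_SYMBOL, CC_OTHER, CC_INVALID
};

// c is the int value read from the stream (hypothetical helper, not from the handout).
CharCategory categorize(int c) {
    if (c == EOF) return CC_EOF;
    if (c == '\n' || c == '\r') return CC_EOL;       // exceptions to CC_INVALID
    if (c == '\t' || c == ' ') return CC_WHITESPC;   // tab is also an exception
    if (c < ' ' || c >= '~') return CC_INVALID;
    // From here on c is in 32..125, so the <cctype> calls are safe.
    if (std::isalpha(c)) return CC_ALPHA;
    if (std::isdigit(c)) return CC_DIGIT;
    switch (c) {
        case '.':  return CC_PERIOD;
        case '{':  return CC_LEFTBRC;
        case '}':  return CC_RIGHTBRC;
        case '*':  return CC_STAR;
        case ':':  return CC_COLON;
        case '<':  return CC_LESSTHAN;
        case '>':  return CC_GREATTHAN;
        case '=':  return CC_EQUAL;
        case '\'': return CC_QUOTE;
        case '_':  return CC_UNDERSC;
        default:   break;
    }
    // Assumption: '-' is grouped with the unambiguous single-character symbols.
    if (std::strchr("+-/[],;^()", c) != nullptr) return CC_SYMBOL;
    return CC_OTHER;  // printable, but only valid in comments and char literals
}
```

Checking the end-of-line and whitespace exceptions before the CC_INVALID range test keeps \n, \r, and \t from being rejected as control characters.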