An algorithm for lexical analysis
Compiler 2019 FallMacau University of Science and TechnologyInstructor: Zhiyao Liang
Introduction
This is simple algorithm to construct a scanner (lexical analyzer) that is built on using DFA.
1. Input
A sequence of characters, which can be created by loading the content of a (source program) file and save as a long string in memory.
2. Computation
2.1 Define tokens
A token is pair:
Token type
Token string
Using C, a token can be implemented as a structure (struct).
2.1.1 Token types
Define a different token type for
each operator, like + * /.
each sign and mark ; ( ) { }.
Identity, like variable name, function name, array name.
Number, like 99 -98.
Other kind of literal constant: like a string.
Each keyword, like while if else return
Some special tokens designed for ease of computation, such as EndOfFile , Enter, Error.
2.2 Define linked list of tokens.
A list is a sequence of list node. A list node contains the following:
Data. For scanner, it can be a token (or the address of a token)
A link to the next node, which can be the address of the next node in the list.
(Optionally) A link to the previous node, which can be the address of the previous node in the list. With this field, the list is called a doubly-linked list, otherwise, it is a singly-linked list.
Define a DFA for the scanner
Each call of the DFA is loop, as follows:
/* preparation */
The currentState of the DFA is the unique start state.
An input string str is provided
A currentPos is provided, which is a certain index in str.
A tokenString which is initially an empty string.
/* loop */Repeat:
If str is empty, quit and do nothing.
Read the character c at currentPos in str.
Move currentPos to the next index in str.
if it is necessary (like for ID and Number), append c to the tokenString.
According to c and currentState, change currentState to some new state.
Considering the updated currentState
If it is a (accept) final state
Some token t is generated. Associate with t
the corresponding token type,
and the token string, which is either empty of the recorded tokenString.
If necessary, move currentPos back to the previous position, since that last character read could belong to the next token.
The computation of the DFA finishes.
if it is an Error state (could be a special final state).
Generate some Error token.
or, set some status variable for the error.
The computation of the DFA finishes.
The transition rules of the DFA should be defined according to the specific source language.
The algorithm sketch
/* preparation */Let str be a sequence of charactersLet currentPos be an index of str, initially as 0.
/* loop */When currentPos is not the end of str, repeat:
Call the (already defined) DFA.
Upon the result of the call of the DFA
If a token is generated, put the token in a list node and append the node to the list.
If some error reported by the DFA, which can be signaled by some special Error token is generated by the DFA, or by some other special variable recording the status, print some error message, and optionally terminate the whole computation process.
Note that currentPos should automatically updated by calling the DFA.
3. Output and Result
When the source program is correct:A list of tokens should be generated. To test it, the list of tokens can be printed.
When some lexical error occurs in the source program:
Some (at least one) error message should be printed showing the corresponding explanations.
Optionally, a partial token list can be generated, which can be printed for testing.
Programming
[SOLVED] algorithm compiler An algorithm for lexical analysis
$25
File Name: algorithm_compiler_An_algorithm_for_lexical_analysis.zip
File Size: 489.84 KB
Only logged in customers who have purchased this product may leave a review.
Reviews
There are no reviews yet.