[SOLVED] data structure algorithm Purpose

$25

File Name: data_structure_algorithm_Purpose.zip
File Size: 301.44 KB

5/5 - (1 vote)

Purpose
Construct a lexical analyzer, also called a scanner for the specified source language.
The source language
The source language is documented in the file The_language_N.pdf . Some sample programs of
this language are in the folder Sample_N_programs .
Implementing the scanner
Use the algorithm of scanner discussed in class, which is also documented in the file
scanner_algorithm.pdf .
Do not use some automatically generated scanner code.
Design your program with reasonable modules. A possible choice is as follows
tools.h : inlcude the declarations of tools for general purpose.
tools.c : include the definitions of general-purpose tools.
token.h : include token types, and related function prototypes.
token.c : include token related function definitions.
list.h : include declarations related to the list data structure.
list.c : include the function definitions for the list data structure.
automata.h : include the declaration of some DFA.
automata.c : include the definitions for the DFA.
scanner.c : include the main function, the driver of the program.
Expected behavior of the scanner
Test your scanner with the provided sample language N programs as the source.
When a source program is correct,
some token list should be generated, which will be used by the parser in the future.
The generated token list should be printed. For a token if its string cannot be represented by
its type, like ID or Number, its string and type should both be printed; otherwise, only the
token type need to be printed.
When some error occurs, some (at least one) corresponding error message should be printed.
For example, if there is a file sample.n whose content has only two lines:
print(The result is:
) ;
output( (3|7 + 5) * 8 / -9) ;
When the scanner runs with sample.n , the generated token list could be printed as the following:
ID: print
LPAR
STRING: The result is:

RPAR
SEMI
ENTER
ID: output
LPAR
LPAR
NUM: 3|7
PLUS
NUM: 5
RPAR
MULTI
NUM: 8
DIV
NEG
NUM: 9
RPAR
SEMI
Here the scanner dos not need to distinguish negative sign NEG with the minus operator. You can let
the parser to judge the difference later.

Algorithm
This is simple algorithm to construct a scanner (lexical analyzer) that is built on using DFA.
1. Input
A sequence of characters, which can be created by loading the content of a (source program) file and
save as a long string in memory.
2. Computation
2.1 Define tokens
A token is pair:
Token type
Token string
Using C, a token can be implemented as a structure ( struct ).
2.1.1 Token types
Define a different token type for
each operator, like + * / .
each sign and mark ; ( ) { } .
Identity, like variable name, function name, array name.
Number, like 99 -98 .
Other kind of literal constant: like a string.
Each keyword, like while if else return
Some special tokens designed for ease of computation, such as EndOfFile , Enter, Error.
2.2 Define linked list of tokens.
A list is a sequence of list node. A list node contains the following:
Data. For scanner, it can be a token (or the address of a token)
A link to the next node, which can be the address of the next node in the list.
(Optionally) A link to the previous node, which can be the address of the previous node in the list.
With this field, the list is called a doubly-linked list, otherwise, it is a singly-linked list.
Define a DFA for the scanner
Each call of the DFA is loop, as follows:
/* preparation */
The currentState of the DFA is the unique start state.
An input string str is provided
A currentPos is provided, which is a certain index in str.
A tokenString which is initially an empty string.
/* loop */
Repeat:
If str is empty, quit and do nothing.
Read the character c at currentPos in str.
Move currentPos to the next index in str.
if it is necessary (like for ID and Number), append c to the tokenString.
According to c and currentState, change currentState to some new state.
Considering the updated currentState
If it is a (accept) final state
Some token t is generated. Associate with t
the corresponding token type,
and the token string, which is either empty of the recorded tokenString.
If necessary, move currentPos back to the previous position, since that last character
read could belong to the next token.
The computation of the DFA finishes.
if it is an Error state (could be a special final state).
Generate some Error token.
or, set some status variable for the error.
The computation of the DFA finishes.
The transition rules of the DFA should be defined according to the specific source language.
The algorithm sketch
/* preparation */
Let str be a sequence of characters
Let currentPos be an index of str, initially as 0.
/* loop */
When currentPos is not the end of str, repeat:
Call the (already defined) DFA.
Upon the result of the call of the DFA
If a token is generated, put the token in a list node and append the node to the list.
If some error reported by the DFA, which can be signaled by some special Error token is
generated by the DFA, or by some other special variable recording the status, print some
error message, and optionally terminate the whole computation process.
Note that currentPos should automatically updated by calling the DFA.
3. Output and Result
When the source program is correct:
A list of tokens should be generated. To test it, the list of tokens can be printed.
When some lexical error occurs in the source program:
Some (at least one) error message should be printed showing the corresponding
explanations.
Optionally, a partial token list can be generated, which can be printed for testing

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] data structure algorithm Purpose
$25