Compiler Theory
Day 2, January 24, 2019
Example of using Visual Studio with
a Simple Lexical Analyzer
This exercise will guide you through using Visual Studio 2017 to set up and
run a C program, using a lexical analyzer for decimal fractions as an example.
1. Launch Visual Studio 2017.
2. In the File menu, choose New, and then Project.
3. In the Templates at left, Win32 should be highlighted.
Highlight Win32 Console Application, since the lexical analyzer will
run in a terminal window (also called a console window). Having the
type be Visual C++ is okay, since C programs like this one are also legal
C++ programs.
4. Down at the bottom in the Name box, enter a name for the project. This
will be the name used for the executable file. For this example, use the
name Lexer.
5. In the Location box, enter the folder where you want the project files.
You can use the Browse button at the right.
6. If you want the project to be in its own subfolder of the folder in step 5,
check the Create directory for solution box and fill in the Solution name,
which is usually the same as the Name entered in step 4.
7. Click OK.
8. You get a Win32 Application Wizard. Click Application Settings at the left.
Under Additional Options, check Empty project and uncheck Precompiled
Header and Security Development Lifecycle checks. Click OK.
9. You should now be in your project. In the Solution Explorer window at
left, right-click on Source Files, then choose Add, then New Item. Then
highlight C++ File (.cpp). In the Name box below, enter lexer.c. Using the
.c extension instead of .cpp tells Visual Studio to compile the file as C
rather than C++. Click the Add button.
10. You get an edit window for lexer.c. Enter the program listed at the
end of this document.
11. In the Build menu, click Build Solution. If you get error messages,
fix them and Build Solution again.
12. In the Debug menu, click Start Debugging. A console window will appear,
blank, and you can type in text for the program to analyze. Note that your
program does not get any of the text until you hit ENTER at the end of a line.
13. You can end the program by closing its window, or by typing CTRL-Z ENTER.
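Steps 12 and 13 depend on how console input reaches a C program: characters arrive only after ENTER completes a line, and CTRL-Z ENTER makes getc return EOF. The following is a minimal sketch of such a read loop, not part of the sample lexer; the helper name count_chars is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not in the handout): read characters from 'in'
   until end of file and return how many were read. */
long count_chars(FILE *in)
{
    long count = 0;
    int c;   /* int rather than char, so EOF can be told apart from data */

    assert(in != NULL);

    /* getc returns EOF at end of input (CTRL-Z ENTER on a Windows console). */
    while ((c = getc(in)) != EOF)
        count++;
    return count;
}
```

Calling count_chars(stdin) from main and printing the result would show that nothing is counted until a full line has been entered, and that the loop ends on CTRL-Z ENTER.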
ASSIGNMENT:
Modify the sample lexical analyzer to recognize the tokens of ATTO-C. You should
not record the lexeme of a comment, since a comment can be arbitrarily long. You
do have to record the lexeme of a string, since your compiler needs to remember
what the string was, but since a string cannot have newlines in it, you can make
the lexeme long enough to hold any reasonable string. Decimal fractions are NOT
tokens in ATTO-C, so the decimal fraction handling should be removed.
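As a starting point for the comment and string requirements, here is a hedged sketch of two helpers that work on an in-memory string. The ATTO-C token definitions are not reproduced in this handout, so the delimiters used here (C-style block comments, and double-quoted strings with no escape sequences) are assumptions; check them against the ATTO-C specification. The names skip_comment and scan_string are hypothetical, and truncating an over-long string is one possible policy, not a requirement of the assignment.

```c
#include <assert.h>
#include <stddef.h>

#define LEX_SIZE 100   /* same buffer size as the sample lexer */

/* Hypothetical helper: skip a comment that starts at text[pos], without
   recording any of its characters.
   ASSUMPTION: comments are delimited like C block comments.
   Returns the index just past the closing star-slash, or the index of the
   terminating null if the comment never closes. */
size_t skip_comment(const char *text, size_t pos)
{
    assert(text[pos] == '/' && text[pos + 1] == '*');
    pos += 2;
    while (text[pos] != '\0') {
        if (text[pos] == '*' && text[pos + 1] == '/')
            return pos + 2;
        pos++;
    }
    return pos;
}

/* Hypothetical helper: scan a string whose opening quote is at text[pos].
   ASSUMPTION: strings are double-quoted and have no escape sequences.
   The characters between the quotes are recorded in lexeme[], truncated
   if they do not fit. Returns the index just past the closing quote, or 0
   if a newline or the end of input comes before the closing quote. */
size_t scan_string(const char *text, size_t pos, char lexeme[LEX_SIZE])
{
    size_t n = 0;

    assert(text[pos] == '"');
    pos++;
    while (text[pos] != '"') {
        if (text[pos] == '\0' || text[pos] == '\n') {
            lexeme[0] = '\0';
            return 0;               /* unterminated string */
        }
        if (n < LEX_SIZE - 1)       /* keep room for the terminating null */
            lexeme[n++] = text[pos];
        pos++;
    }
    lexeme[n] = '\0';
    return pos + 1;
}
```

In the character-at-a-time lexer, the same logic would become extra FSM states: a comment state that eats characters without touching lexeme, and a string state that appends to lexeme until the closing quote.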
/* lexer.c
   A simple lexical analyzer for integers and decimal fractions.
   Author: Ken Brakke
   Date: Jan. 26, 2017

   A Finite State Machine is used to recognize strings consisting of digits
   followed by a decimal point followed by digits. There must be at least
   one digit before the decimal point. The decimal point and digits after
   the decimal point are optional. Characters not part of legal tokens
   will cause a REJECT message, and the lexer will move on to the next
   token.

   Note: The FINAL state here does not refer to any particular node of the
   FSM. It is used after each true final state to signal the lexical
   analyzer to start a new token.

   Usage: Launch. A console window will appear. You may type in strings
   followed by ENTER, and they will either be accepted by the Finite State
   Machine and print ACCEPT, or not and print REJECT. The token type and
   lexeme (the characters of the token) will also be printed. Multiple
   numbers may be entered on one line. To exit the program, hit CTRL-Z and
   ENTER.
*/
#include <stdio.h>   // getc, printf, stdin, EOF
#include <stdlib.h>  // exit
#include <ctype.h>   // isdigit
// Finite State Machine states
#define START 1
#define INTEGER 2
#define DEC_FRAC 3
#define FINAL 4
// Token types
#define INTEGER_TOK 101
#define DECIMAL_FRACTION_TOK 102
// Size of the lexeme buffer
#define LEX_SIZE 100
// Special look-ahead character value to indicate none
#define NO_CHAR 0
int main()
{ int state;             // The current state of the FSM.
  int next_char;         // The next character of input.
  char lexeme[LEX_SIZE]; // The characters of the token.
  int lex_spot;          // Current spot in lexeme.
  int token_type;        // The type of token found.

  // Infinite loop, doing one token at a time.
  next_char = NO_CHAR;   // no lookahead character to start with
  while ( 1 )
  { // Initialize the Finite State Machine.
    state = START;
    lex_spot = 0;

    // Loop over characters of the token.
    while ( state != FINAL )
    { if ( next_char == NO_CHAR )
        next_char = getc(stdin); // get one character from standard input
      if ( next_char == EOF )    // EOF is special character for End-Of-File
        exit(0);   // exit the program with exit code 0, which is success.

      switch ( state )
      { case START:
          if ( next_char == '\n' ) // just eat the newline and stay in START
            next_char = NO_CHAR;
          else if ( isdigit(next_char) )
          { state = INTEGER;
            lexeme[lex_spot++] = next_char; // Add the character to the lexeme
            next_char = NO_CHAR;            // eat the character
          }
          else
          { printf("REJECT %c\n",next_char); // This is not a legal final state
            state = FINAL;        // but we want to end the token anyway
            next_char = NO_CHAR;  // eat the offending character
          }
          break; // Need break at the end of a case, else you will continue
                 // to the next case.

        case INTEGER:
          if ( isdigit(next_char) )
          { state = INTEGER;
            lexeme[lex_spot++] = next_char;
            next_char = NO_CHAR;
          }
          else if ( next_char == '.' )
          { state = DEC_FRAC;
            lexeme[lex_spot++] = next_char;
            next_char = NO_CHAR;
          }
          else
          { lexeme[lex_spot] = 0; // null for end of string
            token_type = INTEGER_TOK;
            printf("ACCEPT INTEGER %s\n",lexeme); // This is a final state
            state = FINAL; // leave next_char alone, for next token
          }
          break;

        case DEC_FRAC:
          if ( isdigit(next_char) )
          { state = DEC_FRAC;
            lexeme[lex_spot++] = next_char;
            next_char = NO_CHAR;
          }
          else
          { lexeme[lex_spot] = 0; // null for end of string
            token_type = DECIMAL_FRACTION_TOK;
            printf("ACCEPT DECIMAL_FRACTION %s\n",lexeme); // This is a final state
            state = FINAL; // leave next_char alone, for next token
          }
          break;

        default:
          printf("INTERNAL ERROR: Illegal state %d\n",state);
          state = FINAL;
          break;
      } // end of switch
    } // end of while state
  } // end of infinite loop

  return 0; // successful exit code
} // end of main
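For checking the state machine without typing at the console, the same transitions can be wrapped in a function that scans an in-memory string. The following is a minimal sketch under stated assumptions, not part of the original program: the function name scan_number and its return convention are hypothetical, and, unlike lexer.c, it truncates a lexeme that would not fit in the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token types, same values as in lexer.c */
#define INTEGER_TOK 101
#define DECIMAL_FRACTION_TOK 102

/* Hypothetical helper (not in the handout): scan one token starting at
   text[*pos]. On success, copy the lexeme into lexeme[], advance *pos past
   the token, and return the token type. Return 0 if text[*pos] does not
   start a token; *pos is then left unchanged. */
int scan_number(const char *text, size_t *pos, char *lexeme, size_t lex_size)
{
    size_t i = *pos;
    size_t n = 0;
    int type;

    assert(lex_size > 0);

    /* START state: at least one digit is required. */
    if (!isdigit((unsigned char)text[i]))
        return 0;

    /* INTEGER state: collect digits. */
    while (isdigit((unsigned char)text[i])) {
        if (n + 1 < lex_size)      /* keep room for the terminating null */
            lexeme[n++] = text[i];
        i++;
    }
    type = INTEGER_TOK;

    /* DEC_FRAC state: a decimal point, then zero or more digits. */
    if (text[i] == '.') {
        if (n + 1 < lex_size)
            lexeme[n++] = text[i];
        i++;
        while (isdigit((unsigned char)text[i])) {
            if (n + 1 < lex_size)
                lexeme[n++] = text[i];
            i++;
        }
        type = DECIMAL_FRACTION_TOK;
    }

    lexeme[n] = '\0';
    *pos = i;
    return type;
}
```

A caller would loop over the input, calling scan_number at each position and skipping one character whenever it returns 0, which mirrors the REJECT branch of the START state.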
Reference:
Beginners C syntax:
https://www.tutorialspoint.com/cprogramming/c_program_structure.htm
https://www.tutorialspoint.com/cprogramming/c_basic_syntax.htm
etc.