In this project, you are asked to write a predictive parser and a type checker for a small language. The parser checks that the input is syntactically correct and the type checker enforces the semantic rules of the language. The semantic rules that your program will enforce relate to declarations and types.The input to your code will be a program and the output will be:syntax error if the input program is not syntactically correcterror messages if there is a declaration error or a type mismatch in the input program information about the types of the symbols declared in the input program if there is no error.In what follows, I describe the language syntax and semantic rules.1. Language Syntax1.1. Grammar Descriptionprogram ! scopescope ! LBRACE scope list RBRACEscope list ! scopescope list ! var declscope list ! stmtscope list ! scope scope listscope list ! var decl scope listscope list ! stmt scope listvar decl ! id list COLON type name SEMICOLONid list ! IDid list ! ID COMMA id listtype name ! REAL j INT j BOOLEAN j STRINGstmt list ! stmtstmt list ! stmt stmt liststmt ! assign stmtstmt ! while stmtassign stmt ! ID EQUAL expr SEMICOLONwhile stmt ! WHILE condition LBRACE stmt list RBRACEwhile stmt ! WHILE condition stmtexpr ! arithmetic operator expr exprexpr ! boolean operator expr exprexpr ! relational operator expr exprexpr ! NOT exprexpr ! primaryarithmetic operator ! PLUS j MINUS j MULT j DIVboolean operator ! AND j OR j XORrelational operator ! GREATER j GTEQ j LESS j NOTEQUAL j LTEQprimary ! ID j NUM j REALNUM j STRING CONSTANT j bool constbool const ! TRUE j FALSEcondition ! LPAREN expr RPARENNotice that expressions are in prefix notation in this language.
1.2. TokensThe tokens used in the grammar description are:LBRACE = {RBRACE = }COLON = :SEMICOLON = ;REAL = REALINT = INTBOOLEAN = BOOLEANSTRING = STRINGWHILE = WHILECOMMA = ,LPAREN = ‘(‘RPAREN = ‘)’EQUAL = =PLUS = ‘+’MULT = *DIV = /AND = ^OR = ‘|’XOR = &NOT = ~GREATER = >GTEQ = >=LESS = <LTEQ = <=NOTEQUAL =TRUE = TRUEFALSE = FALSESTRING_CONSTANT = ” (letter | digit)* ”ID = letter (letter | digit)*NUM = 0 | (pdigit digit*)REALNUM = NUM . digit digit*whereletter = a j b j c j … j z j A j B j C j … j Zdigit = 0 j 1 j 2 j … j 9pdigit = 1 j 2 j 3 j … j 9
2. Language Semantics2.1. Scoping RulesStatic scoping is used. I have already covered static scoping in class, so you should know the semanticsand how references should be resolved. Every scope defines a scope.2.2. TypesThe language has four basic types: INT , REAL , BOOLEAN , and STRING .
2.3. VariablesProgrammers declare variables explicitly in var decl . The names of the declared variables appear inthe id list and their type is the type name .Example Consider the following program written in our language:{x : INT;y : INT;y = x;}This program has two declared variables: x and y .2.4. Declaration and ReferenceAny appearance of a name in the program is either a declaration or a reference. Any appearance ofa name in the id list part of a var decl is a declaration of the name. Any other appearance of aname is considered a reference to that name. Note that the above definitions exclude the built-in typenames, which are predeclared as part of the language definition.Given the following example (the line numbers are not part of the input):01 {02 a : INT;03 b : REAL;04 b = x;05 }We can categorize all appearances of names as declaration or reference:Line 2, the appearance of name a is a declarationLine 3, the appearance of name b is a declarationLine 4, the appearance of name b is a referenceLine 4, the appearance of name x is a reference2.5. Type SystemWe describe which assignments are valid (type compatibility) and how to determine the types of expressions(type inference).2.5.1. Type Compatibility RulesType compatibility rules specify which assignments are valid. The Rules are the following.C1: If the types of the lefthand side of an assignment is INT , BOOLEAN , or STRING , the righthandside of the assignment should have the same type.C2: If the type of the lefthand side of an assignment is REAL , the type of the righthand side of theassignment should be INT or REAL .M1: Assignments that do not satisfy C1 or C2 are not valid. In this case, we say that there is atype mismatch.
2.5.2. Type Inference RulesC3: The types of the operands of an arithmetic operator should be REAL or INTC4: The type of the operands of a boolean operator should be BOOLEAN .C5: If the type of one the operands of a relational operator are not INT or REAL , the typesof the two operands of the relational operator should be the same. In this case, both typescan be STRING or both types can be BOOLEAN.C6: If the type of on operand of a relational operator is INT or REAL , the type of the otheroperand should be INT or REAL . In this case, the two operands can have different types.C7: The type of a condition should be BOOLEAN .I1: If C1 is satisfied, and the type of one of the operands of an arithmetic operator is REAL , thetype of the resulting expression is REAL .I2: For the PLUS , MINUS , and MULT operators, if the types of the two operands are INT , thetype of the resulting expression is INT .I3: For the DIV operator, if the types of the two operands are INT , the type of the resultingexpression is REAL .I4: If the operands of a boolean operator are BOOLEAN , the type of the resulting expression isBOOLEAN .I5: If C5 and C6 are satisfied, the type of a relational operator expr expr is BOOLEAN .I6: The type of NUM constants is INTI7: The type of REALNUM constants is REALI8: The type of bool const constants is BOOLEANI9: The type of STRING CONSTANT constants is STRINGM2: If C3, C4, C5, C6, or C7 are not satisfied, the type of the expressions is ERROR. In this case,we say that there is a type mismatch.
3. ParserYou must write a parser for the grammar, If your parser detects a syntax error in the input, it shouldoutput the following message and exit:Syntax ErrorYou should start coding by writing the parser first and make sure that your parser generates a syntaxerror message if the input program is syntactically incorrect. There will be no partial credit given forTask 1The grammar of the language is not LL(1) i.e. it does not satisfy the conditions for predictive parser withone token lookahead, however, it is still possible to write a recursive descent parser with no backtracking.We can do this by looking ahead at more than one token in some cases and left-factoring some rules.Two rules likestmt list ! stmtstmt list ! stmt stmt listcan be parsed by first parsing stmt and then checking if the next token is the beginning of stmt listor in the FOLLOW of stmt list . In fact the two rules are equivalent tostmt list ! stmt stmt list1stmt list1 ! stmt list jbut you do not need to explicitly left-factor the rules by introducing stmt list1 to parse stmt list .4. Semantic CheckingYour program will check for the following semantic errors and output the correct message when it encountersthat error.4.1. Declaration ErrorsVariable declared more than once (error code 1.1)A variable is declared more than once if appears more than once in the same id list of avar decl or if it appears in two id list of two different var decl .Undeclared variable (error code 1.2)If resolving a variable name that appears in a statement other than a declaration returns no declaration,the variable is undeclared.Variable Declared but not used (error code 1.3)If a variable is declared but is not referenced, we say that the variable is declared but not used.We say that a declaration referenced if some reference to the variable resolves to the declaration.If a declaration is not referenced, we have error code 1.3.Redeclaration or reference as a variable for of a built-in type name If a built-in type nameappears in id list of a variable declaration or appears in a statement other than a declarationstatement, your parser should output syntax error.For each error involving declarations, your type checker should output one line in the following format:
ERROR CODEin which should be replaced with the proper code (see the error codes listed above) andshould be replaced with the name of the variable that caused the error. Since the testcases will have at most one error each, the order in which these error messages are printed does notmatter.Note that there will only be at most one declaration error per test case.4.2. Type MismatchIf there are no declaration errors and any of the type constraints (listed in the Type System sectionabove) is violated in the input program, then the output of your program should be:TYPE MISMATCHWhere is replaced with the line number that the violation occurs andshould be replaced with the label of the violated type constraint (possible values are C1, C2, C3, C4,C5, C6, or C7. (see section on Type System for details of each constraint). Note that you can assumethat anywhere a violation can occur it will be on a single line.If there are declaration errors and type mismatch errors, only the output for the declaration errorsshould be produced.Note that there will only be at most one type mismatch error per test case.4.3. Use of Uninitialized VariablesFor this part, your program should identify variables that are used before they are defined. We saythat a variable is used if it appears in an expression. We say that a variable is defined if it appears onthe lefthand side of an assignment. A variable is used before it is defined if there is an execution pathin which there is a use of the variable that is not preceded by a definition of the variable. Compilerstypically call this ”use of an uninitialized variable”. In general determining use of uninitialized variablesis involved and requires building a control flow graph of the program. Fortunately, for the language ofthis project, determining uninitialized uses is relatively easy. First, I will describe the output format, thenI will give a more detailed description of how determine uninitialized uses.The output format for this task is the followingUNINITIALIZEDUNINITIALIZED…Where is the name of i’th uninitialized variable andis the line number in which the i’th uninitialized reference appears.In what follows, I will explain how to determine uses of uninitialized variables. Consider the programbelow (the line numbers are not part of the input but are added for ease of reference):01 {02 x1, x2, x3, x4 , x5 : INT;03
04 x3 = 2;0506 WHILE ( < x1 10 ) {07 x1 = + x1 1;08 x2 = 1;09 x4 = + x2 1;10 x1 = + x3 1;11 WHILE ( < x1 10 ) {12 x5 = + x1 + x3 x5 ;13 }14 x1 = x5;15 x5 = 1;16 }1718 x2 = + x2 1 ;19 }The use of x1 in the expression < x1 10 is a use of an uninitialized variable. The first timethe expression is evaluated, there is no previous definition (assignment) to x1. Note that laterevaluations of < x1 10 happen after the assignment x1 = + x1 1; in the body of the loop, butuse still counts as uninitialized use because the first use is uninitialized.The use of x2 on line 09 is initialized because it is always preceded by the definition (assignmentto) of x2 on line 08.The use of x3 on line 10 is initialized because it is always preceded by the definition (assignmentto) x3 on line 04.The use of x1 on line 11 is initialized because it is always preceded by the definition (assignmentto) x1 on line 10 (also on line 07).The use of x1 on line 12 is initialized because it is always preceded by the definition (assignmentto) x1 on line 10 (also on line 07).The use of x3 on line 12 is initialized because it is always preceded by the definition (assignmentto) x3 on line 04.The use of x5 on line 12 is not initialized because it is not preceded by a definition (assignment to)x5.The use of x5 on line 14 is not initialized because it is not preceded by a definition (assignmentto) x5. Even though there is a definition (assignment to) of x5 on line 12, we cannot determine (ingeneral) if the body of the loop will be executed, so we assume that the preceding loop did notexecute. Note that this situation is different from the use of x1 on line 12 which is guaranteed tobe preceded by the definition (assignment to) of x1 on line 10.The use of x2 on line 18 is not initialized because it is not preceded by a definition (assignment to)x2. Even though x2 is defined (assigned a value) on line 08, we cannot determine (in general) ifthe body of the loop is executed, so we have to assume that it is not executed.It should be clear from the example, that to determine uninitialized variables, you can treat while stmtas some kind of a scope. If a use is inside a while stmt , then all definitions preceding it in thewhile stmt together with all definitions preceding it in enclosing while stmt should be taken intoconsideration.4.4. No Semantic ErrorsIf there are no declaration errors, no type mismatch errors and no use of uninitialized variables, thenyour program should output for each reference of a programmer declared variable name, in the order7in which the reference appears in the program, the line number in which the reference appears and theline number of the declaration to which the reference of the name resolves. For this part, the formatshould be the following
…Where is the name of a variable and corresponds to i’th name referencein the program. is the line number in which the i’th reference appears andis the line number of the declaration corresponding to the i’th reference.5. More ExamplesExample 1. Given the following example (the line numbers are not part of the input):01 {02 x : INT;03 y : BOOLEAN;04 y = x;05 }The output will be the following:TYPE MISMATCH 4 C1Example 2. Given the following example (the line numbers are not part of the input):01 {02 x : BOOLEAN;03 y : REAL;04 y = x;05 }The output will be the following:TYPE MISMATCH 4 C28Example 3. Given the following example (the line numbers are not part of the input):01 {02 x : BOOLEAN;03 x : BOOLEAN;04 x = x;05 }The output will be the following:ERROR CODE 1.1 xExample 4. Given the following example (the line numbers are not part of the input):01 {02 x : INT;03 y, x : STRING;04 y = 10;05 x = 10;05 }The output will be the following:ERROR CODE 1.1 xExample 5. Given the following example (the line numbers are not part of the input):01 {02 x : INT;03 y : STRING;04 y = “abc”;05 }The output will be the following:ERROR CODE 1.3 x9Example 6. Given the following example (the line numbers are not part of the input):01 {02 x1 , x2 : INT;03 y : INT;04 x1 = y;05 y = + x1 x2;06 }The output will be the following:UNINITIALIZED y 4UNINITIALIZED x2 5Example 7. Given the following example (the line numbers are not part of the input):01 {02 a : INT;03 b : INT;04 a = 1;05 b = 1;06 { a : INT;07 a = 1;08 WHILE ( > a b )09 {10 a = – a b;11 }12 }13 }The output will be the following:a 4 2b 5 3a 7 6a 8 6b 8 3a 10 6a 10 6b 10 3106. Summary of Requirements1. If there is a syntax error, the program should output syntax error and exit.2. If there is no syntax error and there is a declaration error, the program should output the errormessage for the declaration error. You can assume that there can be no more than one declarationerror in the test case. If the program outputs a declaration error, the program should output noother error message. Specifically, it should not output any type mismatch or uninitialized variableerror message.3. If there is no syntax error and no declaration error and there is a type mismatch, the programshould output the error message for the type mismatch. You can assume that there can be no morethan one type mismatch per test case. If the program outputs a type mismatch error message, theprogram should output no other error message. Specifically, it should not output any uninitializedvariable error message.4. If there is no syntax error, no declaration error and no type mismatch error, the program shouldoutput the list of variables used but not initialized in the format specified in Section 4.3.5. If there is no error of any kind, the program should output for each reference, the line number ofthe reference and the line number of the declaration to which the reference resolves as explainedin Section 4.4 above.7. EvaluationYour submission will be graded on passing the automated test cases.The test cases (there will be multiple test cases in each category, each with equal weight) will be brokendown in the following way (out of 100 points):Parsing: 35 pointsErrors involving declarations: 20 pointsType mismatch errors and no semantic error cases: 35 pointsUse of uninitialized variables: 10 pointsThere is no partial credit for the parsing category, you need to pass all test cases in that category to getthe 35 points. All other categories have partial credit.11
Reviews
There are no reviews yet.