CS 3EA3: Example Haskell Code for Recursive Descent Parsing
Wolfram Kahl September 17, 2009
The module Data.Char provides us with the functions isDigit, isLetter, and ord :: Char Int. module ExprParse where
import Data.Char
An extremely simple expression datatype (abstract syntax):
data Expr
= Var String
| Num Integer
| Add Expr Expr | Mul Expr Expr
For the concrete syntax, we encode the different precedence levels into separate nonterminals to achieve a context-free grammar that allows recursive descent parsing:
Expr::= Term + ExprTerm::= Factor * TermFactor::= Number
| Term| Factor| Identifier |( Expr )
The precedence levels can easil be accommodated by the unparsing function1; whether parentheses are added is decided depending on the precedence argument, which stands for the precedence of the surrounding constructor. Adding parentheses uses an auxiliary function paren:
paren :: String String paren s = ( : s + )
showPrecExpr :: Int Expr String
showPrecExpr p (Var s) = s
showPrecExpr p (Num k) = show k
showPrecExpr p (Add e1 e2) = (if p > 4 then paren else id)
(showPrecExpr 4 e1 + + + showPrecExpr 4 e2 )
1Idiomatic Haskell would define the more efficient showsPrecExpr :: Int Expr String String.
1
showPrecExpr p (Mul e1 e2) = (if p > 5 then paren else id) (showPrecExpr 5 e1 + * + showPrecExpr 5 e2 )
For installing this as standard show function for Expr values, we choose lowest precedence by default:
instance Show Expr where show = showPrecExpr 0
For recursive descent parsing see also http://en.wikipedia.org/wiki/Recursive_descent; for an extremely low-tech Haskell version we use a parser type that takes a String as argument, and returns a pair consisting of a parsed Expr, and the not-yet-parsed rest of the String:
parseExpr , parseTerm , parseFactor :: String (Expr , String )
This is a very simple choice; it forces us to use Haskell run-time errors for reacting to parse errors.
The grammar rules can now straight-forwardly be translated into Haskell, always continuing to work on the rest strings:
parseExpr cs = let (t,cs) = parseTerm cs in case cs of + : cs let (e, cs) = parseExpr cs in (Add t e, cs)
(t,cs)
parseTerm cs = let (f , cs) = parseFactor cs in case cs of *:cs let(t,cs)=parseTerm cs in(Mul f t,cs)
(f , cs)
parseFactor (( : cs) = let (e, cs) = parseExpr cs in case cs of ):cs (e,cs)
error closing parenthesis expected parseFactor (c : cs)
| isLetter c = let (ident,cs) = parseIdentRest [c] cs in (Var ident,cs)
| isDigit c = let (num, cs) = parseNumberRest (numFromDigit c) cs in (Num num, cs)
parseFactor cs = error (factor expected; + show cs + found)
Since we allow multi-letter dentifiers and multi-digit numbers, we use auxiliary parser to complete
them after we recognised the first character:
parseIdentRest :: String String (String , String ) parseIdentRest s (c : cs ) | isLetter c = parseIdentRest (s + [ c ]) cs parseIdentRest s cs = (s,cs)
parseNumberRest :: Integer String (Integer , String ) parseNumberRest n (c : cs)
| isDigit c = parseNumberRest (10 n + numFromDigit c) cs parseNumberRest n cs = (n,cs)
numFromDigit :: Char Integer numFromDigit c = toInteger (ord c ord 0)
2
Reviews
There are no reviews yet.