Crafting Interpreters in Haskell - Parsing Expressions
In this post, we will continue from where we left off on the Scanner. Today, we implement a parser close to the one described in Parsing Expressions
We follow the same approach as earlier. That is, we will continue to use Parsec
and try to stay close to the implementation of the book as possible. The module presented in this post can be found at ExprParser.hs
Let’s first get some import
statements out of the way.
--- Scanner.hs
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NoImplicitPrelude #-}
module ExprParser where
import Data.Text as T
import Import hiding (many, try, (<|>))
import Scanner
import Text.Parsec
Layout
Expression Grammar
As outlined in the book Chapter 6, the expression grammar we want our parser to understand is given by:
expression → literal
| unary
| binary
| grouping ;
literal → NUMBER | STRING | "true" | "false" | "nil" ;
grouping → "(" expression ")" ;
unary → ( "-" | "!" ) expression ;
binary → expression operator expression ;
operator → "==" | "!=" | "<" | "<=" | ">" | ">="
| "+" | "-" | "*" | "/" ;
This grammar will represent the AST we will build after the parser parses the LoxTokInfo
tokens that were created previously in the Scanning
step. Therefore, we need to represent the above grammar in Haskell as follows:
data BinOp = NotEqual | EqualEqual | Gt | Gte | Lt | Lte | Plus | Minus | Star | Slash
deriving (Show, Eq)
data UnaryOp = UnaryMinus | UnaryBang deriving (Show, Eq)
data Expr
= Number Double -- NUMBER
| Literal T.Text -- STRING
| LoxBool Bool -- "true", "false"
| LoxNil -- 'nil'
| Paren Expr -- grouping
| Unary UnaryOp Expr -- unary
| Binary Expr BinOp Expr -- binary
deriving (Show, Eq)
The comments provided besides each data constructor ties them back to the productions in the expression grammar that is presented above. This data Expr
will be the AST that our parser will produce using the same top-down approach we took in the previous chapter.
Making Parsec Work on LoxTokInfo
Similar to scanning
step we want to use the Parsec
library to parse the LoxTokInfo
tokens we created in the lex
step. So far we have used Parsec
to parse either String
or Char
data. Now, we need a way to use Parsec
to scan input which are a list of LoxTokInfo
tokens. We can do that by implementing our own satisfy
version as shown below.
satisfyT :: (LoxTokInfo -> Maybe a) -> Parser a
= tokenPrim showTok updateTokPos match
satisfyT f where
= show $ tokinfo_type ti
showTok ti : _) = tok_position s
updateTokPos _ _ (s = pos
updateTokPos pos _ [] = f t match t
The satisfyT
is implemented over the tokenPrim
parser. The tokenPrim
combinator is the most primitive combinator that all other combinators can be built out of. Since, we want all are combinators to work with LoxTokInfo
, we can build the satisfyT
function over this combinator and then use the satisfyT
function is all other other combinators you will build.
More detail description of we this works can be found here. For now, is is sufficient to understand the tokenPrim
return the Parser a
that satisfies a condition.
Here the a
is generic in satisfyT :: (LoxTokInfo -> Maybe a) -> Parser a
. In most cases the a
will be the Expr
type itself. This will become clear as we look at other combinators.
Implementing Parser
Now, that we have the satisfyT
function, we are ready to build other parser combinators. Let first define a type alias that will represent the parser
each of our combinators will be returning.
type Parser a = ParsecT [LoxTokInfo] () Identity a
In Chapter 6, the shape of recursive-descent parser we want to build is given by the following grammar that is suggested. The grammar suggested takes care of required precedence and associativity relationships.
expression → equality ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary
| primary ;
primary → NUMBER | STRING | "true" | "false" | "nil"
| "(" expression ")" ;
This grammar will guide us through the combinators we have to build. First let’s build some combinators for the primary
productions. You will notice that these combinator uses the satisfyT
function to check the LoxTokInfo
they need to match and accordingly return an Expr
object.
We have number
, literal
, loxBool
, loxNil
and loxParenExpr
all lined up with a choice
operator in loxPrimary
.
As you will notice, each of the Parsers
returned by the combinators, take a LoxTokInfo
as input, and if that token matches (via satisfyT
), then we will return a Expr
.
number :: Parser Expr
= satisfyT f
number where
LoxTokInfo (NUMBER x) _ _ _) = Just (Number x)
f (= Nothing
f _
literal :: Parser Expr
= satisfyT f
literal where
LoxTokInfo (STRING x) _ _ _) = Just (Literal $ T.pack x)
f (= Nothing
f _
loxBool :: Parser Expr
= satisfyT f
loxBool where
LoxTokInfo TRUE _ _ _) = Just (LoxBool True)
f (LoxTokInfo FALSE _ _ _) = Just (LoxBool False)
f (= Nothing
f _
loxNil :: Parser Expr
= satisfyT f
loxNil where
LoxTokInfo NIL _ _ _) = Just LoxNil
f (= Nothing
f _
loxParenExpr :: Parser Expr
= do
loxParenExpr *> loxExpr <* satisfyT parenClose
satisfyT parenOpen where
-- use LoxNil as placeholder, since we do not have an equilivalent Expr for Paren
LoxTokInfo LEFT_PAREN _ _ _) = Just ()
parenOpen (= Nothing
parenOpen _
LoxTokInfo RIGHT_PAREN _ _ _) = Just ()
parenClose (= Nothing
parenClose _
loxPrimary :: Parser Expr
= number <|> literal <|> loxBool <|> loxNil <|> loxParenExpr loxPrimary
Once we have the primitives, it is just a matter off expanding out each production(line) of the grammar into its own function. But, before that you will notice a common pattern across all expressions. For example,we need to match comparison
to one or more of its instances. But, we also want to make sure these are all left-associative
.
To help with that, we have a helper function called leftChain
, which closely simulates the chainl
function in Parsec
. We have to implement our own function, since we want a different type for both p
and op
that will be passed to the leftChain
function. The existing definition of chainl
does not work for us.
-- this is similar to chainl in `Text.Parsec` but works on `BinOp`
-- adopted from https://jakewheat.github.io/intro_to_parsing/
leftChain :: Parser Expr -> Parser BinOp -> Parser Expr
= do
leftChain p op <- p
expr
maybeAddSuffix exprwhere
= do
addSuffix e0 <- op
op' <- p
e1 Binary e0 op' e1)
maybeAddSuffix (
= addSuffix e <|> return e maybeAddSuffix e
Once we have the leftChain
function which just need to add a function for each production. For example, for
unary → ( "!" | "-" ) unary
| primary ;
we have,
unary' :: Parser Expr
= Unary <$> satisfyT f <*> unary
unary' where
LoxTokInfo BANG _ _ _) = Just UnaryBang
f (LoxTokInfo MINUS _ _ _) = Just UnaryMinus
f (= Nothing
f _
unary :: Parser Expr
= unary' <|> loxPrimary unary
We follow the same pattern for factor
, term
and comparison
. You will notice that similar to the grammar, the dependency goes from equality -> comparison -> term -> factor -> unary
. The only difference in each of these functions are the BinOp
we match.
factor :: Parser Expr
= leftChain unary (satisfyT f)
factor where
= case tokinfo_type x of
f x STAR -> Just Star
SLASH -> Just Slash
-> Nothing
_
term :: Parser Expr
= leftChain factor (satisfyT f)
term where
= case tokinfo_type x of
f x MINUS -> Just Minus
PLUS -> Just Plus
-> Nothing
_
comparison :: Parser Expr
= leftChain term (satisfyT f)
comparison where
= case tokinfo_type x of
f x GREATER -> Just Gt
GREATER_EQUAL -> Just Gte
LESS -> Just Lt
LESS_EQUAL -> Just Lte
-> Nothing _
Finally, we can add our helper scanner function:
scannerLoxTokens :: [LoxTokInfo] -> LoxParserResult
= parse loxExpr "" scannerLoxTokens
The concludes the implementation of the Parser
.
Examples
-- term
>>P.parse equality "" $ fromRight [] (scanner "1+2;")
λ Right (Binary (Number 1.0) Plus (Number 2.0))
-- factor
>>P.parse equality "" $ fromRight [] (scanner "1*2/3;")
λ Right (Binary (Binary (Number 1.0) Star (Number 2.0)) Slash (Number 3.0))
-- comparisons
>>P.parse equality "" $ fromRight [] (scanner "1>2<=3;")
λ Right (Binary (Binary (Number 1.0) Gt (Number 2.0)) Lte (Number 3.0))
-- equality
>>P.parse equality "" $ fromRight [] (scanner "1>2==3;")
λ Right (Binary (Binary (Number 1.0) Gt (Number 2.0)) EqualEqual (Number 3.0))
Conclusion
That concludes the implementation of the expression parser. With this we have laid of the groundwork to extend the suite of combinators we have and support more language features in later chapters.
I would like to thank bss03 for providing good ideas here to refine the implemenation of satistyT
.