Crafting Interpreters in Haskell - Parsing Expressions

2021-09-15

In this post, we will continue from where we left off on the Scanner. Today, we implement a parser close to the one described in Parsing Expressions

We follow the same approach as earlier. That is, we will continue to use Parsec and try to stay close to the implementation of the book as possible. The module presented in this post can be found at ExprParser.hs

Let’s first get some import statements out of the way.

--- Scanner.hs
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NoImplicitPrelude #-}

module ExprParser where

import Data.Text as T
import Import hiding (many, try, (<|>))
import Scanner
import Text.Parsec

Layout

  1. Expression Grammar
  2. Making Parsec Work in Tokens
  3. Examples
  4. Conclusion

Expression Grammar

As outlined in the book Chapter 6, the expression grammar we want our parser to understand is given by:

expression     → literal
               | unary
               | binary
               | grouping ;

literal        → NUMBER | STRING | "true" | "false" | "nil" ;
grouping       → "(" expression ")" ;
unary          → ( "-" | "!" ) expression ;
binary         → expression operator expression ;
operator       → "==" | "!=" | "<" | "<=" | ">" | ">="
               | "+"  | "-"  | "*" | "/" ;

This grammar will represent the AST we will build after the parser parses the LoxTokInfo tokens that were created previously in the Scanning step. Therefore, we need to represent the above grammar in Haskell as follows:

data BinOp = NotEqual | EqualEqual | Gt | Gte | Lt | Lte | Plus | Minus | Star | Slash
  deriving (Show, Eq)

data UnaryOp = UnaryMinus | UnaryBang deriving (Show, Eq)

data Expr
  = Number Double   -- NUMBER
  | Literal T.Text  -- STRING
  | LoxBool Bool    -- "true", "false"
  | LoxNil          -- 'nil'
  | Paren Expr      -- grouping
  | Unary UnaryOp Expr  -- unary
  | Binary Expr BinOp Expr  -- binary
  deriving (Show, Eq)

The comments provided besides each data constructor ties them back to the productions in the expression grammar that is presented above. This data Expr will be the AST that our parser will produce using the same top-down approach we took in the previous chapter.

Making Parsec Work on LoxTokInfo

Similar to scanning step we want to use the Parsec library to parse the LoxTokInfo tokens we created in the lex step. So far we have used Parsec to parse either String or Char data. Now, we need a way to use Parsec to scan input which are a list of LoxTokInfo tokens. We can do that by implementing our own satisfy version as shown below.

satisfyT :: (LoxTokInfo -> Maybe a) -> Parser a
satisfyT f = tokenPrim showTok updateTokPos match
  where
    showTok ti = show $ tokinfo_type ti
    updateTokPos _ _ (s : _) = tok_position s
    updateTokPos pos _ [] = pos
    match t = f t

The satisfyT is implemented over the tokenPrim parser. The tokenPrim combinator is the most primitive combinator that all other combinators can be built out of. Since, we want all are combinators to work with LoxTokInfo, we can build the satisfyT function over this combinator and then use the satisfyT function is all other other combinators you will build.

More detail description of we this works can be found here. For now, is is sufficient to understand the tokenPrim return the Parser a that satisfies a condition.

Here the a is generic in satisfyT :: (LoxTokInfo -> Maybe a) -> Parser a. In most cases the a will be the Expr type itself. This will become clear as we look at other combinators.

Implementing Parser

Now, that we have the satisfyT function, we are ready to build other parser combinators. Let first define a type alias that will represent the parser each of our combinators will be returning.

type Parser a = ParsecT [LoxTokInfo] () Identity a

In Chapter 6, the shape of recursive-descent parser we want to build is given by the following grammar that is suggested. The grammar suggested takes care of required precedence and associativity relationships.

expression     → equality ;
equality       → comparison ( ( "!=" | "==" ) comparison )* ;
comparison     → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term           → factor ( ( "-" | "+" ) factor )* ;
factor         → unary ( ( "/" | "*" ) unary )* ;
unary          → ( "!" | "-" ) unary
               | primary ;
primary        → NUMBER | STRING | "true" | "false" | "nil"
               | "(" expression ")" ;

This grammar will guide us through the combinators we have to build. First let’s build some combinators for the primary productions. You will notice that these combinator uses the satisfyT function to check the LoxTokInfo they need to match and accordingly return an Expr object.

We have number, literal, loxBool, loxNil and loxParenExpr all lined up with a choice operator in loxPrimary.

As you will notice, each of the Parsers returned by the combinators, take a LoxTokInfo as input, and if that token matches (via satisfyT), then we will return a Expr.

number :: Parser Expr
number = satisfyT f
  where
    f (LoxTokInfo (NUMBER x) _ _ _) = Just (Number x)
    f _ = Nothing

literal :: Parser Expr
literal = satisfyT f
  where
    f (LoxTokInfo (STRING x) _ _ _) = Just (Literal $ T.pack x)
    f _ = Nothing


loxBool :: Parser Expr
loxBool = satisfyT f
  where
    f (LoxTokInfo TRUE _ _ _) = Just (LoxBool True)
    f (LoxTokInfo FALSE _ _ _) = Just (LoxBool False)
    f _ = Nothing


loxNil :: Parser Expr
loxNil = satisfyT f
  where
    f (LoxTokInfo NIL _ _ _) = Just LoxNil
    f _ = Nothing

loxParenExpr :: Parser Expr
loxParenExpr = do
  satisfyT parenOpen *> loxExpr <* satisfyT parenClose
  where
    -- use LoxNil as placeholder, since we do not have an equilivalent Expr for Paren
    parenOpen (LoxTokInfo LEFT_PAREN _ _ _) = Just ()
    parenOpen _ = Nothing

    parenClose (LoxTokInfo RIGHT_PAREN _ _ _) = Just ()
    parenClose _ = Nothing

loxPrimary :: Parser Expr
loxPrimary = number <|> literal <|> loxBool <|> loxNil <|> loxParenExpr

Once we have the primitives, it is just a matter off expanding out each production(line) of the grammar into its own function. But, before that you will notice a common pattern across all expressions. For example,we need to match comparison to one or more of its instances. But, we also want to make sure these are all left-associative.

To help with that, we have a helper function called leftChain, which closely simulates the chainl function in Parsec. We have to implement our own function, since we want a different type for both p and op that will be passed to the leftChain function. The existing definition of chainl does not work for us.

-- this is similar to chainl in `Text.Parsec` but works on `BinOp`
-- adopted from https://jakewheat.github.io/intro_to_parsing/
leftChain :: Parser Expr -> Parser BinOp -> Parser Expr
leftChain p op = do
  expr <- p
  maybeAddSuffix expr
  where
    addSuffix e0 = do
      op' <- op
      e1 <- p
      maybeAddSuffix (Binary e0 op' e1)

    maybeAddSuffix e = addSuffix e <|> return e

Once we have the leftChain function which just need to add a function for each production. For example, for

unary          → ( "!" | "-" ) unary
                | primary ;

we have,

unary' :: Parser Expr
unary' = Unary <$> satisfyT f <*> unary
  where
    f (LoxTokInfo BANG _ _ _) = Just UnaryBang
    f (LoxTokInfo MINUS _ _ _) = Just UnaryMinus
    f _ = Nothing


unary :: Parser Expr
unary = unary' <|> loxPrimary

We follow the same pattern for factor, term and comparison. You will notice that similar to the grammar, the dependency goes from equality -> comparison -> term -> factor -> unary. The only difference in each of these functions are the BinOp we match.

factor :: Parser Expr
factor = leftChain unary (satisfyT f)
  where
    f x = case tokinfo_type x of
      STAR -> Just Star
      SLASH -> Just Slash
      _ -> Nothing

term :: Parser Expr
term = leftChain factor (satisfyT f)
  where
    f x = case tokinfo_type x of
      MINUS -> Just Minus
      PLUS -> Just Plus
      _ -> Nothing


comparison :: Parser Expr
comparison = leftChain term (satisfyT f)
  where
    f x = case tokinfo_type x of
      GREATER -> Just Gt
      GREATER_EQUAL -> Just Gte
      LESS -> Just Lt
      LESS_EQUAL -> Just Lte
      _ -> Nothing

Finally, we can add our helper scanner function:

scannerLoxTokens :: [LoxTokInfo] -> LoxParserResult
scannerLoxTokens = parse loxExpr ""

The concludes the implementation of the Parser.

Examples

--  term
λ >>P.parse equality  "" $ fromRight [] (scanner "1+2;")
Right (Binary (Number 1.0) Plus (Number 2.0))

--  factor
λ >>P.parse equality  "" $ fromRight [] (scanner "1*2/3;")
Right (Binary (Binary (Number 1.0) Star (Number 2.0)) Slash (Number 3.0))

-- comparisons
λ >>P.parse equality  "" $ fromRight [] (scanner "1>2<=3;")
Right (Binary (Binary (Number 1.0) Gt (Number 2.0)) Lte (Number 3.0))

-- equality
λ >>P.parse equality  "" $ fromRight [] (scanner "1>2==3;")
Right (Binary (Binary (Number 1.0) Gt (Number 2.0)) EqualEqual (Number 3.0))

Conclusion

That concludes the implementation of the expression parser. With this we have laid of the groundwork to extend the suite of combinators we have and support more language features in later chapters.

I would like to thank bss03 for providing good ideas here to refine the implemenation of satistyT.