# Parsec Presentation

tl;dr: Short introduction to Parsec for beginner.

• The html presentation is here.

# Parsec

by Yann Esposito

## Parsing

Latin pars (ōrātiōnis), meaning part (of speech).

• analysing a string of symbols
• formal grammar.

## Parsing in Programming Languages

Complexity:

Method Typical Example Output Data Structure
Splitting CSV Array, Map
Regexp email
• Fixed Layout Tree
Parser Programming language
• Most Data Structure

## Parser & culture

In Haskell Parser are really easy to use.

Generally:

• In most languages: split then regexp then parse
• In Haskell: split then parse

## Parsing Example

From String:

``(1+3)*(1+5+9)``

To data structure:

## Parsec

Parsec lets you construct parsers by combining high-order Combinators to create larger expressions.

Combinator parsers are written and used within the same programming language as the rest of the program.

The parsers are first-class citizens of the languages […]“

## Parser Libraries

In reality there are many choices:

 attoparsec fast Bytestring-lexing fast Parsec 3 powerful, nice error reporting

spaces are meaningful

``````f x   -- ⇔ f(x) in C-like languages
f x y -- ⇔ f(x,y)``````

Don’t mind strange operators (`<*>`, `<\$>`).
Consider them like separators, typically commas.
They are just here to deal with types.

Informally:

``````toto <\$> x <*> y <*> z
-- ⇔ toto x y z
-- ⇔ toto(x,y,z) in C-like languages``````

## Minimal Parsec Examples

``````whitespaces = many (oneOf "\t ")
number = many1 digit
symbol = oneOf "!#\$%&|*+-/:<=>[email protected]^_~"``````
``````" “ – whitespaces on” “ “” – whitespaces on “32” “32” – number on “32”

– number on " 32 “ “number” (line 1, column 1): unexpected ” “ expecting digit```
```

## Comparison with Regexp (Parsec)

``````data IP = IP Word8 Word8 Word8 Word8
ip = IP <\$>                             -- number.number.number.number
number <*  char '.' <*>          -- put that in an IP data type
number <*  char '.' <*>
number <*  char '.' <*>
number
number = do
guard (0 <= x && x < 256) -- ensure it is 0 <= x < 256
return (fromIntegral x)   -- returns a Word8``````

## Comparison with Regexp (Perl Regexp)

``````# remark: 888.999.999.999 is accepted
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# exact but difficult to read
\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b``````

Also, regexp are unityped by nature.

``````number :: Parser String
number = many1 digit

number' :: Parser Int
number' = do
-- digitString :: String
digitString <- many1 digit
``````"32" :: [Char]  -- number on "32"
32   :: Int     -- number' on "32"``````

## Combining Monadic style (S = aSb | ε)

``````s = do
a <- string "a"
mid <- s
b <- string "b"
return (a ++ mid ++ b)
<|> string ""``````
``````""          -- s on ""
"aaabbb"    -- s on "aaabbb"
"aabb"      -- s on "aabbb"
-- s on "aaabb"
S (line1 1, column 4):
unexpected end of input
expecting "b"``````

## Combining Applicative style (S = aSb | ε)

``````s = concat3 <\$> string "a" <*> s <*> char "b"
<|> string ""
where
concat3 x y z = x ++ y ++ z``````

## Applicative Style usefull with Data types

``````data IP = IP Word8 Word8 Word8 Word8

parseIP = IP <\$>
number <*  char '.' <*>
number <*  char '.' <*>
number <*  char '.' <*>
number

d1 <- number
char '.'
d2 <- number
char '.'
d3 <- number
char '.'
d4 <- number
return (IP d1 d2 d3 d4)``````

## Write number correctly

``````number :: Parser Word8
number = do
x <- read <\$> many1 digit
guard (0 <= x && x < 256) <?>
"Number between 0 and 255 (here " ++ show x ++ ")"
return (fromIntegral x)``````
``````>>> test parseIP "parseIP" "823.32.80.113"
"parseIP" (line 1, column 4):
unexpected "."
expecting digit or Number between 0 and 255 (here 823)``````

## So

• combination of simple parsers
• error messages with `(<?>)`
• embed result in data type using Applicative style
• Not shown, use another monad with the parser

Time to do something cool

## Useful definition

`try` tries to parse and backtracks if it fails.

``(<||>) parser1 parser2 = try parser1 <|> parser2``

## Scheme

Remember from text to data structure. Our data structure:

``````data LispVal =    Atom String     -- print or val-3 ...
| Number Integer  -- 32
| String String   -- "foo"
| Bool Bool       -- #t or #f
| List [LispVal]  -- (print "foo" "bar")``````

Next will parse String, Atom, Integer

## Parse String

``````parseString :: Parser LispVal
parseString = do
char '"'
x <- many (noneOf "\"")
char '"'
return (String x)``````
``````-- parseString on '"toto"'
String "toto" :: LispVal
-- parseString on '" hello"'
String " hello" :: LispVal``````

## Parse Atom

In Scheme true is `#t` and false `#f`. Which are also valid Atoms.

``````symbol :: Parser Char
symbol = oneOf "!#\$%&|*+-/:<=>[email protected]^_~"

parseAtom :: Parser LispVal
parseAtom = do
first <- letter <|> symbol
rest <- many (letter <|> digit <|> symbol)
let atom = first:rest
return \$ case atom of
"#t" -> Bool True
"#f" -> Bool False
_    -> Atom atom``````

## Test `parseAtom`

``````-- parseAtom on '#t'
Bool True :: LispVal
-- parseAtom on '#f'
Bool False :: LispVal
-- parseAtom on 'some-atom'
Atom "some-atom" :: LispVal``````

## Parse Number

``````parseNumber :: Parser LispVal
parseNumber = Number . read <\$> many1 digit``````
``````-- parseNumber on '18'
Number 18 :: LispVal
-- parseNumber on '188930992344321234'
Number 188930992344321234 :: LispVal``````

## Compose all parsers

``````parseExpr :: Parser LispVal
parseExpr = parseAtom
<||> parseString
<||> parseNumber``````

## Test the parser

``````-- parseExpr on '188930992344321234'
Number 188930992344321234 :: LispVal
-- parseExpr on '#t'
Bool True :: LispVal
-- parseExpr on 'just-some-word'
Atom "just-some-word" :: LispVal
-- parseExpr on '%-symbol-start'
Atom "%-symbol-start" :: LispVal
-- parseExpr on '"a String"'
String "a String" :: LispVal``````

## Recursive Parsers

``````parseList :: Parser LispVal
parseList = List <\$>
(char '(' *> sepBy parseExpr' spaces <* char ')' )

parseExpr' :: Parser LispVal
parseExpr' = parseAtom
<||> parseString
<||> parseNumber
<||> parseList``````

## Test Parse List

``````-- parseExpr' on '(foo (bar baz))'
List [Atom "foo",List [Atom "bar",Atom "baz"]] :: LispVal

-- parseExpr' on '(foo (bar)'
"parseExpr'" (line 1, column 11):
unexpected end of input
expecting white space, letter, "\"", digit, "(" or ")"

-- parseExpr' on '(((foo)) bar)'
List [List [List [Atom "foo"]],Atom "bar"] :: LispVal``````

## Remark

Close relation between parser and data types.

``````data Product a b = Product a b
data Sum a b = A a | B b``````
``````productParser = do
parser1
parser2
sumParser = parser1 <||> parser2``````

## Remark (2)

Similar structure:

``````data MyType = X Int Char | Y String

parseMyType =      (X <\$> parseInt <*> parseChar)
<||> (Y <\$> parseString)``````

## Remark (3)

Similar structure:

``````data Answer = YES | NO | MAYBE
data MyType = X Int Answer | Y String

parseAnswer = parseYES <||> parseNO <||> parseMAYBE
parseMyType =      (X <\$> parseInt <*> parseAnswer)
<||> (Y <\$> parseString)``````

## Conclusion

So Parser are more powerful than regular expression.
Parsec make it very easy to use.
Easy to read and to manipulate.

Notice how you could use parser as any other object in Haskell.
You could `mapM` them for example.

Any question?

## Appendice (Do it yourself)

``````{-# LANGUAGE DeriveDataTypeable #-}
import Control.Applicative hiding (many, (<|>))
import Text.Parsec
import Data.Typeable
type Parser a = Parsec String () a

myparser = many1 letter

main :: IO ()
main = test myparser "myparser" "(my (string to) parse)"

test :: (Typeable a, Show a) => Parser a -> String -> String -> IO ()
test parser description string = do
putStrLn \$ "-- " ++ description ++ " on \"" ++ string ++ "\""
let res = parse parser description string
case res of
Left  err   -> print err
Right value -> putStrLn \$ show value ++ " :: " ++ show (typeOf value)``````

## Appendice (2)

ADA: ``` DdzFFzCqrhtAvdkmATx5Fm8NPJViDy85ZBw13p4XcNzVzvQg8e3vWLXq23JQWFxPEXK6Kvhaxxe7oJt4VMYHxpA2vtCFiP8fziohN6Yp ```