# Parsec Presentation

tl;dr: Short introduction to Parsec for beginner.

• The html presentation is here.

# Parsec

by Yann Esposito

## Parsing

Latin pars (ōrātiōnis), meaning part (of speech).

• analysing a string of symbols
• formal grammar.

## Parsing in Programming Languages

Complexity:

Method Typical Example Output Data Structure
Splitting CSV Array, Map
Regexp email
• Fixed Layout Tree
Parser Programming language
• Most Data Structure

## Parser & culture

In Haskell Parser are really easy to use.

Generally:

• In most languages: split then regexp then parse
• In Haskell: split then parse

## Parsing Example

From String:

(1+3)*(1+5+9)

To data structure:

## Parsec

Parsec lets you construct parsers by combining high-order Combinators to create larger expressions.

Combinator parsers are written and used within the same programming language as the rest of the program.

The parsers are first-class citizens of the languages […]“

## Parser Libraries

In reality there are many choices:

 attoparsec fast Bytestring-lexing fast Parsec 3 powerful, nice error reporting

spaces are meaningful

f x   -- ⇔ f(x) in C-like languages
f x y -- ⇔ f(x,y)

Don’t mind strange operators (<*>, <$>). Consider them like separators, typically commas. They are just here to deal with types. Informally: toto <$> x <*> y <*> z
-- ⇔ toto x y z
-- ⇔ toto(x,y,z) in C-like languages

## Minimal Parsec Examples

whitespaces = many (oneOf "\t ")
number = many1 digit
symbol = oneOf "!#$%&|*+-/:<=>?@^_~" " “ – whitespaces on” “ “” – whitespaces on “32” “32” – number on “32” – number on " 32 “ “number” (line 1, column 1): unexpected ” “ expecting digit  ## Comparison with Regexp (Parsec) data IP = IP Word8 Word8 Word8 Word8 ip = IP <$>                             -- number.number.number.number
number <*  char '.' <*>          -- put that in an IP data type
number <*  char '.' <*>
number <*  char '.' <*>
number
number = do
x <- read <$> many1 digit -- read the number guard (0 <= x && x < 256) -- ensure it is 0 <= x < 256 return (fromIntegral x) -- returns a Word8 ## Comparison with Regexp (Perl Regexp) # remark: 888.999.999.999 is accepted \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b # exact but difficult to read \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b Also, regexp are unityped by nature. ## Monadic style number :: Parser String number = many1 digit number' :: Parser Int number' = do -- digitString :: String digitString <- many1 digit return (read digitString) "32" :: [Char] -- number on "32" 32 :: Int -- number' on "32" ## Combining Monadic style (S = aSb | ε) s = do a <- string "a" mid <- s b <- string "b" return (a ++ mid ++ b) <|> string "" "" -- s on "" "aaabbb" -- s on "aaabbb" "aabb" -- s on "aabbb" -- s on "aaabb" S (line1 1, column 4): unexpected end of input expecting "b" ## Combining Applicative style (S = aSb | ε) s = concat3 <$> string "a" <*> s <*> char "b"
<|> string ""
where
concat3 x y z = x ++ y ++ z

## Applicative Style usefull with Data types

data IP = IP Word8 Word8 Word8 Word8

parseIP = IP <$> number <* char '.' <*> number <* char '.' <*> number <* char '.' <*> number monadicParseIP = do d1 <- number char '.' d2 <- number char '.' d3 <- number char '.' d4 <- number return (IP d1 d2 d3 d4) ## Write number correctly number :: Parser Word8 number = do x <- read <$> many1 digit
guard (0 <= x && x < 256) <?>
"Number between 0 and 255 (here " ++ show x ++ ")"
return (fromIntegral x)
>>> test parseIP "parseIP" "823.32.80.113"
"parseIP" (line 1, column 4):
unexpected "."
expecting digit or Number between 0 and 255 (here 823)

## So

• combination of simple parsers
• error messages with (<?>)
• embed result in data type using Applicative style
• Not shown, use another monad with the parser

Time to do something cool

## Useful definition

try tries to parse and backtracks if it fails.

(<||>) parser1 parser2 = try parser1 <|> parser2

## Scheme

Remember from text to data structure. Our data structure:

data LispVal =    Atom String     -- print or val-3 ...
| Number Integer  -- 32
| String String   -- "foo"
| Bool Bool       -- #t or #f
| List [LispVal]  -- (print "foo" "bar")

Next will parse String, Atom, Integer

## Parse String

parseString :: Parser LispVal
parseString = do
char '"'
x <- many (noneOf "\"")
char '"'
return (String x)
-- parseString on '"toto"'
String "toto" :: LispVal
-- parseString on '" hello"'
String " hello" :: LispVal

## Parse Atom

In Scheme true is #t and false #f. Which are also valid Atoms.

symbol :: Parser Char
symbol = oneOf "!#$%&|*+-/:<=>?@^_~" parseAtom :: Parser LispVal parseAtom = do first <- letter <|> symbol rest <- many (letter <|> digit <|> symbol) let atom = first:rest return$ case atom of
"#t" -> Bool True
"#f" -> Bool False
_    -> Atom atom

## Test parseAtom

-- parseAtom on '#t'
Bool True :: LispVal
-- parseAtom on '#f'
Bool False :: LispVal
-- parseAtom on 'some-atom'
Atom "some-atom" :: LispVal

## Parse Number

parseNumber :: Parser LispVal
parseNumber = Number . read <$> many1 digit -- parseNumber on '18' Number 18 :: LispVal -- parseNumber on '188930992344321234' Number 188930992344321234 :: LispVal ## Compose all parsers parseExpr :: Parser LispVal parseExpr = parseAtom <||> parseString <||> parseNumber ## Test the parser -- parseExpr on '188930992344321234' Number 188930992344321234 :: LispVal -- parseExpr on '#t' Bool True :: LispVal -- parseExpr on 'just-some-word' Atom "just-some-word" :: LispVal -- parseExpr on '%-symbol-start' Atom "%-symbol-start" :: LispVal -- parseExpr on '"a String"' String "a String" :: LispVal ## Recursive Parsers parseList :: Parser LispVal parseList = List <$>
(char '(' *> sepBy parseExpr' spaces <* char ')' )

parseExpr' :: Parser LispVal
parseExpr' = parseAtom
<||> parseString
<||> parseNumber
<||> parseList

## Test Parse List

-- parseExpr' on '(foo (bar baz))'
List [Atom "foo",List [Atom "bar",Atom "baz"]] :: LispVal

-- parseExpr' on '(foo (bar)'
"parseExpr'" (line 1, column 11):
unexpected end of input
expecting white space, letter, "\"", digit, "(" or ")"

-- parseExpr' on '(((foo)) bar)'
List [List [List [Atom "foo"]],Atom "bar"] :: LispVal

## Remark

Close relation between parser and data types.

data Product a b = Product a b
data Sum a b = A a | B b
productParser = do
parser1
parser2
sumParser = parser1 <||> parser2

## Remark (2)

Similar structure:

data MyType = X Int Char | Y String

parseMyType =      (X <$> parseInt <*> parseChar) <||> (Y <$> parseString)

## Remark (3)

Similar structure:

data Answer = YES | NO | MAYBE
data MyType = X Int Answer | Y String

parseAnswer = parseYES <||> parseNO <||> parseMAYBE
parseMyType =      (X <$> parseInt <*> parseAnswer) <||> (Y <$> parseString)

## Conclusion

So Parser are more powerful than regular expression.
Parsec make it very easy to use.
Easy to read and to manipulate.

Notice how you could use parser as any other object in Haskell.
You could mapM them for example.

Any question?

## Appendice (Do it yourself)

{-# LANGUAGE DeriveDataTypeable #-}
import Control.Applicative hiding (many, (<|>))
import Text.Parsec
import Data.Typeable
type Parser a = Parsec String () a

myparser = many1 letter

main :: IO ()
main = test myparser "myparser" "(my (string to) parse)"

test :: (Typeable a, Show a) => Parser a -> String -> String -> IO ()
test parser description string = do
putStrLn $"-- " ++ description ++ " on \"" ++ string ++ "\"" let res = parse parser description string case res of Left err -> print err Right value -> putStrLn$ show value ++ " :: " ++ show (typeOf value)

## Appendice (2)

ADA:  DdzFFzCqrhtAvdkmATx5Fm8NPJViDy85ZBw13p4XcNzVzvQg8e3vWLXq23JQWFxPEXK6Kvhaxxe7oJt4VMYHxpA2vtCFiP8fziohN6Yp