Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use FParsec to parse a self-describing input

Tags:

f#

fparsec

I'm using FParsec to parse an input that describes its own format. For example, consider this input:

int,str,int:4,'hello',3

The first part of the input (before the colon) describes the format of the second part of the input. In this case, the format is int, str, int, which means that the actual data consists of three comma-separated values of the given types, so the result should be 4, "hello", 3.

What is the best way to parse something like this with FParsec?

I've pasted my best effort below, but I'm not happy with it. Is there a better way to do this that is cleaner, less stateful, and less reliant on the parse monad? I think this depends on smarter management of UserState, but I don't know how to do it. Thanks.

open FParsec

type State = { Formats : string[]; Index : int32 }
    with static member Default = { Formats = [||]; Index = 0 }

type Value =
    | Integer of int
    | String of string

let parseFormat : Parser<_, State> =
    parse {
        let! formats =
            sepBy
                (pstring "int" <|> pstring "str")
                (skipString ",")
                |>> Array.ofList
        do! updateUserState (fun state -> { state with Formats = formats })
    }

let parseValue format =
    match format with
        | "int" -> pint32 |>> Integer
        | "str" ->
            between
                (skipString "'")
                (skipString "'")
                (manySatisfy (fun c -> c <> '\''))
                    |>> String
        | _ -> failwith "Unexpected"

let parseValueByState =
    parse {
        let! state = getUserState
        let format = state.Formats.[state.Index]
        do! setUserState { state with Index = state.Index + 1}
        return! parseValue format
    }

let parseData =
    sepBy
        parseValueByState
        (skipString ",")

let parse =
    parseFormat
        >>. skipString ":"
        >>. parseData

[<EntryPoint>]
let main argv =
    let result = runParserOnString parse State.Default "" "int,str,int:4,'hello',3"
    printfn "%A" result
    0
like image 811
Brian Berns Avatar asked Apr 19 '15 07:04

Brian Berns


1 Answers

There seem to be several problems with the original code, so I took my liberty to rewrite it from scratch.

First, several library functions that may appear useful in other FParsec-related projects:

/// Simple Map
/// usage: let z = Map ["hello" => 1; "bye" => 2]
let (=>) x y = x,y
let makeMap x = new Map<_,_>(x)

/// A handy construct allowing NOT to write lengthy type definitions
/// and also avoid Value Restriction error
type Parser<'t> = Parser<'t, UserState>

/// A list combinator, inspired by FParsec's (>>=) combinator
let (<<+) (p1: Parser<'T list>) (p2: Parser<'T>) =
    p1 >>= fun x -> p2 >>= fun y -> preturn (y::x)

/// Runs all parsers listed in the source list;
/// All but the trailing one are also combined with a separator
let allOfSepBy separator parsers : Parser<'T list> =
    let rec fold state =
        function
        | [] -> pzero
        | hd::[] -> state <<+ hd 
        | hd::tl -> fold (state <<+ (hd .>> separator)) tl
    fold (preturn []) parsers
    |>> List.rev    // reverse the list since we appended to the top

Now, the main code. The basic idea is to run parsing in three steps:

  1. Parse out the keys (which are plain ASCII strings)
  2. Map these keys to actual Value parsers
  3. Run these parsers in order

The rest seems to be commented within the code. :)

/// The resulting type
type Output =
    | Integer of int
    | String of string

/// tag to parser mappings
let mappings =
    [
        "int" => (pint32 |>> Integer)
        "str" => (
                    manySatisfy (fun c -> c <> '\'')
                    |> between (skipChar ''') (skipChar ''')
                    |>> String
                 )
    ]
    |> makeMap

let myProcess : Parser<Output list> =
    let pKeys =                     // First, we parse out the keys
        many1Satisfy isAsciiLower   // Parse one key; keys are always ASCII strings
        |> sepBy <| (skipChar ',')  // many keys separated by comma
        .>> (skipChar ':')          // all this with trailing semicolon
    let pValues = fun keys ->
        keys                        // take the keys list
        |> List.map                 // find the required Value parser
                                    // (NO ERROR CHECK for bad keys)
            (fun p -> Map.find p mappings)
        |> allOfSepBy (skipChar ',') // they must run in order, comma-separated
    pKeys >>= pValues

Run on string: int,int,str,int,str:4,42,'hello',3,'foobar'
Returned: [Integer 4; Integer 42; String "hello"; Integer 3; String "foobar"]

like image 167
bytebuster Avatar answered Sep 27 '22 21:09

bytebuster