
Advice on FParsec for handling whitespace

Tags: f#, fparsec

I need to parse 'quotes', which have the following format:

"5.75 @ 5.95"

So I wrote this FParsec expression to parse it:

let pquote x = sepBy pfloat (spaces .>> (pchar '/' <|> pchar '@') >>. spaces) x

It works fine, except when there is a trailing space in my input: the separator expression starts consuming the whitespace and then fails. So I wrapped the separator in attempt, which works and, from what I understand, is more or less what attempt is meant for.

let pquote x = sepBy pfloat (attempt (spaces .>> (pchar '/' <|> pchar '@') >>. spaces)) x

As I don't know FParsec very well, I wonder whether there is a better way to write this. It seems a bit heavy (while still being very manageable, of course).
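The behaviour described above can be reproduced with a small script. This is only a sketch: the names sep, plain, backtracking and succeeds are made up for illustration, and the #r line assumes the script runs in F# Interactive with NuGet access.

```fsharp
// Assumption: run with `dotnet fsi`; in a compiled project, reference the
// FParsec package instead and remove this line.
#r "nuget: FParsec"
open FParsec

// The separator from the question: optional whitespace, '/' or '@', optional whitespace.
let sep : Parser<unit, unit> = spaces .>> (pchar '/' <|> pchar '@') >>. spaces

// Without attempt: if sep consumes whitespace and then fails, sepBy fails as well.
let plain : Parser<float list, unit> = sepBy pfloat sep

// With attempt: a failing sep is rewound, so sepBy simply stops at that point.
let backtracking : Parser<float list, unit> = sepBy pfloat (attempt sep)

// Hypothetical helper: true when the parser succeeds on the input.
let succeeds p input =
    match run p input with
    | Success _ -> true
    | Failure _ -> false

printfn "plain, no trailing space:   %b" (succeeds plain "5.75 @ 5.95")
printfn "plain, trailing space:      %b" (succeeds plain "5.75 @ 5.95 ")
printfn "attempt, trailing space:    %b" (succeeds backtracking "5.75 @ 5.95 ")
```

Note that the attempt version leaves the trailing space unconsumed, which only matters if something else (for example eof) is expected to follow.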

asked May 04 '12 15:05 by nicolas




1 Answer

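open FParsec

type UserState = unit
type Parser<'t> = Parser<'t, UserState>  // avoids the "value restriction" error (note 2)

let s1 = "5.75         @             5.95              "
let s2 = "5.75/5.95   "
// Two floats separated by '@' or '/', with optional whitespace after each token.
let pquote: Parser<_> =
    pfloat
    .>> spaces .>> skipAnyOf ['@'; '/'] .>> spaces
    .>>. pfloat
    .>> spaces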

Notes:

  1. spaces skips any sequence of zero or more whitespace characters, so there is no need to wrap it in opt (thanks @Daniel);
  2. type Parser<'t> = Parser<'t, UserState> - I define it this way to avoid the "value restriction" error; you may remove it;
  3. (Retracted) I originally suggested setting System.Threading.Thread.CurrentThread.CurrentCulture <- Globalization.CultureInfo.GetCultureInfo "en-US" for systems whose default language settings use a decimal comma, but this won't work (thanks @Stephan);
  4. I would not use sepBy unless I had a list of values of unknown size;
  5. If you don't really need the value returned (e.g. the '@' character), it is recommended to use the skip* functions instead of the p* ones for performance reasons.
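If the list really can have an unknown length (note 4), one possible variant, which is not taken from the answer above, keeps sepBy but lets every token consume its own trailing whitespace. The separator then never starts with spaces, so attempt should not be needed. The names sep, pquoteList and parse are illustrative only, and the #r line assumes F# Interactive with NuGet access.

```fsharp
// Assumption: run with `dotnet fsi`; in a compiled project, reference the
// FParsec package instead and remove this line.
#r "nuget: FParsec"
open FParsec

// Separator: '/' or '@' (value discarded, see note 5), then trailing whitespace.
let sep : Parser<unit, unit> = skipAnyOf "/@" .>> spaces

// Each float also consumes the whitespace that follows it, so by the time
// sep is tried the parser is already positioned at the next non-blank character.
let pquoteList : Parser<float list, unit> = sepBy (pfloat .>> spaces) sep

// Hypothetical helper: require the whole input to be consumed.
let parse input =
    match run (pquoteList .>> eof) input with
    | Success (values, _, _) -> Some values
    | Failure _ -> None

printfn "%A" (parse "5.75         @             5.95              ")
printfn "%A" (parse "5.75/5.95   ")
```

Both sample inputs should yield the two floats as a list. The trade-off compared with the fixed pair parser is that this version also accepts one, three or more values, which may or may not be what a quote should allow.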

Update: added slash as a separator.

answered Sep 28 '22 16:09 by bytebuster