What's the best way of parsing strings? [closed]

We've got a scenario that requires us to parse lots of plain-text e-mail. Each e-mail 'type' is the result of a script being run against various platforms. Some are tab delimited, some are space delimited, and some we simply don't know yet.

We'll need to support more 'formats' in the future too.

Do we go for a solution using:

  • Regex
  • Simple string searching (using string.IndexOf etc.)
  • Lex/Yacc
  • Other

The overall solution will be developed in C# 2.0 (hopefully 3.5).

Kieron asked Sep 11 '08 11:09


2 Answers

Regex.

Regex can solve almost everything except for world peace. Well, maybe world peace too.

Iain Holder answered Oct 11 '22 15:10


The three solutions you stated each cover very different needs.

Manual parsing (simple text search) is the most flexible and the most adaptable. However, it very quickly becomes a real pain in the ass as the parsing required gets more complicated.
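To give an idea of what manual parsing looks like, here is a minimal sketch that pulls the value following a label out of an e-mail body with IndexOf. The class name `ManualParser`, the method `ValueAfter`, and the "Label: value" layout are all made up for illustration; your real formats will differ.

```csharp
using System;

// Hypothetical helper: finds "label" in the body and returns the rest of that line.
static class ManualParser
{
    // Returns the trimmed text after the first occurrence of label,
    // up to the end of that line, or null when the label is absent.
    public static string ValueAfter(string body, string label)
    {
        int start = body.IndexOf(label, StringComparison.Ordinal);
        if (start < 0)
        {
            return null; // label not found
        }

        start += label.Length;

        // The value ends at the next newline, or at the end of the body.
        int end = body.IndexOf('\n', start);
        if (end < 0)
        {
            end = body.Length;
        }

        return body.Substring(start, end - start).Trim();
    }
}
```

This is fine for one field, but every extra rule (optional fields, repeated labels, quoting) adds more index bookkeeping, which is where the pain starts.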

Regexes are a middle ground, and probably your best bet here. They are powerful yet flexible, as you can add more logic yourself in the code that calls the different regexes. The main drawback here would be speed.
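As a rough sketch of the "several regexes plus calling code" idea, assuming C# 3.0 or later: the two line formats below ("name, tab, value" and "name, spaces, value") and the names `RegexBag` and `TryParse` are assumptions for illustration, not anything from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical "bag of regex": each pattern handles one assumed line format.
static class RegexBag
{
    // Assumed format 1: name and value separated by a single tab.
    static readonly Regex TabLine = new Regex(@"^(?<name>[^\t]+)\t(?<value>.*)$");

    // Assumed format 2: a name without spaces, then one or more spaces, then the value.
    static readonly Regex SpaceLine = new Regex(@"^(?<name>\S+) +(?<value>.*)$");

    // Tries each pattern in order; the tab pattern goes first because
    // tab-delimited fields may themselves contain spaces.
    public static bool TryParse(string line, out string name, out string value)
    {
        foreach (Regex rx in new[] { TabLine, SpaceLine })
        {
            Match m = rx.Match(line);
            if (m.Success)
            {
                name = m.Groups["name"].Value;
                value = m.Groups["value"].Value.Trim();
                return true;
            }
        }

        name = null;
        value = null;
        return false; // no known format matched
    }
}
```

The extra logic lives in the loop: supporting a new format means adding a pattern to the array, and the order of the array decides which format wins when more than one could match.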

Lex/Yacc is really only suited to very complicated, predictable syntaxes and lacks a lot of post-compile flexibility. You can't easily change parsers mid-parse. Well, actually you can, but it's just too heavy and you'd be better off using regex instead.

I know this is a cliché answer, and it all really comes down to what your exact needs are, but from what you said, I would personally go with a bag of regexes.

As an alternative, as Vaibhav pointed out, if several different situations can arise and you can easily detect which one is coming, you could make a plugin system that chooses the right algorithm. Those algorithms could all be very different, one using Lex/Yacc for the specialized cases and another using IndexOf and regex for the simpler ones.
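A plugin system like that could be sketched roughly as follows, assuming C# 3.0 or later. The interface `IEmailParser`, the classes `DelimitedParser` and `ParserRegistry`, and the very naive detection rule are all hypothetical; a real detector would probably look at headers or a known marker in each e-mail type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plugin contract: one implementation per e-mail format.
public interface IEmailParser
{
    bool CanParse(string body);          // cheap format detection
    IList<string[]> Parse(string body);  // one string[] of fields per line
}

// Handles any single-character delimiter (tab, space, ...).
public sealed class DelimitedParser : IEmailParser
{
    private readonly char _delimiter;

    public DelimitedParser(char delimiter)
    {
        _delimiter = delimiter;
    }

    // Naive detection (an assumption): the delimiter occurs somewhere in the body.
    public bool CanParse(string body)
    {
        return body.IndexOf(_delimiter) >= 0;
    }

    public IList<string[]> Parse(string body)
    {
        List<string[]> rows = new List<string[]>();
        string[] lines = body.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            rows.Add(line.Split(new[] { _delimiter }, StringSplitOptions.RemoveEmptyEntries));
        }
        return rows;
    }
}

public static class ParserRegistry
{
    // Order matters: tab is checked before space, since
    // tab-delimited text may also contain spaces inside fields.
    private static readonly List<IEmailParser> Parsers = new List<IEmailParser>
    {
        new DelimitedParser('\t'),
        new DelimitedParser(' ')
    };

    // Returns the first parser that claims the body, or null for an unknown format.
    public static IEmailParser Select(string body)
    {
        foreach (IEmailParser parser in Parsers)
        {
            if (parser.CanParse(body))
            {
                return parser;
            }
        }
        return null;
    }
}
```

A new format then becomes a new class plus one line in the registry, and a Lex/Yacc-backed parser could sit behind the same interface next to the simple ones.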

Coincoin answered Oct 11 '22 15:10