
Would you implement a lightweight XML parser with <regex>?

Tags: c++, regex, xml, c++11

If you had to implement a lightweight XML parser, would you choose to use regex?

The XML parsing in my case would be heavily simplified: only tags and text content. No namespaces, no attributes, and no schema support (at least at the beginning, but maybe...).

I think it would be a good exercise for me to learn the new C++0x <regex> library. However, I was wondering whether XML parsing is beyond what regexes can reasonably handle.

Stephane Rolland asked Dec 04 '22 10:12


1 Answer

In a word: no. XML is not a regular language.

UPDATE (To expand, based on the discussion in the comments below)

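XML is not regular: elements can nest to arbitrary depth, and matching each closing tag to its opening tag needs a stack, which a regular expression cannot model. So you cannot hope to use regexes to perform some sort of one-hit parse/split operation on the entire file/string.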
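To make the nesting problem concrete, here is a minimal sketch of the "one-hit" approach using std::regex. The helper name first_element_content and the pattern are my own illustration, not something from the question. It works for a flat element but pairs the outer opening tag with the inner closing tag as soon as the same tag name is nested.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: tries to extract the content of the first element
// with a single regex. Returns an empty string if nothing matches.
std::string first_element_content(const std::string& xml) {
    // <(\w+)>  an opening tag, capturing its name
    // (.*?)    the content, captured lazily
    // </\1>    a closing tag with the same name (backreference)
    static const std::regex element(R"(<(\w+)>(.*?)</\1>)");

    std::smatch m;
    if (std::regex_search(xml, m, element)) {
        return m[2].str();
    }
    return "";
}
```

For "<a>x</a>" this returns "x", but for "<a><a>x</a></a>" it returns "<a>x", because the lazy match stops at the first "</a>" it sees. Making the quantifier greedy only moves the problem: "<a>x</a><a>y</a>" would then be swallowed as a single element.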

Whilst you could write a state-machine-based parser that uses regexes to perform the lexing/tokenisation, IMHO this would be less efficient, and more error-prone, than using a tool that's meant for the job. As others have said, Flex/Bison is one option.
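For what it's worth, the regex-as-lexer approach could look roughly like the sketch below, assuming the simplified dialect from the question (tags and text only, no attributes or namespaces). The Node type, the parse_simple_xml name and the token pattern are all hypothetical. The regex only splits the input into tokens; the explicit stack does the part a regex cannot, namely checking that tags nest properly.

```cpp
#include <cstddef>
#include <memory>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: a tag name, its text content, and child elements.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

// Parses the simplified dialect; throws std::runtime_error on malformed input.
std::unique_ptr<Node> parse_simple_xml(const std::string& input) {
    // One token per match: closing tag (group 1), opening tag (group 2),
    // or a run of text (group 3).
    static const std::regex token(R"(</(\w+)>|<(\w+)>|([^<]+))");

    Node holder;                       // synthetic parent of the document element
    std::vector<Node*> open{&holder};  // stack of currently open elements
    std::size_t pos = 0;               // offset where the next token must start

    for (std::sregex_iterator it(input.begin(), input.end(), token), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // The iterator silently skips characters that match no token,
        // so any gap between tokens is reported as an error.
        if (static_cast<std::size_t>(m.position()) != pos)
            throw std::runtime_error("unrecognised input at offset " + std::to_string(pos));
        pos += static_cast<std::size_t>(m.length());

        if (m[2].matched) {            // opening tag: create a child and descend
            std::unique_ptr<Node> child(new Node());
            child->name = m[2].str();
            Node* raw = child.get();
            open.back()->children.push_back(std::move(child));
            open.push_back(raw);
        } else if (m[1].matched) {     // closing tag: must match the innermost open element
            if (open.size() == 1 || open.back()->name != m[1].str())
                throw std::runtime_error("mismatched closing tag </" + m[1].str() + ">");
            open.pop_back();
        } else {                       // text: append to the innermost open element
            open.back()->text += m[3].str();
        }
    }

    if (pos != input.size())
        throw std::runtime_error("unrecognised input at offset " + std::to_string(pos));
    if (open.size() != 1)
        throw std::runtime_error("unclosed element <" + open.back()->name + ">");
    if (holder.children.size() != 1)
        throw std::runtime_error("expected exactly one top-level element");
    return std::move(holder.children.front());
}
```

Calling parse_simple_xml("<a><b>hi</b></a>") yields a node named "a" with one child "b" whose text is "hi", while "<a><b>x</a></b>" throws. Note how much of the code is bookkeeping around the regex rather than the regex itself, and it still lacks comments, CDATA, entities and attributes, which is where a purpose-built tool starts to pay off.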

Oliver Charlesworth answered Jan 02 '23 03:01