If you had to implement a lightweight XML parser, would you choose to use regex?
The XML parsing in my case would be very simple: only tags and text content. No namespaces, no attributes, and no schema support (at least at the beginning, but maybe...).
I think it would be a good exercise for learning the new C++0x <regex> library. However, I wonder whether XML parsing is beyond what regexes can reasonably do.
In a word: no. XML is not a regular language.
UPDATE (To expand, based on the discussion in the comments below)
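XML is not regular: elements can nest to arbitrary depth, and pairing each closing tag with its opening tag takes unbounded memory, which a regular language cannot provide. So you cannot hope to use regexes to perform some sort of one-hit parse/split operation on the entire file/string.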
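To make the nesting problem concrete, here is a minimal sketch of a naive "one regex per element" attempt. The helper name naive_inner and the pattern are my own illustration, not anything from a library; it assumes the simplified tags-and-text XML from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns what a single naive regex thinks is the
// content of the first element in `xml` (empty string if nothing matches).
std::string naive_inner(const std::string& xml) {
    // <name> ... </name>, with \1 referring back to the opening tag name.
    static const std::regex element(R"(<(\w+)>(.*?)</\1>)");
    std::smatch m;
    return std::regex_search(xml, m, element) ? m[2].str() : std::string();
}
```

For the input `<a><a>x</a></a>` this returns `<a>x`: the lazy match stops at the first `</a>`, which belongs to the inner element, not the outer one. Making the quantifier greedy only moves the problem, since it then swallows sibling elements such as `<a>x</a><a>y</a>` as one.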
Whilst you could write a state-machine-based parser that uses regexes only for the lexing/tokenisation, IMHO this would be less efficient and more error-prone than using a tool that's meant for the job. As others have said, Flex/Bison is one option.
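For reference, a rough sketch of what the "regex as lexer, explicit stack as parser" approach might look like with std::regex. The function name well_nested and the token pattern are assumptions of mine, and it handles only plain open tags, close tags, and text (no attributes, comments, CDATA, or entities). It only checks nesting; it does not build a tree.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true if every open tag is closed in the right order.
bool well_nested(const std::string& xml) {
    // Group 1: open tag name, group 2: close tag name, group 3: text run.
    static const std::regex token(R"(<(\w+)>|</(\w+)>|([^<]+))");
    std::vector<std::string> open;  // stack of currently open tag names
    std::size_t expected = 0;       // offset where the next token must start

    const std::sregex_iterator end;
    for (std::sregex_iterator it(xml.begin(), xml.end(), token); it != end; ++it) {
        const std::smatch& m = *it;
        // A gap means the lexer skipped input it could not tokenise.
        if (static_cast<std::size_t>(m.position()) != expected) return false;
        expected += static_cast<std::size_t>(m.length());

        if (m[1].matched) {
            open.push_back(m[1].str());
        } else if (m[2].matched) {
            // The close tag must match the most recently opened tag.
            if (open.empty() || open.back() != m[2].str()) return false;
            open.pop_back();
        }
        // Text runs (group 3) need no nesting check.
    }
    return expected == xml.size() && open.empty();
}
```

Note that the regex only recognises individual tokens; the nesting is tracked by the stack, which is exactly the part a regex cannot do. Even this toy version needs the offset bookkeeping to notice input the pattern silently skipped, which is the kind of error-proneness I mean.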