While it's absolutely true that regexp are not the right tool to fully parse HTML documents, I am seeing a lot of people blindly disregarding any question about regexp if they as much as see a single HTML tag in the proposed text.
Since we see a lot of examples of regexp not being the right tool, I ask your opinion on this: what are the cases where a simple pattern match is a better solution than using a full parsing engine?
Regular expressions are useful in search and replace operations. The typical use case is to look for a sub-string that matches a pattern and replace it with something else. Most APIs using regular expressions allow you to reference capture groups from the search pattern in the replacement string.
A regular expression (also called regex or regexp) is a way to describe a pattern. It is used to locate or validate specific strings or patterns of text in a sentence, document, or any other character input. Regular expressions use both basic and special characters.
String operations will always be faster than regular expression operations. Unless, of course, you write the string operations in an inefficient way. Regular expressions have to be parsed, and code generated to perform the operation using string operations.
If the set of HTML you're looking to parse with a regexp is known to conform to some sort of pattern. e.g. if you know there's no commented-out HTML, or complex scenarios etc.
e.g. I often preach that you shouldn't use regexps for HTML, but if I have a set of HTML that I'm familiar with, is straightforward and that I can check easily post-manipulation, then I have no qualms about using a regexp for that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With