My program can accept data that has newline characters of \n, \r\n or \r (eg Unix, PC or Mac styles)
What is the best way to construct a regular expression that will match whatever the encoding is?
Alternatively, I could use universal_newline support on input, but now I'm interested to see what the regex would be.
By default in most regex engines, . doesn't match newline characters, so the matching stops at the end of each logical line. If you want . to match really everything, including newlines, you need to enable "dot-matches-all" mode in your regex engine of choice (for example, add re. DOTALL flag in Python, or /s in PCRE.
So use \s\S, which will match ALL characters.
According to regex101.com \s : Matches any space, tab or newline character.
Multiline option, or the m inline option, enables the regular expression engine to handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string.
The regex I use when I want to be precise is "\r\n?|\n"
.
When I'm not concerned about consistency or empty lines, I use "[\r\n]+"
, I imagine it makes my programs somewhere in the order of 0.2% faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With