I'm trying to test a string against a basic HTML pattern, and although I use the m (multiline) modifier, it only works when the string is a one-liner:
(re-find #"(?im)^<html>.*<body>.*</body>.*</html>" c)
Fails:
"<html> <body> sad </body>
</html>"
Works:
"<html> <body> sad </body> </html>"
What am I doing wrong?
Disclaimer: I'm not a Clojure programmer, but I think this problem is independent of the language.
When multi-line mode is enabled, the interpretation of the caret ^ and the dollar $ changes like this: instead of matching only the beginning and end of the entire input string, they also match the beginning and the end of each line in the input string. This is, as far as I can see, not what you need.
What you want is for each .* to match newlines (which it doesn't do by default), and this can be done by enabling single-line mode (aka dot-all mode). So this means:
(re-find #"(?is)^<html>.*<body>.*</body>.*</html>" c)
You can also verify this on RegExr.
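To check this locally as well, here is a minimal sketch that runs both patterns against the two strings from the question. The names one-line, two-lines, multiline-re and dotall-re are made up for illustration and are not part of the original code.

```clojure
;; The two inputs from the question (illustrative names).
(def one-line "<html> <body> sad </body> </html>")
(def two-lines "<html> <body> sad </body>\n</html>")

;; Original pattern: (?m) only changes how ^ and $ behave.
(def multiline-re #"(?im)^<html>.*<body>.*</body>.*</html>")

;; Suggested pattern: (?s) lets . match line terminators too.
(def dotall-re #"(?is)^<html>.*<body>.*</body>.*</html>")

(println (some? (re-find multiline-re one-line)))   ; true
(println (some? (re-find multiline-re two-lines)))  ; false
(println (some? (re-find dotall-re one-line)))      ; true
(println (some? (re-find dotall-re two-lines)))     ; true
```

Only the combination of (?m) with the two-line input fails, because .* cannot cross the newline that sits before </html>.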
You need to use the (?s) "dotall mode" switch.
Example:
user=> (re-find #"\d{3}.\d{3}" "123\n456")
nil
user=> (re-find #"(?s)\d{3}.\d{3}" "123\n456")
"123\n456"
The (?m) switch is deceptively named: it changes what the ^ and $ anchors do, allowing them to also match start-of-line and end-of-line, respectively, which is not what you want.
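For contrast, a small sketch of what (?m) actually changes, using the throwaway input "foo\nbar" (chosen here only for illustration):

```clojure
;; Without (?m), ^ matches only at the very start of the input.
(re-seq #"^\w+" "foo\nbar")          ; => ("foo")

;; With (?m), ^ also matches right after each line terminator.
(re-seq #"(?m)^\w+" "foo\nbar")      ; => ("foo" "bar")

;; Either way, . still stops at the newline unless (?s) is set.
(re-find #"(?m)foo.bar" "foo\nbar")  ; => nil
(re-find #"(?s)foo.bar" "foo\nbar")  ; => "foo\nbar"
```

The two switches are independent, so they can be combined as (?ms) when both behaviours are needed.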