Difference between regular expression modifiers (or flags) 'm' and 's'?

I often forget the difference between the regular expression modifiers m and s. What is a good way to remember them?

As I understand them, they are:

'm' is for multiline, so that ^ and $ match at the beginning and end of each line (as divided by \n), not just at the beginning and end of the whole string.

's' makes the dot match the newline character as well.

Often, I just use

/some_pattern/ism

But it is probably better to use only the modifiers I actually need (usually just "s", in my case).

What is a good way to remember them, so that I stop mixing up which is which every time?

asked May 28 '09 01:05

nonopolarity

2 Answers

It's not uncommon to find someone who's been using regexes for years who still doesn't understand how those two modifiers work. As you observed, the names "multiline" and "singleline" are not very helpful. They sound like they must be mutually exclusive, but they're completely independent. I suggest you ignore the names and concentrate on what they do: m changes the behavior of the anchors (^ and $), and s changes the behavior of the dot (.).
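To see that the two modifiers really are independent, here is a minimal sketch. It uses Python's re module purely as a stand-in for Perl (the question's /ism syntax is Perl's); re.MULTILINE plays the role of /m and re.DOTALL the role of /s. The sample string and the probe patterns are made up for illustration: one probe depends only on the anchors, the other only on the dot.

```python
import re

# A two-line sample string: "foo" on line 1, "bar" on line 2.
text = "foo\nbar"

def check(flags):
    """Report whether each probe pattern matches under the given flags."""
    return {
        # Depends only on re.MULTILINE: ^ must match right after the newline.
        "anchor": re.search(r"^bar", text, flags) is not None,
        # Depends only on re.DOTALL: the dot must match the newline itself.
        "dot": re.search(r"foo.bar", text, flags) is not None,
    }

results = {
    "none": check(0),
    "m": check(re.MULTILINE),
    "s": check(re.DOTALL),
    "ms": check(re.MULTILINE | re.DOTALL),
}

for name, outcome in results.items():
    print(name, outcome)
```

Each flag flips exactly one of the two probes and leaves the other alone, which is the "anchors vs. dot" split described above.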

One prominent person who mixed up the modes is the author of Ruby. He created his own regex implementation based on Perl's, except that he decided to make ^ and $ always act as line anchors; that is, multiline mode is always on. Unfortunately, he also incorrectly named the dot-matches-everything mode "multiline". So Ruby has no s modifier, but its m modifier does what s does in other flavors.

As for always using /ism, I recommend against it. It's mostly harmless, as you've discovered, but it sends a confusing message to anyone else who's trying to figure out what the regex was supposed to do (or even to yourself, in the future).
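A small sketch of why /ism is "mostly harmless" yet misleading, again using Python's re module as a stand-in for Perl (re.I, re.M and re.S for /i, /m and /s). The text and patterns are invented for illustration. When a pattern contains no ^, $ or ., the extra flags change nothing; add a single ^ and re.M silently changes the result, and a reader cannot tell whether that was intended.

```python
import re

text = "Alpha\nbeta\nGamma"

# No ^, $ or . in this pattern, so re.M and re.S cannot change the result;
# only re.I (case-insensitivity) has any effect here.
plain = re.findall(r"[a-z]+a", text, re.I)
everything = re.findall(r"[a-z]+a", text, re.I | re.M | re.S)

# Add a single ^ and re.M suddenly matters: without it, ^ only matches
# at the start of the whole string; with it, at the start of every line.
anchored_plain = re.findall(r"^[a-z]+a", text, re.I)
anchored_everything = re.findall(r"^[a-z]+a", text, re.I | re.M | re.S)

print(plain == everything)
print(anchored_plain, anchored_everything)
```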

answered Sep 22 '22 04:09

Alan Moore

I like the explanation in 'man perlre':

m Treat string as multiple lines.
s Treat string as single line.

With multiple lines, ^ and $ apply to individual lines (i.e. just before and after newlines).
With a single line, ^ and $ apply to the whole, and \n just becomes another character you can match.
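A minimal sketch of the perlre description, with Python's re module standing in for Perl (re.MULTILINE for /m, re.DOTALL for /s); the sample string is made up. Note that ^ and $ applying to the whole string is already the default with no flags at all; what /s adds is that \n becomes a character the dot can match.

```python
import re

text = "one\ntwo\nthree"

# "Multiple lines" (re.MULTILINE, Perl's /m): ^ and $ apply to each line.
per_line = re.findall(r"^\w+$", text, re.MULTILINE)

# No flags: ^ and $ apply to the whole string, and \w+ cannot cross "\n",
# so there is no match at all.
whole_no_flags = re.findall(r"^\w+$", text)

# "Single line" (re.DOTALL, Perl's /s): "\n" is just another character
# that the dot can consume, so one match covers the whole string.
whole_dotall = re.findall(r"^.+$", text, re.DOTALL)

print(per_line)
print(whole_no_flags)
print(whole_dotall)
```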

[Wrong]By using both m and s as you described, I would expect the second one to take precedence, so you would always be in multiline mode with /ism.[/Wrong]

Edit: I didn't read far enough. perlre goes on to say:
The "/s" and "/m" modifiers both override the $* setting. That is, no matter what $* contains, "/s" without "/m" will force "^" to match only at the beginning of the string and "$" to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.
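The quoted behavior can be checked with a short sketch, again with Python's re module standing in for Perl (re.DOTALL for /s, re.MULTILINE for /m). Python has no counterpart to Perl's $* variable, so only the flag combinations are shown; the sample string and pattern are invented for illustration.

```python
import re

text = "a1\nb1\na2\nb2"
pattern = r"^a.*?b\d$"

# /s without /m: the dot crosses newlines, but ^ and $ only match at the
# ends of the whole string, so the single match spans everything.
s_only = re.findall(pattern, text, re.DOTALL)

# /m without /s: ^ and $ match at line boundaries, but the dot cannot
# cross "\n", so an "a" and a "b" on different lines never connect.
m_only = re.findall(pattern, text, re.MULTILINE)

# /ms together: the dot crosses newlines AND ^/$ match at line
# boundaries, so each "a...b" pair is found separately.
both = re.findall(pattern, text, re.MULTILINE | re.DOTALL)

# Without /m, $ still matches just before a newline at the very end.
trailing = re.search(r"end$", "the end\n") is not None

print(s_only)
print(m_only)
print(both)
print(trailing)
```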

answered Sep 24 '22 04:09

JimG
