 

Can't find the correct regex syntax to match newline or end of string

This feels like a really simple question, but I can't find the answer anywhere.

(Note: I'm using Python, but this shouldn't matter.)

Say I have the following string:

s = "foo\nbar\nfood\nfoo"

I am simply trying to find a regex that will match both instances of "foo", but not "food", based on the fact that the "foo" in "food" is not immediately followed by either a newline or the end of the string.
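If the trailing newline does not need to be part of the result, one option (not from the original post) is a lookahead. It checks what follows "foo" without consuming it. Here is a minimal sketch that uses only the standard `re` module:

```python
import re

s = "foo\nbar\nfood\nfoo"

# (?=\n|\Z) is a lookahead: "foo" must be followed by a newline or the very
# end of the string, but that newline is not consumed or included in the match.
matches = re.findall(r"foo(?=\n|\Z)", s)
print(matches)  # ['foo', 'foo']
```

This returns the two standalone "foo"s without the newline. It only fits if dropping the newline from the result is acceptable.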

This is perhaps an overly complicated way to express my question, but it gives something concrete to work with.

Here are some of the things I have tried, with results (note: the result I want is ['foo\n', 'foo']):

foo[\n\Z] => ['foo\n']

foo(\n|\Z) => ['\n', ''] <= This seems to match the newline and EOS, but not the foo

foo($|\n) => ['\n', '']

(foo)($|\n) => [('foo', '\n'), ('foo', '')] <= Almost there, and this is a usable plan B, but I would like to find the perfect solution.
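The grouped results above come from documented `re.findall` behavior. If the pattern contains capturing groups, `findall` returns the group contents instead of the whole match, with one tuple per match when there are several groups. This sketch, using only the standard `re` module, iterates with `re.finditer` and reads `group(0)` to show the full text that the same pattern actually matched:

```python
import re

s = "foo\nbar\nfood\nfoo"
pattern = r"foo(\n|\Z)"

# findall returns only the captured group for each match
groups = re.findall(pattern, s)
print(groups)  # ['\n', '']

# finditer yields match objects; group(0) is the entire matched text
whole = [m.group(0) for m in re.finditer(pattern, s)]
print(whole)   # ['foo\n', 'foo']
```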

The only thing I found that does work is:

foo$|foo\n => ['foo\n', 'foo']

This is fine for such a simple example, but it is easy to see how it could become unwieldy with a much larger expression (and yes, this foo thing is a stand-in for the larger expression I am actually using).
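One way to avoid repeating the larger expression is a non-capturing group `(?:...)`. It groups the alternation without creating a capture, so `findall` goes back to returning the whole match. In this sketch, `expr` is a hypothetical stand-in for the real, larger expression:

```python
import re

s = "foo\nbar\nfood\nfoo"

# Hypothetical stand-in for the larger expression; it is written only once.
expr = r"foo"

# The outer (?:...) keeps any alternation inside expr from absorbing the
# suffix. The suffix (?:\n|\Z) is non-capturing, so findall returns whole matches.
pattern = re.compile("(?:" + expr + r")(?:\n|\Z)")
print(pattern.findall(s))  # ['foo\n', 'foo']
```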


Interesting aside: the closest SO question I could find to my problem was this one: "In regex, match either the end of the string or a specific character".

Here, I could simply substitute \n for my 'specific character'. The accepted answer uses the regex /(&|\?)list=.*?(&|$)/. I notice that the OP was using JavaScript (the question was tagged javascript), so maybe the JavaScript regex engine behaves differently, but when I use the exact strings given in that question with the above regex in Python, I get bad results:

>>> from re import findall
>>> findall(r"(&|\?)list=.*?(&|$)", "index.php?test=1&list=UL")
[('&', '')]
>>> findall(r"(&|\?)list=.*?(&|$)", "index.php?list=UL&more=1")
[('?', '&')]
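These results are probably not a difference in regex engines. Python's `findall` reports only the two captured groups rather than the full match. This short sketch uses only the standard `re` module and prints both views for the two URLs from that question:

```python
import re

pattern = r"(&|\?)list=.*?(&|$)"
urls = ["index.php?test=1&list=UL", "index.php?list=UL&more=1"]

for url in urls:
    # findall: a tuple of the captured groups for each match
    print(re.findall(pattern, url))
    # group(0): the full text each match covered
    print([m.group(0) for m in re.finditer(pattern, url)])
```

For the first URL the full match is '&list=UL', and for the second it is '?list=UL&'. That is what the accepted answer intended.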

So, I'm stumped.

asked Dec 31 '12 16:12 by Ken Bellows



1 Answer

You could use re.MULTILINE and include an optional line break after the $ in your pattern:

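import re

s = "foo\nbar\nfood\nfoo"
# With re.MULTILINE, $ matches just before every newline as well as at the end of the string
pattern = re.compile(r'foo$\n?', re.MULTILINE)
print(pattern.findall(s))
# -> ['foo\n', 'foo']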
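The answer depends on how `$` behaves in Python. Without `re.MULTILINE`, `$` matches at the end of the string and also just before a newline that ends the string, while `\Z` matches only at the very end. With `re.MULTILINE`, `$` also matches before every newline. This small sketch checks those cases with the standard `re` module:

```python
import re

# Without MULTILINE: $ also matches before a newline that ends the string
dollar_trailing = re.findall(r"foo$", "foo\n")   # ['foo']
# \Z matches only at the absolute end of the string
z_trailing = re.findall(r"foo\Z", "foo\n")       # []

# A newline in the middle of the string only satisfies $ in MULTILINE mode
dollar_plain = re.findall(r"foo$", "foo\nbar")                # []
dollar_multi = re.findall(r"foo$", "foo\nbar", re.MULTILINE)  # ['foo']

print(dollar_trailing, z_trailing, dollar_plain, dollar_multi)
```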
answered Oct 20 '22 18:10 by omz