This feels like a really simple question, but I can't find the answer anywhere.
(Note: I'm using Python, but this shouldn't matter.)
Say I have the following string:
s = "foo\nbar\nfood\nfoo"
I am simply trying to find a regex that will match both instances of "foo", but not "food", based on the fact that the "foo" in "food" is not immediately followed by either a newline or the end of the string.
This is perhaps an overly complicated way to express my question, but it gives something concrete to work with.
Here are some of the things I have tried, with results (Note: the result I want is ['foo\n', 'foo']):
foo[\n\Z]
=> ['foo\n']
foo(\n|\Z)
=> ['\n', ''] <= This seems to match the newline and EOS, but not the foo
foo($|\n)
=> ['\n', '']
(foo)($|\n)
=> [('foo', '\n'), ('foo', '')] <= Almost there, and this is a usable plan B, but I would like to find the perfect solution.
The only thing I found that does work is:
foo$|foo\n
=> ['foo\n', 'foo']
This is fine for such a simple example, but it is easy to see how it could become unwieldy with a much larger expression (and yes, this foo thing is a stand-in for the larger expression I am actually using).
Interesting aside: The closest SO question I could find to my problem was this one: In regex, match either the end of the string or a specific character
Here, I could simply substitute \n for my 'specific character'. Now, the accepted answer uses the regex /(&|\?)list=.*?(&|$)/. I notice that the OP was using JavaScript (the question was tagged with the javascript tag), so maybe the JavaScript regex interpreter is different, but when I use the exact strings given in the question with the above regex in Python, I get bad results:
>>> findall("(&|\?)list=.*?(&|$)", "index.php?test=1&list=UL")
[('&', '')]
>>> findall("(&|\?)list=.*?(&|$)", "index.php?list=UL&more=1")
[('?', '&')]
So, I'm stumped.
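For what it's worth, the tuples in the aside look consistent with how Python's re.findall is documented to behave: when the pattern contains capturing groups, it returns the groups instead of the whole match. A minimal sketch, assuming the same strings as above; the helper name full_matches is made up for illustration, and it uses re.finditer with group(0) to recover the full matched text:

```python
import re


def full_matches(pattern, text):
    """Return the full text of every match, ignoring capturing groups.

    Hypothetical helper for illustration; not part of the re module.
    """
    return [m.group(0) for m in re.finditer(pattern, text)]


url_pattern = r"(&|\?)list=.*?(&|$)"

# findall reports only the two capturing groups of each match.
print(re.findall(url_pattern, "index.php?test=1&list=UL"))
# -> [('&', '')]

# group(0) of each match object holds the whole matched text.
print(full_matches(url_pattern, "index.php?test=1&list=UL"))
# -> ['&list=UL']
print(full_matches(url_pattern, "index.php?list=UL&more=1"))
# -> ['?list=UL&']
```

So the regex itself probably does match in Python; it is only the return value of findall that hides the full match.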
"\n" matches a newline character.
You could use re.MULTILINE and include an optional linebreak after the $ in your pattern:
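import re

s = "foo\nbar\nfood\nfoo"
# With re.MULTILINE, $ matches at the end of every line; \n? then consumes the line break if there is one.
pattern = re.compile(r'foo$\n?', re.MULTILINE)
print(pattern.findall(s))
# -> ['foo\n', 'foo']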
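If re.MULTILINE is not wanted, another option might be to keep the alternation from the question but make the group non-capturing, so that findall returns the whole match rather than the group. A sketch under the assumption that the input is the same s as above; (?:...) and \Z are standard re syntax, and the variable names are only illustrative:

```python
import re

s = "foo\nbar\nfood\nfoo"

# (?:...) groups the alternation without capturing it, so findall
# returns the whole match. \Z matches only at the very end of the string.
pattern = re.compile(r"foo(?:\n|\Z)")
matches = pattern.findall(s)
print(matches)
# -> ['foo\n', 'foo']
```

The same wrapping should carry over to a larger expression in place of foo, since the trailing (?:\n|\Z) only needs to be written once.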