Can't find the correct regex syntax to match newline or end of string

Tags: python, regex, newline

This feels like a really simple question, but I can't find the answer anywhere.

(Note: I'm using Python, but this shouldn't matter.)

Say I have the following string:

s = "foo\nbar\nfood\nfoo"

I am simply trying to find a regex that will match both instances of "foo", but not "food", based on the fact that the "foo" in "food" is not immediately followed by either a newline or the end of the string.

This is perhaps an overly complicated way to express my question, but it gives something concrete to work with.

Here are some of the things I have tried, with results (Note: the result I want is ['foo\n', 'foo']):

foo[\n\Z] => ['foo\n']

foo(\n|\Z) => ['\n', ''] <= This seems to match the newline and EOS, but not the foo

foo($|\n) => ['\n', '']

(foo)($|\n) => [('foo', '\n'), ('foo', '')] <= Almost there, and this is a usable plan B, but I would like to find the perfect solution.

The only thing I found that does work is:

foo$|foo\n => ['foo\n', 'foo']

This is fine for such a simple example, but it is easy to see how it could become unwieldy with a much larger expression (and yes, this foo thing is a stand in for the larger expression I am actually using).
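A note on the attempts above: when a pattern contains capturing groups, `re.findall` returns the contents of those groups rather than the whole match, which is why `foo($|\n)` yields only `'\n'` and `''`. A minimal sketch, assuming a non-capturing group `(?:...)` is acceptable inside the larger expression:

```python
import re

s = "foo\nbar\nfood\nfoo"

# One capturing group: findall returns only what the group matched.
captured = re.findall(r"foo(\n|\Z)", s)

# Non-capturing group: findall returns the whole match instead.
whole = re.findall(r"foo(?:\n|\Z)", s)

print(captured)  # ['\n', '']
print(whole)     # ['foo\n', 'foo']
```

The alternation is written once, so the `foo` part does not have to be repeated as it is in `foo$|foo\n`.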

Interesting aside: The closest SO question I could find to my problem was this one: In regex, match either the end of the string or a specific character

Here, I could simply substitute \n for my 'specific character'. Now, the accepted answer uses the regex /(&|\?)list=.*?(&|$)/. I notice that the OP was using JavaScript (question was tagged with the javascript tag), so maybe the JavaScript regex interpreter is different, but when I use the exact strings given in the question with the above regex in Python, I get bad results:

>>> findall("(&|\?)list=.*?(&|$)", "index.php?test=1&list=UL")
[('&', '')]
>>> findall("(&|\?)list=.*?(&|$)", "index.php?list=UL&more=1")
[('?', '&')]

So, I'm stumped.
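The results above are probably not an engine difference: the pattern does match in Python, and `findall` is reporting the two capturing groups rather than the matched text. A small sketch using `re.finditer` and `group(0)` to see the full matches (the helper name `full_matches` is made up for illustration):

```python
import re

# Same pattern as in the linked answer, written as a raw string.
pattern = re.compile(r"(&|\?)list=.*?(&|$)")

def full_matches(text):
    """Return the complete text of every match, ignoring capture groups."""
    return [m.group(0) for m in pattern.finditer(text)]

print(full_matches("index.php?test=1&list=UL"))  # ['&list=UL']
print(full_matches("index.php?list=UL&more=1"))  # ['?list=UL&']
```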


asked Dec 31 '12 16:12

Ken Bellows

1 Answer

You could use re.MULTILINE and include an optional linebreak after the $ in your pattern:

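import re

s = "foo\nbar\nfood\nfoo"
# With re.MULTILINE, $ matches at the end of every line; \n? then consumes the line break if there is one.
pattern = re.compile(r'foo$\n?', re.MULTILINE)
print(re.findall(pattern, s))
# -> ['foo\n', 'foo']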
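For context on why the flag matters: without `re.MULTILINE`, `$` matches only at the very end of the string (or just before a trailing newline), so the first `foo` is missed. A quick comparison, as a sketch on the same sample string:

```python
import re

s = "foo\nbar\nfood\nfoo"

# Default mode: $ can only match at the end of this string.
default_mode = re.findall(r"foo$\n?", s)

# MULTILINE: $ also matches immediately before each newline.
multiline_mode = re.findall(r"foo$\n?", s, re.MULTILINE)

print(default_mode)    # ['foo']
print(multiline_mode)  # ['foo\n', 'foo']
```

Because the pattern has no capturing groups, `findall` returns the whole match in both cases.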


answered Oct 20 '22 18:10

omz
