
How to match line beginning and end in a multi-line string

I would like to match an entire line in a multi-line string (this code is part of a unit test that checks for the correct output format).

Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.match(r".*score = 0\.59.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
<_sre.SRE_Match object; span=(0, 39), match='score = 0.65\\nscore = 0.59\\nscore = 1.0'>

This works fine: I can match anything within a multi-line string. However, I would like to make sure that I match an entire line. The documentation says that ^ and $ should match the beginning and end of each line when re.MULTILINE is used. However, this somehow does not work for me:

>>> re.match(r".*^score = 0\.59$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>> 

Here are a few more experiments I made:

>>> import os
>>> re.match(r".*^score = 0\.59$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
>>>
>>> re.match(r".*^score = 0\.65$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>
>>> re.match(r".*^score = 0\.65$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>> 

I guess I'm missing something rather simple, but I couldn't figure it out.

k6ps asked Sep 12 '25 16:09


1 Answer

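The problem is that you're using a raw string for the input text, so \n is seen as ... well, a backslash followed by n, not a newline. The regex engine understands the escape \n in a pattern, but it does not interpret escapes in the string being searched. Drop the r prefix on the input string so that \n becomes a real newline.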
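
A quick way to see the difference between the two literals, as a minimal sketch (the names raw_text and real_text are only for illustration):

```python
# A raw string keeps the backslash: r"\n" is two characters, "\n" is one.
raw_text = r"score = 0.65\nscore = 0.59"   # backslash + 'n', no real newline
real_text = "score = 0.65\nscore = 0.59"   # contains an actual newline character

print(len(r"\n"), len("\n"))          # 2 1
print("\n" in raw_text)               # False
print(len(raw_text.splitlines()))     # 1 (a single line, so ^ only matches at index 0)
print(len(real_text.splitlines()))    # 2
```

Since the raw version is a single line, ^ and $ have no interior line boundaries to match at, whatever flags are used.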

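Also, even if it doesn't matter here, always pass flags with the flags= keyword: some regex functions take an extra positional parameter before flags (count in re.sub, maxsplit in re.split), and a flag passed positionally can silently end up there instead.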
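
To illustrate why the keyword matters, here is a small sketch using re.sub (the sample text and variable names are assumptions for the example). Flags are plain integers, so a flag that lands in the count slot is read as a number of replacements and no flag is actually applied:

```python
import re

text = "score = 0.65\nscore = 0.59\nscore = 1.0"

# re.MULTILINE is just an integer, so in the count slot it would mean "count=8".
print(int(re.MULTILINE))  # 8

# Without the flag, ^ only matches at the very start of the string.
no_flag = re.sub(r"^score", "SCORE", text)

# With explicit keywords, count and flags cannot be confused.
first_only = re.sub(r"^score", "SCORE", text, count=1, flags=re.MULTILINE)
every_line = re.sub(r"^score", "SCORE", text, flags=re.MULTILINE)

print(no_flag == first_only)        # True
print(every_line.count("SCORE"))    # 3
```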

Like this:

>>> re.match(r".*^score = 0\.65$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>

And, as I noted in the comments, .* needs re.DOTALL to match across newlines:

>>> re.match(r".*^score = \d+\.\d+$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE|re.DOTALL)
<_sre.SRE_Match object; span=(0, 37), match='score = 0.65\nscore = 0.59\nscore = 1.0'>
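
Applied to the exact pattern from the question (the middle line, 0.59), the effect of re.DOTALL can be checked with a short sketch like this one; the variable names are only illustrative:

```python
import re

text = "score = 0.65\nscore = 0.59\nscore = 1.0"
pattern = r".*^score = 0\.59$.*"

# Without DOTALL, the leading .* cannot cross the first newline,
# and re.match only tries at position 0, so the middle line is unreachable.
without_dotall = re.match(pattern, text, flags=re.MULTILINE)

# With DOTALL, .* can consume "score = 0.65\n" and ^ then matches at the second line.
with_dotall = re.match(pattern, text, flags=re.MULTILINE | re.DOTALL)

print(without_dotall)        # None
print(with_dotall.span())    # (0, 37)
```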

(As noted in "Python regex, matching pattern over multiple lines.. why isn't this working?" and "How do I match any character across multiple lines in a regular expression?", of which this could be a duplicate if it weren't for the raw string bit.)

(Sorry, my floating-point regex is probably a bit weak; you can find better ones around.)
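
For the unit-test use case in the question, one possible alternative (not what the calls above use) is re.search, which scans the whole string, so ^ and $ with re.MULTILINE are enough and neither the surrounding .* nor re.DOTALL is needed. A minimal sketch, assuming "\n" line endings; the helper name has_line is hypothetical:

```python
import re

def has_line(expected_line, text):
    """Return True if text contains a line exactly equal to expected_line."""
    # re.escape makes the '.' in the expected line literal.
    pattern = r"^" + re.escape(expected_line) + r"$"
    return re.search(pattern, text, flags=re.MULTILINE) is not None

output = "score = 0.65\nscore = 0.59\nscore = 1.0"

print(has_line("score = 0.59", output))   # True
print(has_line("score = 0.5", output))    # False: only part of a line
print(has_line("score = 0.59", r"score = 0.65\nscore = 0.59"))  # False: raw string, no real newline
```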

Jean-François Fabre answered Sep 14 '25 07:09