
Python regex: match across multiple lines, but still get the line number

I have lots of log files and want to search them for patterns that span multiple lines. To locate each matched string easily, I still want to see the line number where the match occurs.

Any good suggestions? (The code sample is copied.)

string="""
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2
"""

import re
pattern = r'.*?####(.*?)####'
matches = re.compile(pattern, re.MULTILINE | re.DOTALL).findall(string)
for item in matches:
    print("lineno: ?", "matched: ", item)

[UPDATE] The lineno is the actual line number.

So the output I want looks like:

    lineno: 1, 1
    ttteest
    lineno: 6, 2
    ttttteeeestt

Larry Cai asked May 21 '13 15:05


2 Answers

What you want is a typical task that regex is not very good at: parsing.

You could read the log file line by line and search each line for the strings that delimit your matches. You could apply a regex line by line, but that is less efficient than plain string matching unless you are looking for complicated patterns.

And if you are looking for complicated matches, I'd like to see them. Searching every line in a file for #### while keeping a line count is easier without regex.
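
A minimal sketch of that line-by-line approach, not taken from the original answer. The function name `find_blocks` and the `marker` parameter are made up for illustration, and the sketch assumes that every line starting with `####` alternately opens and closes a block (it does not check that the labels match).

```python
def find_blocks(text, marker="####"):
    """Yield (lineno, label, body) for each block delimited by marker lines.

    lineno is the 1-based line number of the opening marker line.
    Assumption: marker lines strictly alternate between opening and closing.
    """
    open_line = None  # line number of the currently open marker, if any
    label = None
    body = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(marker):
            if open_line is None:
                # Opening marker: remember where the block starts.
                open_line, label, body = lineno, line[len(marker):], []
            else:
                # Closing marker: emit the finished block.
                yield open_line, label, "\n".join(body)
                open_line = None
        elif open_line is not None:
            body.append(line)


sample = """####1
ttteest
####1
ttttteeeestt

####2

ttest
####2
"""

for lineno, label, body in find_blocks(sample):
    print("lineno: %d, %s" % (lineno, label))
    print(body)
```

For a real log file, the same loop could iterate over the open file object instead of `text.splitlines()`, so the whole file never has to be held in memory; in that case each line keeps its trailing newline and may need to be stripped.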

melwil answered Sep 19 '22 05:09


You can record the end offset of each line beforehand, then look up the line number of each match afterwards.

import re

string="""
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2
"""

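# Record the end offset of every line; the list index is the 0-based line index.
line_ends = [m.end() for m in re.finditer('.*\n', string)]

# DOTALL lets '.' match newlines, so one match can span several lines.
pattern = re.compile('.*?####(.*?)####', re.DOTALL)
for m in pattern.finditer(string):
    # The first line whose end offset lies past the start of group 1 contains it.
    # The index is 0-based; because string starts with an empty line, it equals
    # the 1-based line number counted from the first non-empty line.
    lineno = next(i for i, end in enumerate(line_ends) if end > m.start(1))
    print('lineno: %d, %s' % (lineno, m.group(1)))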
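
A possible variation on the same idea, offered as a sketch and not as part of the original answer: instead of building a list of line ends, count the newlines that precede the start of the captured group. The helper name `matches_with_lineno` is hypothetical; `str.count` with start and end arguments and `Match.start` are standard library calls.

```python
import re


def matches_with_lineno(text, pattern):
    """Yield (lineno, captured) pairs for every match of pattern in text.

    lineno is 1-based and refers to the line on which group 1 starts.
    """
    regex = re.compile(pattern, re.DOTALL)
    for m in regex.finditer(text):
        # Newlines before the group start = complete lines before it.
        lineno = text.count("\n", 0, m.start(1)) + 1
        yield lineno, m.group(1)


sample = "####1\nttteest\n####1\nttttteeeestt\n\n####2\n\nttest\n####2\n"
for lineno, captured in matches_with_lineno(sample, r".*?####(.*?)####"):
    print("lineno: %d, %r" % (lineno, captured))
```

Counting newlines rescans the text from the beginning for every match, so for large logs with many matches a precomputed list of line offsets (searched with `bisect`, for example) is likely the better trade-off.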

himanshu shekhar answered Sep 18 '22 05:09