
Python: find regexp in a file

Tags: python, regex

Have:

f = open(...)  
r = re.compile(...)

Need:
Find the position (start and end) of the first regexp match in a big file
(starting from current_pos=...).

How can I do this?


I want to have this function:

def find_first_regex_in_file(f, regexp, start_pos=0):  
   f.seek(start_pos)  

   .... (searching f for regexp starting from start_pos) HOW?  

   return [match_start, match_end]  

File 'f' is expected to be big.

Sergey asked Feb 14 '11 05:02




1 Answer

One way to search through big files is to use the mmap module to map the file into memory. Then you can search through it as if it were one big string, without having to explicitly read it.

For example, something like:

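import mmap
import os
import re

size = os.stat(fn).st_size
f = open(fn, "rb")
data = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)

# In Python 3 the pattern must be bytes, because the mmap exposes bytes
m = re.search(rb"867-?5309", data)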

This works well for very big files (I've done it for a file 30+ GB in size, but you'll need a 64-bit OS if your file is more than a GB or two).
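
To tie this back to the function asked for in the question, here is a minimal sketch of how find_first_regex_in_file could be built on mmap. It assumes Python 3, a non-empty file opened for reading, and a regexp compiled from a bytes pattern. Returning None when nothing matches is my own choice, since the question does not say what should happen in that case.

```python
import mmap
import re


def find_first_regex_in_file(f, regexp, start_pos=0):
    """Return [match_start, match_end] for the first match at or after
    start_pos, or None if there is no match. Offsets are byte offsets."""
    # Length 0 maps the whole file. mmap raises ValueError for an empty file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
        # A compiled pattern's search() takes a start position, so there is
        # no need to seek or to slice the mapping.
        m = regexp.search(data, start_pos)
        if m is None:
            return None
        return [m.start(), m.end()]
```

It would be called as find_first_regex_in_file(open(fn, "rb"), re.compile(rb"867-?5309"), start_pos=0). The returned positions are offsets into the file in bytes, not characters, which matters if the file contains multi-byte encoded text.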

Greg Hewgill answered Sep 20 '22 12:09