Have:
f = open(...)
r = re.compile(...)
Need:
Find the position (start and end) of the first regexp match in a big file (starting from current_pos=...).
How can I do this?
I want to have this function:
def find_first_regex_in_file(f, regexp, start_pos=0):
    f.seek(start_pos)
    .... (searching f for regexp starting from start_pos) HOW?
    return [match_start, match_end]
File 'f' is expected to be big.
Write the text with the write() function and close the file. Define the pattern you want to find inside the file. Now open the file in read mode. Use a for loop over the lines, and inside it call re.search() to look for the pattern; if it finds a match, print the output.
re.search is used to search, in this case for the string dream in the string line. If the text is found, re.search returns a match object; otherwise it returns None. Note that we put an r in front of the search string. This tells Python that it is a raw string, in which backslashes are not treated as escape characters (more about this later).
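The line-by-line approach described above can be sketched roughly as follows. The helper name find_in_lines and the sample lines are made up for illustration. It takes any iterable of strings, so an open text file should work too. Note that this only finds matches that sit within a single line, and the offsets it reports are relative to that line, not to the file.

```python
import re

def find_in_lines(lines, pattern):
    """Return (line_number, start, end) of the first match, or None.

    Hypothetical helper: start and end are offsets within the matching line.
    """
    regex = re.compile(pattern)
    for line_number, line in enumerate(lines, start=1):
        match = regex.search(line)  # a match object, or None if nothing matched
        if match is not None:
            return line_number, match.start(), match.end()
    return None

# Illustrative input; a file object opened in text mode could be passed instead.
sample = ["no match here", "I have a dream today"]
result = find_in_lines(sample, r"dream")
```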
One way to search through big files is to use the mmap module to map the file into memory. Then you can search through it without having to explicitly read it.
For example, something like:
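import mmap
import os
import re

size = os.stat(fn).st_size
f = open(fn, "rb")  # binary mode: the mapping exposes raw bytes
data = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
m = re.search(rb"867-?5309", data)  # Python 3 needs a bytes pattern to search an mmap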
This works well for very big files (I've done it for a file 30+ GB in size, but you'll need a 64-bit OS and a 64-bit Python build if your file is more than a GB or two).
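Putting this together with the function asked for in the question, a minimal sketch might look like the following. It assumes f was opened in binary mode ("rb"), that the file is not empty (mmap refuses to map an empty file), and that regexp was compiled from a bytes pattern. Returning None when nothing matches is my own choice, since the question doesn't say what should happen in that case.

```python
import mmap
import re

def find_first_regex_in_file(f, regexp, start_pos=0):
    """Return [match_start, match_end] as absolute file offsets, or None.

    Assumes f is open in binary mode and regexp is a compiled bytes pattern.
    """
    # Length 0 maps the whole file. The mapping ignores the file's current
    # position, so start_pos is passed to search() instead of calling f.seek().
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
        match = regexp.search(data, start_pos)
        if match is None:
            return None
        # Offsets are relative to the start of the file, not to start_pos.
        return [match.start(), match.end()]
```

To continue after a hit, call the function again with start_pos set to the previous match_end. If the pattern can match an empty string, advance by at least one byte to avoid looping on the same position.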