
How do I re.search or re.match on a whole file without reading it all into memory?

I want to run a regular expression against an entire file, but I'd rather not read the whole file into memory at once, since I may be working with rather large files in the future. Is there a way to do this? Thanks!

Clarification: I cannot read the file line by line, because a match can span multiple lines.

Evan Fosmark asked Jan 18 '09 01:01





1 Answer

You can use mmap to map the file into memory. The file contents can then be accessed like a normal string (in Python 3, like a bytes object, so the pattern must be a bytes pattern):

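    import mmap
    import re

    # Open in binary mode; a read-only mapping needs no write permission.
    with open('/var/log/error.log', 'rb') as f:
        # Length 0 maps the whole file.
        data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # In Python 3 the pattern must be bytes, since the mmap holds bytes.
        mo = re.search(rb'error: (.*)', data)
        if mo:
            print("found error", mo.group(1).decode(errors='replace'))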

This also works for big files, since the file content is loaded from disk internally as needed.
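
Since the question is about matches that span multiple lines, here is a minimal sketch of the same idea wrapped in a function, assuming Python 3. The name search_file and its return shape are hypothetical, chosen only for illustration. Passing re.DOTALL lets "." match newlines, so the pattern can cross line boundaries.

```python
import mmap
import re


def search_file(path, pattern, flags=0):
    """Search a memory-mapped file for a bytes pattern.

    Hypothetical helper: returns (start, end, groups) for the first match,
    or None if nothing matches. The values are extracted before the mapping
    is closed, so the result stays usable after the function returns.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            mo = re.search(pattern, data, flags)
            if mo is None:
                return None
            return mo.start(), mo.end(), mo.groups()
```

Usage would look like search_file('/var/log/error.log', rb'error: (.*?)\nend', re.DOTALL). The groups come back as bytes, so decode them if you need text.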

sth answered Sep 21 '22 23:09
