I want to run a regular expression over an entire file, but I'd like to avoid reading the whole file into memory at once, since I may be working with rather large files in the future. Is there a way to do this? Thanks!
Clarification: I cannot process the file line by line because a match can span multiple lines.
The re.search() function takes a pattern and a string and returns a match object if the pattern is found. If there is more than one match, only the first occurrence is returned. If no match is found, it returns None.
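For instance, a minimal illustration of this behavior (the sample strings here are made up for demonstration):

    import re

    text = "error 42 and error 99"
    m = re.search(r'error (\d+)', text)
    print(m.group(1))                        # prints "42" -- only the first match is returned
    print(re.search(r'\d+', "no digits"))    # prints "None" when nothing matches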
You can use the start() and end() methods on the returned match object to get the match's position within the string:

    import re

    # df is assumed to be a pandas DataFrame with a "person_notes" column
    for note in df["person_notes"]:
        match = re.search(r'\d+', note)
        if match:
            print(note[match.start():match.end()])
You can use mmap to map the file into memory. The mapped contents can then be searched like a normal string (note that an mmap exposes bytes, so the pattern must be a bytes pattern):
    import re, mmap

    with open('/var/log/error.log', 'rb') as f:
        # Map the whole file; access=ACCESS_READ keeps the mapping read-only
        data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # The mmap is searched as bytes, so use a bytes pattern
        mo = re.search(rb'error: (.*)', data)
        if mo:
            print("found error", mo.group(1))
This also works for big files; the operating system loads the file contents from disk only as they are accessed.
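If you also need patterns that span multiple lines, or every match rather than just the first, the same mmap object works with re.DOTALL and re.finditer. This is a minimal sketch, and the file name and the BEGIN/END pattern are just placeholders:

    import re, mmap

    with open('big.log', 'rb') as f:              # placeholder file name
        data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # re.DOTALL lets '.' match newlines, so one match can span several lines
        pattern = re.compile(rb'BEGIN(.*?)END', re.DOTALL)
        for mo in pattern.finditer(data):
            # finditer yields matches lazily, so no list of all matches is built
            print(mo.start(), mo.group(1)[:40])
        data.close()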