I want to run a regular expression over an entire file, but I'd like to avoid reading the whole file into memory at once, since I may be working with rather large files in the future. Is there a way to do this? Thanks!
Clarification: I cannot process the file line by line because a match can span multiple lines.
The re.search() function takes a pattern and a string and returns a match object if the pattern is found. If there is more than one match, only the first occurrence is returned. If no match is found, it returns None.
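For instance, a minimal illustration of this behavior (the sample strings here are made up for demonstration):

    import re

    text = "error 42 and error 99"
    m = re.search(r'error (\d+)', text)
    print(m.group(1))                        # prints "42" -- only the first match is returned
    print(re.search(r'\d+', "no digits"))    # prints "None" when nothing matches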
You can use the start() and end() methods on the returned match object to get the match's position within the string:

    import re

    # df is assumed to be a pandas DataFrame with a "person_notes" column
    for note in df["person_notes"]:
        match = re.search(r'\d+', note)
        if match:
            print(note[match.start():match.end()])
You can use mmap to map the file into memory. The mapped contents can then be searched like a normal string (note that an mmap exposes bytes, so the pattern must be a bytes pattern):
    import re, mmap

    with open('/var/log/error.log', 'rb') as f:
        # Map the whole file; access=ACCESS_READ keeps the mapping read-only
        data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # The mmap is searched as bytes, so use a bytes pattern
        mo = re.search(rb'error: (.*)', data)
        if mo:
            print("found error", mo.group(1))
This also works for big files; the operating system loads the file contents from disk only as they are accessed.
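If you also need patterns that span multiple lines, or every match rather than just the first, the same mmap object works with re.DOTALL and re.finditer. This is a minimal sketch, and the file name and the BEGIN/END pattern are just placeholders:

    import re, mmap

    with open('big.log', 'rb') as f:              # placeholder file name
        data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # re.DOTALL lets '.' match newlines, so one match can span several lines
        pattern = re.compile(rb'BEGIN(.*?)END', re.DOTALL)
        for mo in pattern.finditer(data):
            # finditer yields matches lazily, so no list of all matches is built
            print(mo.start(), mo.group(1)[:40])
        data.close()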