I wish to search a large text file with a regex and have set up the following code:
import re
regex = input("REGEX: ")
SearchFunction = re.compile(regex)
f = open('data','r', encoding='utf-8')
result = re.search(SearchFunction, f)
print(result.groups())
f.close()
Of course, this doesn't work, because the second argument to re.search should be a string or buffer, not a file object. However, I cannot read the whole text file into a string because it is too large (it would take forever). What is the alternative?
findall() is probably the single most powerful function in the re module. Above we used re.search() to find the first match for a pattern. findall() finds *all* the matches and returns them as a list of strings, with each string representing one match.
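As a small illustration of the behaviour described above, here is a minimal sketch; the sample text and the e-mail pattern are made up for the example. Note that when the pattern contains groups, findall() returns tuples of the groups rather than whole-match strings.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "alice@example.com, bob@example.org"

# No groups in the pattern: each list item is the whole match.
emails = re.findall(r"[\w.]+@[\w.]+", text)

# Two groups in the pattern: each list item is a tuple of the groups.
pairs = re.findall(r"([\w.]+)@([\w.]+)", text)

print(emails)
print(pairs)
```

For very large inputs, re.finditer() may be a better fit, since it yields match objects one at a time instead of building the whole list up front.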
You can check whether the pattern matches each line in turn. This won't load the entire file into memory:
for line in f:
    result = SearchFunction.search(line)
    if result:
        print(result.groups())
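A self-contained version of the line-by-line approach might look like the sketch below. The function name search_lines and its parameters are hypothetical, chosen for this example. One caveat is that a pattern meant to span several lines will never match this way, because each line is searched on its own.

```python
import re

def search_lines(path, pattern, encoding="utf-8"):
    """Yield (line_number, match) for every line that matches pattern."""
    compiled = re.compile(pattern)
    # Iterating over the file object reads one line at a time,
    # so only the current line is held in memory.
    with open(path, "r", encoding=encoding) as f:
        for line_number, line in enumerate(f, start=1):
            match = compiled.search(line)
            if match:
                yield line_number, match
```

Because this is a generator, you can stop at the first hit with next(search_lines('data', regex), None) or loop over every hit.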
You can use a memory-mapped file with the mmap module. Think of it as a file pretending to be a string (or the opposite of a StringIO). You can find an example in the Python Module of the Week article about mmap by Doug Hellmann.
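A minimal sketch of the mmap approach, assuming Python 3; the function name search_mmap is hypothetical. A memory-mapped file exposes bytes, so the pattern has to be a bytes pattern, and the operating system pages the file in on demand rather than reading it all at once.

```python
import mmap
import re

def search_mmap(path, pattern):
    """Return the first match of a bytes pattern in the file, or None."""
    compiled = re.compile(pattern)  # pattern must be bytes, e.g. rb"\d+"
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = compiled.search(mm)
            # Copy the matched bytes out before the map is closed.
            return match.group(0) if match else None
```

With a pattern typed in by the user, you would encode it first, for example search_mmap('data', regex.encode('utf-8')), and decode the result if you need a str. Unlike the line-by-line loop, this lets a pattern match across line boundaries.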