
Python 3: Searching A Large Text File With REGEX

I want to search a large text file with a regex and have set up the following code:

import re
regex = input("REGEX: ")
SearchFunction = re.compile(regex)
f = open('data', 'r', encoding='utf-8')
result = re.search(SearchFunction, f)
print(result.groups())
f.close()

Of course, this doesn't work, because the second argument to re.search must be a string or buffer, not a file object. However, I cannot read the whole text file into a string because it is too large (it would take forever). What is the alternative?

Eden Crow asked Mar 03 '12 12:03

2 Answers

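Check whether the pattern matches each line in turn. Iterating over the file object reads one line at a time, so the entire file is never loaded into memory. Note that a pattern spanning several lines will not match this way: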

for line in f:
    result = SearchFunction.search(line)  # a compiled pattern has its own search()
    if result:
        print(result.groups())
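For completeness, here is a minimal self-contained sketch of the line-by-line approach. The function name search_lines and its parameters are hypothetical, chosen only for illustration; it assumes the file is UTF-8 text and that the pattern never needs to match across a line break.

```python
import re


def search_lines(path, pattern, encoding="utf-8"):
    """Yield (line_number, match) for every line that matches the pattern.

    search_lines is a hypothetical helper name, not part of the re module.
    """
    compiled = re.compile(pattern)
    # The with block closes the file even if the caller stops iterating early
    # or an exception is raised.
    with open(path, "r", encoding=encoding) as f:
        # Iterating over the file object reads one line at a time,
        # so memory use stays small regardless of file size.
        for line_number, line in enumerate(f, start=1):
            match = compiled.search(line)
            if match:
                yield line_number, match
```

It could be used as `for number, match in search_lines('data', regex): print(number, match.groups())`. Because it is a generator, you can stop at the first hit without reading the rest of the file.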
Mariusz Jamro answered Sep 29 '22 13:09

You can use a memory-mapped file with the mmap module. Think of it as a file pretending to be a string (or the opposite of a StringIO). In Python 3 the mapped file behaves like bytes, so the regex must be a bytes pattern as well. You can find an example in the Python Module of the Week article about mmap by Doug Hellmann.
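As a rough sketch of the mmap approach (the function name search_mmap is hypothetical, and this is not taken from the article mentioned above): the file is opened in binary mode and mapped read-only, and the pattern is compiled from bytes. Unlike the line-by-line loop, the pattern can match across line breaks. It assumes the file is not empty, since mapping an empty file raises ValueError.

```python
import mmap
import re


def search_mmap(path, pattern):
    """Return (span, groups) for the first match in the file, or None.

    search_mmap is a hypothetical helper name. The pattern must be
    bytes (for example rb"id=(\\d+)"), because the mapping is bytes-like.
    """
    compiled = re.compile(pattern)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = compiled.search(mm)
            if match is None:
                return None
            # Copy the results out while the mapping is still open.
            return match.span(), match.groups()
```

The groups come back as bytes; call `.decode('utf-8')` on them if you need str. The operating system pages the file in on demand, so the whole file is not read up front.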

Steven answered Sep 29 '22 13:09
