I have over a million text files compressed into 40 zip files. I also have a list of about 500 model names of phones. I want to find out the number of times a particular model was mentioned in the text files.
Is there any python module which can do a regex match on the files without unzipping it. Is there a simple way to solve this problem without unzipping?
To work on zip files using python, we will use an inbuilt python module called zipfile.
Python has a module named re to work with RegEx. Here's an example: import re pattern = '^a...s$' test_string = 'abyss' result = re. match(pattern, test_string) if result: print("Search successful.") else: print("Search unsuccessful.")
RegEx Module Python has a built-in package called re , which can be used to work with Regular Expressions.
The Python RegEx Match method checks for a match only at the beginning of the string. So, if a match is found in the first line, it returns the match object. But if a match is found in some other line, the Python RegEx Match function returns null.
There's nothing that will automatically do what you want.
However, there is a python zipfile module that will make this easy to do. Here's how to iterate over the lines in the file.
#!/usr/bin/python
import zipfile
f = zipfile.ZipFile('myfile.zip')
for subfile in f.namelist():
print subfile
data = f.read(subfile)
for line in data.split('\n'):
print line
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With