
Is there a Python module for regex matching in zip files?

I have over a million text files compressed into 40 zip files. I also have a list of about 500 model names of phones. I want to find out the number of times a particular model was mentioned in the text files.

Is there a Python module that can do a regex match on the files without unzipping them? Is there a simple way to solve this problem without unzipping?

cnu asked Aug 18 '08 07:08


1 Answer

There's nothing that will automatically do what you want.

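However, Python's built-in zipfile module makes this easy to do. It reads each member straight from the archive into memory, so nothing has to be extracted to disk. Here's how to iterate over the lines of every file in a zip file: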

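#!/usr/bin/env python3

import zipfile

# Open the archive; members are read in place, nothing is extracted to disk.
with zipfile.ZipFile('myfile.zip') as f:
    for subfile in f.namelist():
        print(subfile)
        # read() returns bytes, so decode before splitting (UTF-8 assumed here).
        data = f.read(subfile).decode('utf-8', errors='replace')
        for line in data.splitlines():
            print(line)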
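
To get from iterating over lines to the counts the question asks for, one option is to compile all model names into a single regex and tally matches per archive member. The following is only a minimal sketch, not part of the original answer. The names count_mentions and build_pattern are hypothetical, and it assumes the text files are UTF-8, that matching should be case-insensitive, and that a model name should only count when it appears as a whole word.

```python
import re
import zipfile
from collections import Counter


def build_pattern(model_names):
    """Compile one alternation regex covering every model name (hypothetical helper)."""
    # Longest names first, so "N95 8GB" is tried before its prefix "N95".
    ordered = sorted(model_names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # Lookarounds stop "N95" from matching inside a longer token such as "XN950".
    return re.compile(r"(?<!\w)(?:%s)(?!\w)" % alternation, re.IGNORECASE)


def count_mentions(zip_paths, model_names, encoding="utf-8"):
    """Count mentions of each model across all members of the given zip files."""
    pattern = build_pattern(model_names)
    # Map lower-cased matches back to the spelling used in the input list.
    canonical = {name.lower(): name for name in model_names}
    counts = Counter()
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue  # skip directory entries
                # Each member is decompressed into memory, never written to disk.
                text = archive.read(info).decode(encoding, errors="replace")
                for match in pattern.finditer(text):
                    key = match.group(0).lower()
                    counts[canonical.get(key, key)] += 1
    return counts
```

Calling count_mentions with the 40 archive paths and the list of 500 names would return a Counter keyed by model name. A single combined pattern means each file is scanned once rather than 500 times, which matters with over a million files. Overlapping names are counted only for the longest match at a given position.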

Mark Harrison answered Nov 03 '22 01:11