Python big file parsing with re

Tags: python, regex, file

How can I parse a large file with regular expressions (using the re module) without loading the whole file into a string (that is, into memory)? Memory-mapped files don't seem to help, because their content can't be converted to some kind of lazy string, and the re module only accepts a string as its content argument.

#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/regex.hpp>
#include <iostream>

int main(int argc, char* argv[])
{
    // Map the file into memory; the OS pages data in on demand.
    boost::iostreams::mapped_file fl("BigFile.log");
    //boost::regex expr("\\w+>Time Elapsed .*?$", boost::regex::perl);
    boost::regex expr("something useful");
    boost::iostreams::mapped_file::iterator start = fl.begin();
    boost::iostreams::mapped_file::iterator end = fl.end();
    boost::match_results<boost::iostreams::mapped_file::iterator> what;
    // Search the mapped range, resuming right after each match.
    while (boost::regex_search(start, end, what, expr))
    {
        std::cout << what[0].str() << std::endl;
        start = what[0].second;
    }
    return 0;
}

To demonstrate my requirements, I wrote the short sample above in C++ (with Boost). I want the same behavior in Python.

349

asked Jul 26 '12 17:07

Alex

1 Answer

Everything works now (the Python 3.2.3 interface differs somewhat from Python 2.7). An mmap object can be passed directly to the re functions; in Python 3.2.3 the search pattern just has to be prefixed with b (making it a bytes pattern) for this to work.

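import re
import mmap
import pprint

def ParseFile(fileName):
    # Open in binary mode: the mapping exposes raw bytes, not decoded text.
    with open(fileName, "rb") as f:
        print("File opened successfully")
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            print("File mapped successfully")
            # The pattern must be bytes because the mmap is a bytes-like object.
            for item in re.finditer(rb"\w+>Time Elapsed .*?\n", m):
                pprint.pprint(item.group(0))

if __name__ == "__main__":
    ParseFile("testre")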
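To make the bytes-pattern requirement concrete, here is a minimal sketch of the same mmap approach wrapped in a generator. The helper name iter_matches, the temporary sample file, and its contents are assumptions made up for illustration; they are not part of the original answer. It also shows that a str pattern is rejected with a TypeError when searching an mmap in Python 3.

```python
import mmap
import os
import re
import tempfile

def iter_matches(path, pattern):
    """Yield (byte_offset, matched_bytes) for each match of a bytes pattern.

    Hypothetical helper: the file is memory-mapped rather than read into a string.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for match in re.finditer(pattern, m):
                yield match.start(), match.group(0)

# Build a small throwaway file so the sketch can be run as-is (made-up contents).
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"job1>Time Elapsed 00:01\nnoise\njob2>Time Elapsed 00:02\n")
    sample_path = tmp.name

# A bytes pattern works against the mapped file.
found = list(iter_matches(sample_path, rb"\w+>Time Elapsed .*?\n"))

# A str pattern does not: re refuses to mix str patterns with bytes-like data.
try:
    list(iter_matches(sample_path, r"\w+>Time Elapsed .*?\n"))
    str_pattern_rejected = False
except TypeError:
    str_pattern_rejected = True

os.remove(sample_path)
```

Each match comes back as bytes, so call .decode() with the file's encoding if text is needed. Note that mmap raises ValueError for an empty file, so a real script may want to check the file size first.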

178

answered Oct 26 '22 21:10

Alex
