
Python Shelve Module Memory Consumption

I have been assigned the task of reading a .txt file which is a log of various events and writing some of those events into a dictionary.

The problem is that the file can sometimes get bigger than 3GB in size. This means that the dictionary gets too big to fit into main memory. It seems that Shelve is a good way to solve this problem. However, since I will be constantly modifying the dictionary, I must have the writeback option enabled. This is where I am concerned - the tutorial says that this would slow down the read/write process and use more memory, but I am unable to find statistics on how the speed and memory are affected.
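
To make the writeback trade-off concrete, here is a minimal sketch using only the standard library. The file path and the "errors" key are made up for illustration. Without writeback, a stored value has to be read, modified, and assigned back. With writeback=True, in-place mutation works, but every accessed entry is kept in memory until the shelf is synced or closed. It does not measure speed or memory; it only shows the behavioural difference.

```python
import os
import shelve
import tempfile

# Hypothetical path for the demo; a real log-derived shelf would live elsewhere.
path = os.path.join(tempfile.mkdtemp(), "events_demo")

# Without writeback: nothing is cached, so in-place mutation is silently lost.
with shelve.open(path) as db:
    db["errors"] = []
    db["errors"].append("lost")      # mutates a temporary unpickled copy only
    events = db["errors"]            # read
    events.append("disk full")       # modify
    db["errors"] = events            # write back explicitly
    without_writeback = db["errors"]

# With writeback=True: accessed entries stay in an in-memory cache until
# sync() or close(), which is where the extra memory and close time go.
with shelve.open(path, writeback=True) as db:
    db["errors"].append("timeout")   # persisted when the shelf is closed

# Reopen read-only to confirm what actually reached the disk.
with shelve.open(path, flag="r") as db:
    final = db["errors"]
```

The explicit read-modify-write pattern keeps memory bounded at the cost of slightly more verbose code, while writeback=True trades memory for convenience; calling sync() periodically is one way to limit how large the cache grows.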

Can anyone clarify by how much the read/write speed and memory are affected so that I can decide whether to use the writeback option or sacrifice some readability for code efficiency?

Thank you

asked May 24 '11 18:05 by inspectorG4dget


1 Answer

For databases this size, shelve really is the wrong tool. If you do not need a highly available client/server architecture, and you just want to convert your TXT file to a local database that you can access like an in-memory object, you really should be using ZODB.

If you need something highly available, you will of course need to switch to a formal "NoSQL" database, of which there are many to choose from.

Here's a simple example of how to convert your shelve database to a ZODB database, which should solve your memory usage / performance problems.

#!/usr/bin/env python3
# Copy every entry of an existing shelve database into a new ZODB FileStorage.
import shelve
import sys
from argparse import ArgumentParser

import transaction
import ZODB, ZODB.FileStorage

parser = ArgumentParser()
parser.add_argument("-i", "--input", dest="in_file", help="original shelve database filename")
parser.add_argument("-o", "--output", dest="out_file", help="new zodb database filename")
options = parser.parse_args()

if not options.in_file or not options.out_file:
    print("Need input and output database filenames")
    sys.exit(1)

# Open the shelf read-only and without writeback, so that the entries
# being copied are not all cached in memory.
db = shelve.open(options.in_file, flag="r")
zstorage = ZODB.FileStorage.FileStorage(options.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
newdb = zconnection.root()

for key, value in db.items():
    print("Copying key: " + str(key))
    newdb[key] = value

# Persist the copied entries, then release both databases.
transaction.commit()
zconnection.close()
zdb.close()
db.close()
answered Sep 23 '22 01:09 by Michael Galaxy