How to solve the memory error in Python

Tags:

memory

I am dealing with several large txt files, each of which has about 8,000,000 lines. A short example of the lines is:

usedfor zipper fasten_coat
usedfor zipper fasten_jacket
usedfor zipper fasten_pant
usedfor your_foot walk
atlocation camera cupboard
atlocation camera drawer
atlocation camera house
relatedto more plenty

The code to store them in a dictionary is:

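import collections

# finCSK is the already-opened input file
dicCSK = collections.defaultdict(list)
for line in finCSK:
    line = line.strip('\n')
    try:
        r, c1, c2 = line.split(" ")
    except ValueError:
        # Report the malformed line and skip it instead of reusing stale values
        print(line)
        continue
    dicCSK[c1].append(r + " " + c2)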

It runs fine on the first txt file, but when it gets to the second txt file, I get a MemoryError.

I am using 64-bit Windows 7 with 32-bit Python 2.7, an Intel i5 CPU, and 8 GB of memory. How can I solve the problem?

Further explanation: I have four large files, and each file contains different information for many entities. For example, I want to find all information for cat, its parent node animal, its child node persian cat, and so on. So my program first reads all the txt files into dictionaries, then scans all the dictionaries to find information for cat, its parent, and its children.

asked May 20 '16 12:05

flyingmouse

2 Answers

Simplest solution: You're probably running out of virtual address space (any other cause usually means running really slowly for a long time before you finally get a MemoryError). This is because a 32-bit application on Windows (and most OSes) is limited to 2 GB of user-mode address space (Windows can be tweaked to make it 3 GB, but that's still a low cap). You've got 8 GB of RAM, but your program can't use (at least) 3/4 of it. Python has a fair amount of per-object overhead (object header, allocation alignment, etc.); odds are the strings alone are using close to a GB of RAM, and that's before you deal with the overhead of the dictionary, the rest of your program, the rest of Python, etc. If the address space fragments enough and the dictionary needs to grow, there may not be enough contiguous space to reallocate it, and you'll get a MemoryError.

Install a 64 bit version of Python (if you can, I'd recommend upgrading to Python 3 for other reasons); it will use more memory, but then, it will have access to a lot more memory space (and more physical RAM as well).
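To confirm which build is actually running, a quick check along these lines can help. This is only a sketch using the standard library; the helper name pointer_bits is made up for illustration.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


bits = pointer_bits()
print("This interpreter is a %d-bit build" % bits)

# sys.maxsize is 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build.
print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2 ** 32))
```

If this reports 32 on a 64-bit operating system, the interpreter itself is the limiting factor, not the installed RAM.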

If that's not enough, consider converting to a sqlite3 database (or some other DB), so it naturally spills to disk when the data gets too large for main memory, while still having fairly efficient lookup.
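A minimal sketch of the sqlite3 approach, assuming the three-field line format from the question. The table name csk, the column names, the index name, and the helper functions load_triples and lookup are all hypothetical; the in-memory database is used only to keep the example self-contained, and a file path would be used for real data so that it can spill to disk.

```python
import sqlite3


def load_triples(conn, lines):
    """Insert 'relation c1 c2' lines into a table, skipping malformed lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS csk (relation TEXT, c1 TEXT, c2 TEXT)"
    )

    def rows():
        # Generator: lines are parsed one at a time, never all held in memory.
        for line in lines:
            parts = line.strip("\n").split(" ")
            if len(parts) == 3:
                yield tuple(parts)

    conn.executemany("INSERT INTO csk VALUES (?, ?, ?)", rows())
    # The index keeps lookups by c1 fast once the table is large.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_csk_c1 ON csk (c1)")
    conn.commit()


def lookup(conn, concept):
    """Return (relation, c2) pairs for a concept, in insertion order."""
    cur = conn.execute(
        "SELECT relation, c2 FROM csk WHERE c1 = ? ORDER BY rowid", (concept,)
    )
    return cur.fetchall()


# Assumption: ":memory:" is for demonstration; pass a filename for real data.
conn = sqlite3.connect(":memory:")
sample = [
    "usedfor zipper fasten_coat\n",
    "usedfor zipper fasten_jacket\n",
    "atlocation camera drawer\n",
    "this line is malformed and gets skipped\n",
]
load_triples(conn, sample)
print(lookup(conn, "zipper"))
```

In real use, an open file object can be passed as lines, since iterating over a file yields one line at a time.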

answered Sep 30 '22 00:09

ShadowRanger

Assuming your example text is representative of all the text, one line would consume about 75 bytes on my machine:

In [3]: sys.getsizeof('usedfor zipper fasten_coat')
Out[3]: 75

Doing some rough math:

75 bytes * 8,000,000 lines / 1024 / 1024 = ~572 MB

So it takes roughly 572 MB to store the strings alone for one of these files. Once you start adding in additional, similarly structured and sized files, you'll quickly approach your virtual address space limits, as mentioned in @ShadowRanger's answer.
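The estimate above can be reproduced with a small helper. This is a rough sketch: the function name estimate_string_mb is made up, and the per-string size reported by sys.getsizeof varies with the Python version and with 32-bit versus 64-bit builds, so the result will differ between machines.

```python
import sys


def estimate_string_mb(sample_line, line_count):
    """Estimate the MB needed to hold line_count strings like sample_line."""
    per_line = sys.getsizeof(sample_line)  # bytes for one str object
    return per_line * line_count / (1024.0 * 1024.0)


mb = estimate_string_mb('usedfor zipper fasten_coat', 8000000)
print("about %.0f MB per file for the strings alone" % mb)
```

This counts only the string objects; the lists and the dictionary holding them add their own overhead on top.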

If upgrading your Python isn't feasible for you, or if it only kicks the can down the road (you have finite physical memory after all), you really have two options: write your results to temporary files between loading and reading the input files, or write your results to a database. Since you need to further post-process the strings after aggregating them, writing to a database would be the superior approach.

answered Sep 30 '22 00:09

Levi Noecker
