Upper memory limit?

Tags: python, memory

Is there a memory limit for Python? I've been using a Python script to calculate average values from files that are a minimum of 150 MB in size.

Depending on the size of the file I sometimes encounter a MemoryError.

Can more memory be assigned to Python so I don't encounter the error?


EDIT: Code now below

NOTE: The file sizes can vary greatly (up to 20 GB); the minimum size of a file is 150 MB.

file_A1_B1 = open("A1_B1_100000.txt", "r")
file_A2_B2 = open("A2_B2_100000.txt", "r")
file_A1_B2 = open("A1_B2_100000.txt", "r")
file_A2_B1 = open("A2_B1_100000.txt", "r")
file_write = open("average_generations.txt", "w")
mutation_average = open("mutation_average", "w")

files = [file_A1_B1, file_A2_B2, file_A1_B2, file_A2_B1]

for u in files:
    line = u.readlines()
    list_of_lines = []
    for i in line:
        values = i.split('\t')
        list_of_lines.append(values)

    count = 0
    for j in list_of_lines:
        count += 1

    for k in range(0, count):
        list_of_lines[k].remove('\n')

    length = len(list_of_lines[0])
    print_counter = 4

    for o in range(0, length):
        total = 0
        for p in range(0, count):
            number = float(list_of_lines[p][o])
            total = total + number
        average = total/count
        print average
        if print_counter == 4:
            file_write.write(str(average)+'\n')
            print_counter = 0
        print_counter += 1
file_write.write('\n')
asked Nov 26 '10 by Harpal


People also ask

What are memory limits?

Limits on memory and address space vary by platform and operating system. Limits on physical memory for 32-bit platforms also depend on the presence and use of Physical Address Extension (PAE), which allows 32-bit systems to use more than 4 GB of physical memory.

How many GB of RAM is maximum?

Similar to a Windows-based computer, Linux-based machines' maximum RAM is based on whether they have 32-bit or 64-bit architecture. Most 32-bit Linux systems only support 4 GB of RAM, unless the PAE kernel is enabled, which allows a 64 GB max. However, 64-bit variants support between 1 and 256 TB.

What is meant by upper memory area and high memory area?

Short for Upper Memory Area, the UMA is the region of RAM between 640 KB and 1,024 KB (1 MB) in legacy computers that is made available to user applications. In DOS-based systems, memory is split into five areas: conventional memory, upper memory, high memory, extended memory, and expanded memory.

What is the max RAM for 64-bit?

The theoretical memory limit that a 64-bit computer can address is about 16 exabytes (16 billion gigabytes). Windows XP x64 is limited to 128 GB of physical memory and 8 TB of virtual memory.


2 Answers

(This is my third answer because I misunderstood what your code was doing in my original, and then made a small but crucial mistake in my second; hopefully the third time's the charm.)

Edits: Since this seems to be a popular answer, I've made a few modifications over the years to improve its implementation, most of them minor. That way, if folks use it as a template, it will provide an even better basis.

As others have pointed out, your MemoryError problem is most likely because you're attempting to read the entire contents of huge files into memory and then, on top of that, effectively doubling the amount of memory needed by creating a list of lists of the string values from each line.

Python's memory limits are determined by how much physical RAM and virtual-memory disk space your computer and operating system have available. Even if you don't exhaust that memory and your program "works", using most of it may be impractical because it takes too long.
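If you want to check whether the operating system, rather than Python itself, is capping your process, here is a minimal sketch using the standard resource module (Unix-only; RLIMIT_AS is the per-process address-space limit):

import resource

# Current address-space limit for this process (Unix only).
# RLIM_INFINITY means no explicit cap beyond physical RAM plus swap.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print('soft limit:', soft, 'hard limit:', hard)

# An unprivileged process may raise its soft limit up to the hard limit,
# which is the closest thing to "assigning Python more memory".
if soft != hard:
    resource.setrlimit(resource.RLIMIT_AS, (hard, hard))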

Anyway, the most obvious way to avoid that is to process each file a single line at a time, which means you have to do the processing incrementally.

To accomplish this, the script keeps a list of running totals, one per field. When a file is finished, the average of each field can be calculated by dividing its running total by the count of lines read. The averages are then printed out, and some are written to one of the output files. I've also made a conscious effort to use very descriptive variable names to make it understandable.

try:
    from itertools import izip_longest
except ImportError:    # Python 3
    from itertools import zip_longest as izip_longest

GROUP_SIZE = 4
input_file_names = ["A1_B1_100000.txt", "A2_B2_100000.txt", "A1_B2_100000.txt",
                    "A2_B1_100000.txt"]
file_write = open("average_generations.txt", 'w')
mutation_average = open("mutation_average", 'w')  # left in, but nothing written

for file_name in input_file_names:
    with open(file_name, 'r') as input_file:
        print('processing file: {}'.format(file_name))

        totals = []
        for count, fields in enumerate((line.split('\t') for line in input_file), 1):
            totals = [sum(values) for values in
                        izip_longest(totals, map(float, fields), fillvalue=0)]
        averages = [total/count for total in totals]

        for print_counter, average in enumerate(averages):
            print('  {:9.4f}'.format(average))
            if print_counter % GROUP_SIZE == 0:
                file_write.write(str(average)+'\n')

file_write.write('\n')
file_write.close()
mutation_average.close()
answered by martineau


You're reading each entire file into memory (line = u.readlines()), which will of course fail if a file is too large (and you say some are up to 20 GB), so that's your problem right there.

It's better to iterate over each line:

for current_line in u:
    do_something_with(current_line)

is the recommended approach.

Later in your script, you're doing some very strange things, like first counting all the items in a list and then constructing a for loop over the range of that count. Why not iterate over the list directly? What is the purpose of your script? I have the impression that this could be done much more simply.
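For instance, the count-then-index pattern from the question collapses to a direct loop. A small sketch with stand-in data and a plain print in place of the real processing:

list_of_lines = [['1.0', '2.0'], ['3.0', '4.0']]  # stand-in data

# The question's pattern: count the items by hand, then index by range.
count = 0
for j in list_of_lines:
    count += 1
for k in range(0, count):
    print(list_of_lines[k])

# The direct equivalent: let Python iterate; len() gives the count anyway.
for fields in list_of_lines:
    print(fields)
print(len(list_of_lines))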

This is one of the advantages of high-level languages like Python (as opposed to C, where you do have to do these housekeeping tasks yourself): let Python handle the iteration for you, and only collect in memory what you actually need at any given time.

Also, since it seems you're processing TSV files (tab-separated values), you should take a look at the csv module, which will handle all the splitting, removal of '\n's, and so on for you.
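A minimal Python 3 sketch of that approach, reusing one of the file names from the question and assuming every row holds the same number of numeric columns:

import csv

totals = []
count = 0
# newline='' is the recommended way to open files for the csv module in Python 3
with open('A1_B1_100000.txt', newline='') as tsv_file:
    for row in csv.reader(tsv_file, delimiter='\t'):
        # csv strips the line ending, so no manual '\n' removal is needed
        values = [float(field) for field in row]
        if not totals:
            totals = [0.0] * len(values)
        totals = [t + v for t, v in zip(totals, values)]
        count += 1

averages = [t / count for t in totals]
print(averages)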

answered by Tim Pietzcker