Upper memory limit?

Tags: python, memory

Is there a memory limit for Python? I've been using a Python script to calculate average values from files that are a minimum of 150 MB in size.

Depending on the size of the file I sometimes encounter a MemoryError.

Can more memory be assigned to Python so I don't encounter the error?


EDIT: Code now below

NOTE: The file sizes can vary greatly (up to 20 GB); the minimum size of a file is 150 MB.

file_A1_B1 = open("A1_B1_100000.txt", "r")
file_A2_B2 = open("A2_B2_100000.txt", "r")
file_A1_B2 = open("A1_B2_100000.txt", "r")
file_A2_B1 = open("A2_B1_100000.txt", "r")
file_write = open("average_generations.txt", "w")
mutation_average = open("mutation_average", "w")

files = [file_A1_B1, file_A2_B2, file_A1_B2, file_A2_B1]

for u in files:
    line = u.readlines()
    list_of_lines = []
    for i in line:
        values = i.split('\t')
        list_of_lines.append(values)

    count = 0
    for j in list_of_lines:
        count += 1

    for k in range(0, count):
        list_of_lines[k].remove('\n')

    length = len(list_of_lines[0])
    print_counter = 4

    for o in range(0, length):
        total = 0
        for p in range(0, count):
            number = float(list_of_lines[p][o])
            total = total + number
        average = total/count
        print average
        if print_counter == 4:
            file_write.write(str(average)+'\n')
            print_counter = 0
        print_counter += 1
file_write.write('\n')
asked Nov 26 '10 by Harpal


People also ask

What are memory limits?

Limits on memory and address space vary by platform and operating system. Limits on physical memory for 32-bit platforms also depend on the presence and use of Physical Address Extension (PAE), which allows 32-bit systems to use more than 4 GB of physical memory.

How many GB of RAM is maximum?

Similar to a Windows-based computer, Linux-based machines' maximum RAM is based on whether they have 32-bit or 64-bit architecture. Most 32-bit Linux systems only support 4 GB of RAM, unless the PAE kernel is enabled, which allows a 64 GB max. However, 64-bit variants support between 1 and 256 TB.

What is meant by upper memory area and high memory area?

Short for Upper Memory Area, the UMA is the region of RAM between 640 KB and 1,024 KB (1 MB) in legacy computers that is made available to user applications. In DOS-based systems, memory is split into five areas: conventional memory, upper memory, high memory, extended memory, and expanded memory.

What is the max RAM for 64-bit?

The theoretical memory limit that a 64-bit computer can address is about 16 exabytes (16 billion gigabytes). Windows XP x64 is limited to 128 GB of physical memory and 8 TB of virtual memory.


2 Answers

(This is my third answer because I misunderstood what your code was doing in my original, and then made a small but crucial mistake in my second; hopefully the third time's the charm.)

Edits: Since this seems to be a popular answer, I've made a few modifications over the years to improve its implementation, most of them minor. That way, if folks use it as a template, it will provide an even better basis.

As others have pointed out, your MemoryError problem is most likely because you're attempting to read the entire contents of huge files into memory and then, on top of that, effectively doubling the amount of memory needed by creating a list of lists of the string values from each line.

Python's memory limits are determined by how much physical RAM and virtual-memory disk space your computer and operating system have available. Even if you don't exhaust that memory and your program "works", using most of it may be impractical because it takes too long.
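If you want to check whether the operating system, rather than Python itself, is capping your process, here is a minimal sketch using the standard resource module (Unix-only; RLIMIT_AS is the per-process address-space limit):

import resource

# Current address-space limit for this process (Unix only).
# RLIM_INFINITY means no explicit cap beyond physical RAM plus swap.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print('soft limit:', soft, 'hard limit:', hard)

# An unprivileged process may raise its soft limit up to the hard limit,
# which is the closest thing to "assigning Python more memory".
if soft != hard:
    resource.setrlimit(resource.RLIMIT_AS, (hard, hard))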

Anyway, the most obvious way to avoid that is to process each file a single line at a time, which means you have to do the processing incrementally.

To accomplish this, the script keeps a list of running totals, one per field. When a file is finished, the average of each field can be calculated by dividing its running total by the count of lines read. The averages are then printed out, and some are written to one of the output files. I've also made a conscious effort to use very descriptive variable names to make it understandable.

try:
    from itertools import izip_longest
except ImportError:    # Python 3
    from itertools import zip_longest as izip_longest

GROUP_SIZE = 4
input_file_names = ["A1_B1_100000.txt", "A2_B2_100000.txt", "A1_B2_100000.txt",
                    "A2_B1_100000.txt"]
file_write = open("average_generations.txt", 'w')
mutation_average = open("mutation_average", 'w')  # left in, but nothing written

for file_name in input_file_names:
    with open(file_name, 'r') as input_file:
        print('processing file: {}'.format(file_name))

        totals = []
        for count, fields in enumerate((line.split('\t') for line in input_file), 1):
            totals = [sum(values) for values in
                        izip_longest(totals, map(float, fields), fillvalue=0)]
        averages = [total/count for total in totals]

        for print_counter, average in enumerate(averages):
            print('  {:9.4f}'.format(average))
            if print_counter % GROUP_SIZE == 0:
                file_write.write(str(average)+'\n')

file_write.write('\n')
file_write.close()
mutation_average.close()
answered by martineau


You're reading each entire file into memory (line = u.readlines()), which will of course fail if a file is too large (and you say some are up to 20 GB), so that's your problem right there.

It's better to iterate over each line:

for current_line in u:
    do_something_with(current_line)

is the recommended approach.

Later in your script, you're doing some very strange things, like first counting all the items in a list and then constructing a for loop over the range of that count. Why not iterate over the list directly? What is the purpose of your script? I have the impression that this could be done much more simply.
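For instance, the count-then-index pattern from the question collapses to a direct loop. A small sketch with stand-in data and a plain print in place of the real processing:

list_of_lines = [['1.0', '2.0'], ['3.0', '4.0']]  # stand-in data

# The question's pattern: count the items by hand, then index by range.
count = 0
for j in list_of_lines:
    count += 1
for k in range(0, count):
    print(list_of_lines[k])

# The direct equivalent: let Python iterate; len() gives the count anyway.
for fields in list_of_lines:
    print(fields)
print(len(list_of_lines))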

This is one of the advantages of high-level languages like Python (as opposed to C, where you do have to do these housekeeping tasks yourself): let Python handle the iteration for you, and only collect in memory what you actually need at any given time.

Also, since it seems you're processing TSV files (tab-separated values), you should take a look at the csv module, which will handle all the splitting, removal of '\n's, and so on for you.
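A minimal Python 3 sketch of that approach, reusing one of the file names from the question and assuming every row holds the same number of numeric columns:

import csv

totals = []
count = 0
# newline='' is the recommended way to open files for the csv module in Python 3
with open('A1_B1_100000.txt', newline='') as tsv_file:
    for row in csv.reader(tsv_file, delimiter='\t'):
        # csv strips the line ending, so no manual '\n' removal is needed
        values = [float(field) for field in row]
        if not totals:
            totals = [0.0] * len(values)
        totals = [t + v for t, v in zip(totals, values)]
        count += 1

averages = [t / count for t in totals]
print(averages)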

answered by Tim Pietzcker