
Do objects passed to a function as arguments get duplicated?

I am using Python to do some analysis on certain datasets, and this process generates huge lists/dictionaries which consume up to 30% (as reported by top) of RAM (24 GB). There are ~400 such data files and each has to be processed. Therefore I cannot run more than two jobs at a time (otherwise my system hangs). Analyzing each file takes a few minutes, and the entire dataset takes close to two days.

The only solution is to use parallel processing, and to implement it I need to create functions that will execute the tasks.

The first step remains the same: open the file, read, split and store as a list. Usually I do the analysis on the list, get another list, and then delete the previous one to save memory. However, if I use multiprocessing I would have to pass this list as an argument to some function.

  1. Will this duplicate the list, i.e. consume twice the memory?
  2. Is it possible to delete the original variable after it is passed to a function, from within the function? Is making the variable global a possible way?
  3. Is there any other way to save memory in this case?

Example:

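from re import findall

# OPEN FILE #
# (args is assumed to come from argparse, with infile and numcores set)
with open(args.infile, 'r') as f:
    a = f.read()
# split the text into records separated by '%'
mall = findall('[^%]+', a)
del a
lm = len(mall)
# split the records into one contiguous chunk per core
chunk = lm // args.numcores
m = []
for i in range(args.numcores):
    if i < args.numcores - 1:
        m.append(mall[i*chunk:(i+1)*chunk])
    else:
        # the last chunk also takes any remainder
        m.append(mall[i*chunk:lm])
del mall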

Then pass each chunk to a function, fun(<list>).

In this case, for each process: fun(m[i])

WYSIWYG asked Mar 22 '23 03:03



1 Answer

No, there's no copy made of the object. Parameters passed to a function reference the same object as the caller.
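One way to check this is to compare object identities inside and outside the call. The sketch below is only an illustration: `fun` and `data` are hypothetical stand-ins, not the question's real function or dataset.

```python
def fun(lst):
    # `lst` is just another name bound to the caller's list object
    lst.append(99)      # mutates the shared object in place
    return id(lst)      # identity of the object seen inside the function

data = [1, 2, 3]
same_object = fun(data) == id(data)

print(same_object)  # True: the function received the same object, not a copy
print(data)         # [1, 2, 3, 99]: the caller sees the in-place change
```

Because only a reference is passed, the cost of the call does not depend on the size of the list.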

Deleting the variable within the function won't help, since there's still a reference at the calling site. Garbage collection won't occur until all references are gone.
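The effect of `del` inside the function can be observed with a weak reference. This is a minimal sketch assuming CPython, where an object is freed as soon as its last reference disappears. `Chunk` is a hypothetical list subclass introduced only because plain lists cannot be weakly referenced.

```python
import weakref

class Chunk(list):
    """Hypothetical list subclass; plain lists do not support weak references."""

def fun(lst):
    del lst  # removes only the local name; the caller still holds a reference

data = Chunk(range(5))
probe = weakref.ref(data)  # lets us observe whether the object is still alive

fun(data)
alive_after_call = probe() is not None  # caller's `data` keeps the object alive

del data                                # drop the last remaining reference
alive_after_del = probe() is not None   # in CPython the object is freed here

print(alive_after_call)  # True
print(alive_after_del)   # False (on CPython)
```

The memory is released only once the calling site also drops its reference, which is why `del` inside the function alone has no effect.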

Mark Ransom answered Apr 26 '23 05:04
