I am using Python to do some analysis on certain datasets, and this process generates huge lists/dictionaries that consume up to 30% (as reported by top) of RAM (24 GB). There are ~400 such data files and each has to be processed, so I cannot run more than two jobs at a time (otherwise my system hangs). Analysing each file takes a few minutes, and the entire dataset takes close to two days.
The only solution is to use parallel processing, and to implement it I need to create functions that will execute the tasks.
The first step remains the same: open the file, read it, split the contents, and store them as a list. Usually I do the analysis on that list, produce another list, and then delete the previous one to save memory. However, if I use multiprocessing I would have to pass this list as an argument to some function.
Is making the list global a possible way? Example:
# OPEN FILE, read it all, and split the contents into records on '%' #
from re import findall

f = open(args.infile, 'r')
a = f.read()
f.close()
mall = findall('[^%]+', a)
del a

# split the record list into one chunk per core
lm = len(mall)
size = (lm + args.numcores - 1) // args.numcores   # chunk size, rounded up
m = []
for i in range(args.numcores):
    if i < args.numcores - 1:
        m.append(mall[i*size:(i+1)*size])
    else:
        m.append(mall[i*size:lm])
del mall
and then pass each chunk to a function fun(<list>): in this case, one call fun(m[i]) per process.
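If that layout works, the chunks could be handed to worker processes with multiprocessing.Pool. A minimal sketch, assuming m and args.numcores come from the snippet above; the body of fun here is only a placeholder for the real per-chunk analysis:

from multiprocessing import Pool

def fun(chunk):
    # placeholder analysis: replace with the real per-chunk work
    return len(chunk)

if __name__ == '__main__':
    # Pool.map pickles each chunk and copies it into a worker process,
    # so each worker holds its own copy in addition to the parent's list m
    with Pool(processes=args.numcores) as pool:
        results = pool.map(fun, m)

pool.map returns the per-chunk results in the order of m, so they can be combined afterwards in the parent process.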
No, there's no copy made of the object. Parameters passed to a function reference the same object as the caller.
Deleting the variable within the function won't help, since there's still a reference at the calling site. Garbage collection won't occur until all references are gone.
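A quick way to see this within a single process (plain CPython; the names here are purely illustrative):

big = list(range(10**7))      # stand-in for the large record list

def fun(data):
    print(data is big)        # True: the parameter is just another name for the same list
    del data                  # removes only the local name; the caller's reference remains

fun(big)
# big is still alive and fully populated here; the memory is freed only
# once every reference is gone, e.g. after del big in the caller as well
del big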