I'm trying to write a function that retrieves a file under a specified name using the magic command %store.
For example, if I have stored a file as "df" but later want to retrieve it under the name "frame", I want to call the function as retrieve('df','frame'), after which the variable frame would contain the dataframe that was earlier stored as df.
However, I'm not sure how to do this: the function below just reports "no stored variable outputfile".
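```python
import IPython
import gc
import os
import numpy as np
import pandas as pd

# Directory where %store keeps its pickled variables (the trailing '' adds a final separator)
path = os.path.join(IPython.paths.get_ipython_dir(), 'profile_default', 'db', 'autorestore', '')

# function to retrieve a stored file (inputfile) under a specified name (outputfile)
def retrieve(inputfile, outputfile='temp'):
    os.rename(os.path.join(path, inputfile), os.path.join(path, outputfile))
    # The magic receives the literal text "outputfile", not the variable's value,
    # so it looks for a stored variable that is actually called "outputfile"
    %store -r outputfile
    os.rename(os.path.join(path, outputfile), os.path.join(path, inputfile))
```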
```
In [48]: retrieve('df','frame')
no stored variable outputfile
```
The main reason for this is to release memory. I have some files that I retrieve using %store and then manipulate or merge into another dataframe. After this I want to free the memory they used, but running %xdel on a file retrieved using %store -r doesn't free the memory.
I've therefore written the function below, which retrieves the stored file under the variable name temp. Afterwards I can free the memory by retrieving an empty file as temp.
```python
# function to retrieve a stored file (inputfile) under the variable name temp
def retrieve_temp(inputfile):
    os.rename(os.path.join(path, inputfile), os.path.join(path, 'temp'))
    %store -r temp
    os.rename(os.path.join(path, 'temp'), os.path.join(path, inputfile))
```
So, for example, before retrieving anything the current RAM usage is:
```
In [5]: ram_usage()
Out[5]: '107mb'
```
I then retrieve a file and look at the new RAM usage:
```
In [6]: (retrieve_temp('comps'),ram_usage())[1]
Out[6]: '2520mb'
```
After running %xdel the usage stays the same:
```
In [12]: %xdel temp
In [13]: ram_usage()
Out[13]: '2520mb'
```
After retrieving an empty file under the name "temp", the RAM is freed:
```
In [14]: (retrieve_temp('b'),ram_usage())[1]
Out[14]: '114mb'
```
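One possible explanation for this pattern, offered as an assumption rather than something verified against this setup: %store -r reads through IPython's pickleshare-backed database, and pickleshare's `PickleShareDB` keeps its own cache of every object it unpickles, keyed by file path. %xdel would then remove only the notebook's reference while the cache still holds the dataframe, and restoring a different (empty) file under the same name replaces that cache entry. The sketch below imitates the effect with a hypothetical `CachingStore` class (not IPython's real class) and uses `weakref` to see when the object is actually freed.

```python
import gc
import weakref


class BigObject:
    """Stand-in for a large dataframe (plain classes support weak references)."""


class CachingStore:
    """Hypothetical store that, like a caching pickle database, keeps what it loads."""

    def __init__(self, files):
        self.files = files  # name -> callable that "unpickles" a fresh object
        self.cache = {}     # name -> last object loaded under that name

    def load(self, name):
        obj = self.files[name]()
        self.cache[name] = obj  # the store's own, hidden reference
        return obj


store = CachingStore({"temp": BigObject})
user_ns = {"temp": store.load("temp")}  # roughly what %store -r temp does
alive = weakref.ref(user_ns["temp"])

del user_ns["temp"]  # roughly what %xdel temp does
gc.collect()
print("alive after del:", alive() is not None)  # True: the cache still refers to it

store.files["temp"] = lambda: None  # an "empty file" now sits under the same name
user_ns["temp"] = store.load("temp")  # replaces the cache entry
gc.collect()
print("alive after reload:", alive() is not None)  # False: last reference dropped
```

If this is what happens in the notebook, it would explain why gc.collect() alone does not help: the object is still reachable, so it is not garbage.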
This solves most of my memory problems. However, sometimes I need to work on more than one frame at the same time.
I therefore want to have a more generic function where I can specify the name used for the temporary frame and easily free the memory later. This would also help to make my code more readable by using more descriptive names for the temporary dataframes.
I would like to know if there's a way to get my first function to work (it doesn't have to use the %store magic, but I don't want to pickle the files myself).
Alternatively, please let me know if there's another way to free the memory used by a variable that was retrieved with the %store magic command.
(I've tried %xdel, del, %reset, gc.collect() and launching sub-processes, which didn't work out too well. So far the only things that have worked are restarting the kernel or retrieving an empty file under the same name.)
Many thanks,
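If reading %store's files without going through the magic is acceptable, here is a minimal sketch. It assumes (worth checking against your IPython version) that `%store name` writes an ordinary pickle file called `name` into the autorestore directory used above, so unpickling it directly returns the object under whatever variable name you choose, with no database cache holding a second reference. `load_stored` is a hypothetical helper, not part of IPython, and it only reads the files %store already wrote.

```python
import os
import pickle


def load_stored(name, directory):
    """Unpickle the file that %store wrote for `name` (hypothetical helper).

    Assumes `directory` is IPython's .../profile_default/db/autorestore folder.
    """
    with open(os.path.join(directory, name), "rb") as f:
        return pickle.load(f)
```

In the notebook this would be used as `frame = load_stored('df', path)`. Afterwards, `del frame` followed by `gc.collect()` should release the dataframe, provided nothing else (for example an `Out[n]` entry from a displayed result) still refers to it.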
After some more digging, I found the function that runs a magic command from Python code and used that:
```python
get_ipython().run_line_magic('store', '-r ' + outputfile)
```
The modified function is below. (Note that if you use this, you might want to make it more robust, for example by adding some lines that temporarily rename any file you've already stored under the name given in outputfile.)
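```python
import IPython
import os
import gc

# function to retrieve a stored file (inputfile) under a specified name (outputfile)
def retrieve(inputfile, outputfile='temp'):
    # Directory where %store keeps its pickled variables
    path = os.path.join(IPython.paths.get_ipython_dir(), 'profile_default', 'db', 'autorestore')
    os.rename(os.path.join(path, inputfile), os.path.join(path, outputfile))
    try:
        # Pass the *value* of outputfile to the magic (a literal "%store -r outputfile" would not)
        get_ipython().run_line_magic('store', '-r ' + outputfile)
    finally:
        # Put the stored file back under its original name, even if the restore fails
        os.rename(os.path.join(path, outputfile), os.path.join(path, inputfile))
    gc.collect()  # needed to free memory after retrieving an empty file
```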
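Following the robustness note above, one way to package the rename-and-restore steps is a context manager that also moves aside any file already stored under the target name and always puts everything back, even if the body raises. This is a sketch: `stored_alias` and the `.bak` suffix are hypothetical choices, and it assumes nothing else is stored under `<outputfile>.bak`.

```python
import contextlib
import os


@contextlib.contextmanager
def stored_alias(directory, inputfile, outputfile):
    """Temporarily expose the stored file `inputfile` under the name `outputfile`."""
    if inputfile == outputfile:
        raise ValueError("inputfile and outputfile must differ")
    src = os.path.join(directory, inputfile)
    dst = os.path.join(directory, outputfile)
    backup = dst + ".bak"  # hypothetical parking spot for an existing `outputfile`
    had_existing = os.path.exists(dst)
    if had_existing:
        os.rename(dst, backup)  # move the existing file out of the way
    try:
        os.rename(src, dst)
        try:
            yield dst
        finally:
            os.rename(dst, src)  # give the stored file its original name back
    finally:
        if had_existing:
            os.rename(backup, dst)  # restore whatever was stored under outputfile
```

Inside IPython it could wrap the magic call, for example `with stored_alias(path, 'df', 'frame'): get_ipython().run_line_magic('store', '-r frame')`. Nesting two `try`/`finally` blocks means a failed first rename still restores the backup.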
This appears to solve all my memory-leak issues, as long as I don't display anything from the retrieved dataframe in a cell's output before deleting it again.
The short version: after you're done with the variable df_temp, run retrieve('emptyfile','df_temp'), and as long as you haven't printed any result from it to a cell, your memory should hopefully now be freed.
```
In [14]: ram_usage()
Out[14]: '101mb'
In [15]: retrieve('SFBkgs - Copy','df_temp')
In [16]: ram_usage()
Out[16]: '1281mb'
In [17]: df_temp.head();  # without the ; to suppress the output, the steps below still fail to free the RAM
In [18]: %xdel df_temp  # this still doesn't free the RAM
In [19]: ram_usage()
Out[19]: '1281mb'
In [20]: gc.collect()
Out[20]: 7
In [21]: ram_usage()  # the garbage collector didn't help
Out[21]: '1281mb'
In [22]: retrieve('emptyfile','df_temp')  # retrieves an empty file as df_temp
In [23]: ram_usage()  # the memory has now been freed
Out[23]: '103mb'
```
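An untested alternative to the empty-file trick, assuming the memory is held by the cache of IPython's pickleshare database: pickleshare's `PickleShareDB` has an `uncache()` method that empties its whole cache when called without arguments, and the shell exposes that database as `get_ipython().db`. Newer IPython versions can run without pickleshare, so the sketch checks for the method before calling it. `release_store_cache` is a hypothetical helper name.

```python
import gc


def release_store_cache(shell, *names):
    """Delete notebook variables, drop the %store database cache, then collect.

    Assumes `shell.db` behaves like pickleshare's PickleShareDB, whose
    uncache() with no arguments empties its whole cache.
    """
    for name in names:
        shell.user_ns.pop(name, None)  # like `del name` in the notebook
    uncache = getattr(shell.db, "uncache", None)
    if uncache is not None:  # skip quietly if the database has no cache to drop
        uncache()
    return gc.collect()
```

In the notebook this would be called as `release_store_cache(get_ipython(), 'df_temp')`. As with the empty-file approach, anything still held in the output history (such as a displayed `df_temp.head()`) is not touched by this.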