
IPython: use the %store magic to retrieve a variable under a dynamic name

I'm trying to write a function that retrieves a file under a specified name using the magic command %store. For example, if I have stored a file as "df" but later want to retrieve it under the name "frame" then I want to call the function using retrieve('df','frame') after which the variable frame would contain the dataframe that was earlier stored as df.

However, I'm not sure how to do this; the function below just prints

"no stored variable outputfile"

import IPython
import gc
import os
import numpy as np
import pandas as pd

path = os.path.join(IPython.paths.get_ipython_dir(),'profile_default','db','autorestore','')

#function to retrieve a stored file (inputfile) under a specified name (outputfile)

def retrieve(inputfile,outputfile='temp'):
    os.rename(r''+path+inputfile,r''+path+outputfile)
    %store -r outputfile
    os.rename(r''+path+outputfile,r''+path+inputfile)
    return


In [48]: retrieve('df','frame')
returns "no stored variable outputfile"

Background: why I'm doing this

The main reason for this is to release memory. I have some files I retrieve using %store and then do some manipulations or merge into another dataframe. After this I want to free the memory used, but running %xdel on a file retrieved using %store -r doesn't free the memory.

I've therefore written the function below, which retrieves the stored file under the variable name temp. I can then free the memory afterwards by retrieving an empty file as temp.

#function to retrieve a stored file (inputfile) under the variable name temp
def retrieve_temp(inputfile):
    os.rename(r''+path+inputfile,r''+path+'temp')
    %store -r temp
    os.rename(r''+path+'temp',r''+path+inputfile)
    return

So, for example, before retrieving anything the current RAM usage is

In [5]: ram_usage()
Out[5]: '107mb'
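
The ram_usage() helper is not shown in the post. A minimal sketch of what it might look like, assuming the third-party psutil package is installed and that the figure being reported is the resident memory of the current process (format_mb is a hypothetical helper name):

```python
import os

try:
    import psutil  # third-party package; assumed to be installed
except ImportError:
    psutil = None


def format_mb(n_bytes):
    """Format a byte count as whole megabytes, e.g. '107mb'."""
    return '{}mb'.format(n_bytes // (1024 * 1024))


def ram_usage():
    """Return the resident memory of the current process as a string."""
    if psutil is None:
        raise RuntimeError('psutil is required for this sketch')
    rss = psutil.Process(os.getpid()).memory_info().rss
    return format_mb(rss)
```

The numbers it reports depend on the operating system's accounting, so they are only useful for before/after comparisons like the ones in this post.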

I then retrieve a file and look at new ram usage

In [6]: (retrieve_temp('comps'),ram_usage())[1]
Out[6]: '2520mb'

After running %xdel the usage stays the same

In [12]: %xdel temp
In [13]: ram_usage()
Out[13]: '2520mb'

After retrieving an empty file under the name "temp" the ram is freed

In [14]: (retrieve_temp('b'),ram_usage())[1]
Out[14]: '114mb'

This solves most of my memory problems; however, sometimes I need to work on more than one frame at the same time.

I therefore want to have a more generic function where I can specify the name used for the temporary frame and easily free the memory later. This would also help to make my code more readable by using more descriptive names for the temporary dataframes.

I would like to know if there's a way to get my first function to work (it doesn't have to use the %store magic, but I don't want to pickle the files myself).

Alternatively, please let me know if there's another way to free the memory used by a variable that was retrieved using the %store magic command. I've tried %xdel, del, %reset, gc.collect() and launching sub-processes, none of which worked out too well. So far the only things that have worked are restarting the kernel or retrieving an empty file under the same name.

Many thanks,

Pureluck asked Nov 08 '22 06:11


1 Answer

After some more digging I found the function that calls the magic command and used that directly: get_ipython().run_line_magic('store', '-r '+outputfile)
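
The original call failed because a plain name written after a line magic is taken literally: %store -r outputfile asks for a stored variable that is actually called outputfile. Building the argument string in Python avoids that. A small sketch of the idea, where restore_args and restore are hypothetical helper names and shell is assumed to be the object returned by get_ipython():

```python
import keyword


def restore_args(name):
    """Build the argument string for the 'store' magic from a runtime name."""
    # %store -r binds the value to this name, so it has to be a valid identifier
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError('not a valid variable name: {!r}'.format(name))
    return '-r ' + name


def restore(shell, name):
    """Ask IPython to restore the stored variable called `name`."""
    # shell is assumed to be the object returned by get_ipython()
    shell.run_line_magic('store', restore_args(name))
```

Inside a notebook this would be called as restore(get_ipython(), 'frame'). The identifier check is an extra safeguard, not something the post itself does.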

The modified function is below. Note that if you use this you might want to make it more robust, for example by adding some lines that temporarily rename any file you've already stored under the name "outputfile".

import gc
import os
import IPython

#function to retrieve a stored file (inputfile) under a specified name (outputfile)
def retrieve(inputfile,outputfile='temp'):
    #folder where %store keeps its files (default profile assumed)
    path = os.path.join(IPython.paths.get_ipython_dir(),'profile_default','db','autorestore')
    src = os.path.join(path,inputfile)
    dst = os.path.join(path,outputfile)
    os.rename(src,dst) #temporarily expose the file under the new name
    try:
        get_ipython().run_line_magic('store', '-r '+outputfile)
    finally:
        os.rename(dst,src) #always restore the original file name
    gc.collect() #needed to free memory after retrieving an empty file
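
The robustness idea mentioned above (temporarily moving aside any file already stored under the target name) could be sketched as a context manager. This is only an illustration under assumptions: stored_as and the '.bak' suffix are hypothetical names, and directory is assumed to be the autorestore folder used by the function above.

```python
import os
from contextlib import contextmanager


@contextmanager
def stored_as(directory, inputfile, outputfile):
    """Temporarily expose the file `inputfile` in `directory` as `outputfile`."""
    src = os.path.join(directory, inputfile)
    dst = os.path.join(directory, outputfile)
    if src == dst:
        # Nothing to rename when both names are the same
        yield dst
        return
    backup = dst + '.bak'  # hypothetical parking name for a pre-existing file
    had_existing = os.path.exists(dst)
    if had_existing:
        os.rename(dst, backup)  # move the existing file out of the way
    try:
        os.rename(src, dst)
        try:
            yield dst
        finally:
            os.rename(dst, src)  # give the input file its own name back
    finally:
        if had_existing:
            os.rename(backup, dst)  # put the pre-existing file back
```

The %store -r call would then go inside the with block, so both files end up under their original names even if the restore raises.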

This appears to solve all my memory-leakage issues, as long as I don't print anything from the retrieved dataframe to a notebook cell before I delete it again.
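
A plausible explanation for the printing caveat, not confirmed in this post, is that IPython keeps displayed results in its output history (Out, _ and so on), so the object is still referenced after the variable itself is deleted. The plain-Python sketch below illustrates the general mechanism only; Frame and out_cache are stand-ins, not IPython internals.

```python
import gc
import weakref


class Frame:
    """Stand-in for a large dataframe."""


def demo():
    out_cache = {}              # plays the role of an output history
    frame = Frame()
    probe = weakref.ref(frame)  # lets us check whether the object is still alive
    out_cache[17] = frame       # like displaying the value in cell 17
    del frame                   # deleting the variable itself
    gc.collect()
    alive_while_cached = probe() is not None
    out_cache.clear()           # dropping the last remaining reference
    gc.collect()
    alive_after_clear = probe() is not None
    return alive_while_cached, alive_after_clear
```

As long as any container still holds the object, neither del nor gc.collect() can release it, which would match the behaviour seen in the session below.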

The short version is that once you're done with the variable referred to as df_temp, you run retrieve('emptyfile','df_temp'), and as long as you haven't printed any result to a cell your memory should hopefully now be cleared.

New RAM usage:

In [14]: ram_usage()
Out[14]: '101mb'
In [15]: retrieve('SFBkgs - Copy','df_temp')
In [16]: ram_usage()
Out[16]: '1281mb'
In [17]: df_temp.head(); #if I don't use ; to stop the printing of the output the below still fails to free the ram
In [18]: %xdel df_temp #this still doesn't free the ram
In [19]: ram_usage()
Out[19]: '1281mb'
In [20]: gc.collect()
Out[20]: 7
In [21]: ram_usage() #the garbage collector didn't help
Out[21]: '1281mb'
In [22]: retrieve('emptyfile','df_temp') #retrieves an empty file as df_temp
In [23]: ram_usage() #the memory has now been freed
Out[23]: '103mb'
Pureluck answered Nov 14 '22 23:11