
Cache file handle to netCDF files in python

Tags: python, netcdf

Is there a way to cache Python file handles? I have a function which takes a netCDF file path as input, opens the file, extracts some data from it and closes it. The function is called many times, and the overhead of opening the file on every call is high.

How can I make it faster, perhaps by caching the file handle? Is there a Python library that does this?

user308827 asked Sep 29 '16 19:09


1 Answer

Yes, you can use the following Python libraries:

  • dill (required)
  • python-memcached (optional)
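
dill is listed as required because the standard pickle module refuses to serialize an open file object. The sketch below illustrates that with the standard library only; the helper name stdlib_pickle_accepts and the temporary file are hypothetical and used purely for illustration.

```python
import os
import pickle
import tempfile


def stdlib_pickle_accepts(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        # Python 3 raises TypeError for open file objects
        return False


# Throwaway text file standing in for the real data file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")

with open(tmp.name, "r") as fh:
    file_ok = stdlib_pickle_accepts(fh)       # an open file object
path_ok = stdlib_pickle_accepts(tmp.name)     # a plain path string

os.remove(tmp.name)
print(file_ok, path_ok)
```

A path string pickles without trouble, while the open file object does not, which is the gap dill fills in the example that follows.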

Let's follow the example. You have two files:

# save.py - puts the serialized (pickled) file handle object into memcached
import dill
import memcache


mc = memcache.Client(['127.0.0.1:11211'], debug=0)
file_handler = open('data.txt', 'r')
mc.set("file_handler", dill.dumps(file_handler))
print('saved!')

and

# read_from_file.py - gets the serialized file handle object from memcached,
#                     then deserializes it and reads lines from it
import dill
import memcache


mc = memcache.Client(['127.0.0.1:11211'], debug=0)
file_handler = dill.loads(mc.get("file_handler"))
print(file_handler.readlines())

Now if you run:

python save.py
python read_from_file.py

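the second script prints the lines of data.txt.

Why does it work?

Because the file was never closed (no file_handler.close()), the file object is still open when dill serializes it. As far as I can tell, dill records the file's name, mode and current position rather than the operating-system handle itself, so dill.loads in a different process reopens data.txt from that description. This lets separate processes share the description of the handle, but each process still opens the file once.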

Solution

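import dill
import memcache


mc = memcache.Client(['127.0.0.1:11211'], debug=0)
serialized = mc.get("file_handler")  # None when the key is not cached yet
if serialized:
    # Cache hit: rebuild the file object from its pickled form
    file_handler = dill.loads(serialized)
else:
    # Cache miss: open the file and store its pickled form for later calls
    file_handler = open('data.txt', 'r')
    mc.set("file_handler", dill.dumps(file_handler))
print(file_handler.readlines())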
turkus answered Sep 19 '22 21:09