
genfromtxt: how to disable caching

Tags: python, numpy

I have confirmed that the genfromtxt function (and the functions derived from it) silently caches the remote file it is processing in a local directory, and uses the local copy in subsequent invocations without checking whether the remote file has changed.
Looking into the source file npyio.py, it seems this happens because the DataSource object that handles the request is created without passing the relevant parameter. It is of course easy to modify the library sources to disable caching, but then I would have to repeat the change after every upgrade.
Is there any other solution (apart from deleting the cache directory every time)?

NameOfTheRose asked Sep 28 '22 00:09


1 Answer

I think this question is actually composed of two parts:

  1. What to do if a library's functionality does not exactly match the required behavior?

  2. How to handle genfromtxt's caching behavior in particular?

Regarding 1, wrapping the function (possibly combined with injection) is more resilient than patching the library, since a wrapper survives library upgrades. The exception is a patch that is accepted upstream, in the library's own repository.

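So wrapping genfromtxt could be done like this:

import numpy

# Keep a reference to the original so the wrapper still works after injection
_original_genfromtxt = numpy.genfromtxt

def patched_gen_from_text(*args, **kwargs):
    # Do something regarding caching
    return _original_genfromtxt(*args, **kwargs)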

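You could even inject this as numpy.genfromtxt without modifying the sources (not that I would recommend it). Note that the wrapper must then call a saved reference to the original function, otherwise it would end up calling itself recursively: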

import numpy 

numpy.genfromtxt = patched_gen_from_text
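As one possible way to fill in the caching step of the wrapper above, here is a minimal sketch that sidesteps the cache instead of managing it: it downloads the data itself and hands genfromtxt a file-like object rather than a URL string. The name genfromtxt_uncached and its encoding parameter are hypothetical, and the sketch assumes the file is small enough to hold in memory and is UTF-8 text. As far as I can tell, genfromtxt only goes through DataSource when it is given a string or path, so nothing should be written to the local cache directory here.

```python
import io
import urllib.request

import numpy


def genfromtxt_uncached(url, *args, encoding="utf-8", **kwargs):
    """Fetch `url` afresh on every call and parse it with numpy.genfromtxt.

    Hypothetical helper, not part of NumPy. The whole file is read into
    memory, so this is only suitable for reasonably small files.
    """
    # Download the current contents ourselves instead of letting NumPy do it
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(encoding)
    # genfromtxt accepts file-like objects, so pass an in-memory text buffer
    return numpy.genfromtxt(io.StringIO(text), *args, **kwargs)
```

It is called like the original, e.g. genfromtxt_uncached(url, delimiter=","), at the cost of one full download per call.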

Regarding 2, it really depends on the access you have to the remote filesystem (e.g., can you run a process there? can you mount it?), and on the required tradeoff between speed and certainty.

E.g., at one extreme, your patched version could unconditionally erase the local file before each call (certain, but slow, since the file is downloaded every time). Alternatively, you might be able to request the remote file's modification time and length, and check whether they match those of the local file. At the other extreme, you might be able to run an MD5 check on the remote machine via RPC and compare the result with the checksum of the local copy.
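The middle and far options could be sketched roughly as follows. Both helper names are hypothetical, and how the remote modification time, size, or checksum is obtained is deliberately left out, since that depends on the access you have (HTTP headers, a mounted filesystem, an RPC call, and so on). The sketch only covers the local side of the comparison.

```python
import hashlib
import os


def is_cache_stale(local_path, remote_mtime, remote_size):
    """Return True if the local copy is missing, differs in size from the
    remote file, or is older than it. `remote_mtime` is a POSIX timestamp
    and `remote_size` a byte count, both supplied by the caller.
    """
    try:
        st = os.stat(local_path)
    except FileNotFoundError:
        return True
    return st.st_size != remote_size or st.st_mtime < remote_mtime


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A wrapper could delete the local copy only when is_cache_stale returns True, or when md5_of_file disagrees with a checksum computed on the remote side.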

You might want to look at the standard library's filecmp module, both for the comparison options it offers and as a possible building block for some of these cases.
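For instance, if the remote file is reachable as an ordinary path (say, through a mounted filesystem, which is an assumption here), filecmp.cmp can do the comparison directly. The helper name local_copy_matches and its thorough flag are hypothetical; this is only a sketch of how the module might be used.

```python
import filecmp


def local_copy_matches(local_path, reference_path, thorough=True):
    """Return True if the two files are considered equal.

    With thorough=True the file contents are compared. With thorough=False
    filecmp first trusts the os.stat() signature (file type, size and
    modification time), which is faster but less certain.
    """
    return filecmp.cmp(local_path, reference_path, shallow=not thorough)
```

The thorough flag maps onto the speed versus certainty tradeoff mentioned above.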

Ami Tavory answered Oct 29 '22 01:10