Is there an implementation for python pandas that caches the data on disk, so I can avoid reproducing it every time?
In particular, is there a caching method for get_yahoo_data for financial data?
A big plus would be:
There are many ways to achieve this, but probably the easiest is to use the built-in methods for writing and reading Python pickles. You can use pandas.DataFrame.to_pickle to store the DataFrame to disk and pandas.read_pickle to read the stored DataFrame back from disk.

An example for a pandas.DataFrame:
import pandas

# Store your DataFrame
df.to_pickle('cached_dataframe.pkl') # will be stored in current directory
# Read your DataFrame
df = pandas.read_pickle('cached_dataframe.pkl') # read from current directory
The same methods also work for pandas.Series:
# Store your Series
series.to_pickle('cached_series.pkl') # will be stored in current directory
# Read your Series
series = pandas.read_pickle('cached_series.pkl') # read from current directory
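To tie this back to the original question, you can wrap the download step in a small cache-or-fetch helper built on the same two methods. Here is a minimal sketch; fetch_yahoo_data is a hypothetical placeholder for whatever call you actually use to download the financial data (e.g. your get_yahoo_data), and the cache directory name is just an example:

import os
import pandas

def get_yahoo_data_cached(symbol, cache_dir='cache'):
    """Return cached data for symbol if present, otherwise download and cache it."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, f'{symbol}.pkl')

    if os.path.exists(cache_file):
        # Cache hit: read the previously stored DataFrame from disk
        return pandas.read_pickle(cache_file)

    # Cache miss: download the data, then store it for next time
    df = fetch_yahoo_data(symbol)  # hypothetical download function
    df.to_pickle(cache_file)
    return df

The first call for a given symbol hits the network; later calls just read the pickle from disk. Delete the .pkl file whenever you want to force a fresh download.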
Alternatively, you could use the data_cache package, which provides a decorator for caching a function's returned DataFrame:
from data_cache import pandas_cache

@pandas_cache
def foo():
    # Expensive computation or download that returns a DataFrame;
    # the result is cached on disk by the decorator.
    ...
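With this approach the first call to foo() runs the function and writes its returned DataFrame to a local cache; subsequent calls with the same arguments should read the cached result instead of recomputing it, so it fits well around a data-download function like the one sketched above.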