Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas persistent cache

Is there an implementation for python pandas that cache the data on disk so I can avoid to reproduce it every time?

In particular is there a caching method for get_yahoo_data for financial?

A very plus would be:

  • very few lines of code to write
  • possibility to integrate the persisted series when new data is downloaded for the same source
like image 910
Luca C. Avatar asked Jul 08 '18 19:07

Luca C.


2 Answers

There are many ways to achieve this, however probably the easiest way is to use the build in methods for writing and reading Python pickles. You can use pandas.DataFrame.to_pickle to store the DataFrame to disk and pandas.read_pickle to read the stored DataFrame from disk.

An example for a pandas.DataFrame:

# Store your DataFrame
df.to_pickle('cached_dataframe.pkl') # will be stored in current directory

# Read your DataFrame
df = pandas.read_pickle('cached_dataframe.pkl') # read from current directory

The same methods also work for pandas.Series:

# Store your Series
series.to_pickle('cached_series.pkl') # will be stored in current directory

# Read your DataFrame
series = pandas.read_pickle('cached_series.pkl') # read from current directory
like image 67
nijm Avatar answered Oct 11 '22 12:10

nijm


You could use the Data cache package.

from data_cache import pandas_cache

@pandas_cache
def foo():
    ...
like image 21
Eirik Lid Avatar answered Oct 11 '22 13:10

Eirik Lid