Is there an implementation for python pandas that caches the data on disk, so I can avoid reproducing it every time?
In particular, is there a caching method for get_yahoo_data for financial data?
A big plus would be:
There are many ways to achieve this, but probably the easiest is to use the built-in methods for writing and reading Python pickles. You can use pandas.DataFrame.to_pickle to store the DataFrame to disk and pandas.read_pickle to read the stored DataFrame back from disk.

An example for a pandas.DataFrame:
import pandas

# Store your DataFrame
df.to_pickle('cached_dataframe.pkl') # will be stored in current directory
# Read your DataFrame
df = pandas.read_pickle('cached_dataframe.pkl') # read from current directory
The same methods also work for pandas.Series:
# Store your Series
series.to_pickle('cached_series.pkl') # will be stored in current directory
# Read your Series
series = pandas.read_pickle('cached_series.pkl') # read from current directory
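To tie this back to the original question, you can wrap the download step in a small cache-or-fetch helper built on the same two methods. Here is a minimal sketch; fetch_yahoo_data is a hypothetical placeholder for whatever call you actually use to download the financial data (e.g. your get_yahoo_data), and the cache directory name is just an example:

import os
import pandas

def get_yahoo_data_cached(symbol, cache_dir='cache'):
    """Return cached data for symbol if present, otherwise download and cache it."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, f'{symbol}.pkl')

    if os.path.exists(cache_file):
        # Cache hit: read the previously stored DataFrame from disk
        return pandas.read_pickle(cache_file)

    # Cache miss: download the data, then store it for next time
    df = fetch_yahoo_data(symbol)  # hypothetical download function
    df.to_pickle(cache_file)
    return df

The first call for a given symbol hits the network; later calls just read the pickle from disk. Delete the .pkl file whenever you want to force a fresh download.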
Alternatively, you could use the data_cache package, which provides a decorator for caching a function's returned DataFrame:
from data_cache import pandas_cache

@pandas_cache
def foo():
    # Expensive computation or download that returns a DataFrame;
    # the result is cached on disk by the decorator.
    ...
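With this approach the first call to foo() runs the function and writes its returned DataFrame to a local cache; subsequent calls with the same arguments should read the cached result instead of recomputing it, so it fits well around a data-download function like the one sketched above.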