
Pandas memoization

I have lengthy computations that I repeat many times, so I would like to use memoization (with packages such as jug and joblib) together with Pandas. My question is whether these packages memoize functions well when Pandas DataFrames are passed as arguments.

Has anyone tried it? Is there any other recommended package/way to do this?

Yariv asked Mar 13 '13 13:03



1 Answer

Author of jug here: jug works fine. I just tried the following and it works:

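from jug import TaskGenerator
import pandas as pd
import numpy as np


# Each decorated function becomes a jug task whose result is cached.
@TaskGenerator
def gendata():
    # Build a 10 x 34344 DataFrame of consecutive integers.
    return pd.DataFrame(np.arange(343440).reshape((10,-1)))

@TaskGenerator
def compute(x):
    # x is the DataFrame produced by gendata(); returns the column means.
    return x.mean()

# This builds the task graph; run the script with `jug execute` to compute it.
y = compute(gendata())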

It is not as efficient as it could be, because jug just uses pickle internally for the DataFrame. It does compress the pickle on the fly, so it is not horrible in terms of memory use; it is just slower than it could be.
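To make the pickle-plus-compression idea concrete, here is a minimal standard-library sketch of a disk-backed memoizer. It is not jug's actual implementation: `disk_memoize`, `slow_mean`, and the `.pkl.z` file suffix are hypothetical names chosen for illustration. The sketch assumes the arguments and the result are picklable (DataFrames are) and that equal arguments pickle to the same bytes, which holds for simple values like the list below but is not guaranteed for every object.

```python
import hashlib
import os
import pickle
import tempfile
import zlib
from functools import wraps


def disk_memoize(cache_dir):
    """Hypothetical helper: cache results on disk as zlib-compressed pickles."""
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Key the cache on the function name and the pickled arguments.
            payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha1(payload).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl.z")

            # Cache hit: decompress and unpickle the stored result.
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.loads(zlib.decompress(fh.read()))

            # Cache miss: compute, then store the compressed pickle.
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                fh.write(zlib.compress(pickle.dumps(result)))
            return result

        return wrapper

    return decorator


calls = []  # records each real execution of slow_mean


@disk_memoize(tempfile.mkdtemp())
def slow_mean(values):
    calls.append(1)
    return sum(values) / len(values)


first = slow_mean([1, 2, 3, 4])   # computed and written to disk
second = slow_mean([1, 2, 3, 4])  # read back from the compressed pickle
```

The generic pickle round trip is where the extra cost comes from: every cache hit has to decompress and unpickle the whole object, which is why a dedicated storage format for a specific type can be faster.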

I would be open to a change that saves DataFrames as a special case, as jug currently does for numpy arrays: https://github.com/luispedro/jug/blob/master/jug/backends/file_store.py#L102

luispedro answered Oct 17 '22 04:10