I have lengthy computations which I repeat many times, so I would like to use memoization (packages such as jug and joblib) in concert with pandas. The question is whether such a package can properly memoize pandas DataFrames passed as function arguments.
Has anyone tried it? Is there any other recommended package/way to do this?
Definition of memoization: memoization is a software optimization technique used to speed up programs. It lets you optimize a Python function by caching its output based on the supplied input parameters, so that the function runs only once for a given input.
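As a minimal sketch of how this can look with joblib (the question mentions joblib, but this snippet is illustrative; the cache directory name and the expensive_summary function are assumptions), joblib's Memory can cache a function that takes a DataFrame argument by hashing it and persisting the result on disk:

from joblib import Memory
import numpy as np
import pandas as pd

# Results are persisted on disk; "./joblib_cache" is an arbitrary directory name.
memory = Memory(location="./joblib_cache", verbose=0)

@memory.cache
def expensive_summary(df):
    # Stand-in for a lengthy computation on the DataFrame.
    return df.describe()

df = pd.DataFrame(np.random.rand(1000, 4), columns=list("abcd"))
first = expensive_summary(df)   # computed and written to the cache
second = expensive_summary(df)  # loaded from the cache, not recomputed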
pandas provides data structures for in-memory analytics, which makes using pandas to analyze datasets that are larger than memory somewhat tricky. Even datasets that are a sizable fraction of memory become unwieldy, as some pandas operations need to make intermediate copies.
Cython (writing C extensions for pandas): for many use cases, writing pandas code in pure Python and NumPy is sufficient. In some computationally heavy applications, however, sizable speed-ups can be achieved by offloading work to Cython, as in the sketch below.
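A rough, hedged sketch of that approach (not from the original answer): in a Jupyter notebook with Cython installed, load the extension with %load_ext Cython and put the following in a cell that starts with the %%cython magic; the function name csum and the use of a float64 column are illustrative assumptions.

%%cython
def csum(double[:] values):
    # Typed memoryview loop compiled to C, avoiding per-element Python overhead.
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(values.shape[0]):
        total += values[i]
    return total

It can then be applied to a column, e.g. csum(df["a"].to_numpy()) for a float64 column "a".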
Author of jug here: jug works fine. I just tried the following and it works:
from jug import TaskGenerator
import pandas as pd
import numpy as np
@TaskGenerator
def gendata():
    return pd.DataFrame(np.arange(343440).reshape((10,-1)))

@TaskGenerator
def compute(x):
    return x.mean()

y = compute(gendata())
It is not as efficient as it could be, as it just uses pickle internally for the DataFrame (although it compresses it on the fly, so it is not horrible in terms of memory use, just slower than it could be).
I would be open to a change that saves these as a special case, as jug currently does for numpy arrays: https://github.com/luispedro/jug/blob/master/jug/backends/file_store.py#L102
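A hedged illustration (not jug's backend code) of the kind of special-casing suggested here: storing a DataFrame in a columnar on-disk format instead of pickling it. Assumes pyarrow is installed; the file names are arbitrary.

import os
import pickle

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100_000, 10), columns=[f"c{i}" for i in range(10)])

# Generic path: pickle the whole object, as described above.
with open("df.pkl", "wb") as fh:
    pickle.dump(df, fh)

# Special-cased path: a columnar format (requires pyarrow).
df.to_feather("df.feather")

print(os.path.getsize("df.pkl"), os.path.getsize("df.feather"))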