How to speed up resample procedure in Pandas?

Say you have a DataFrame holding a 1-minute time series, with a datetime index, 4 columns, and 4 million rows. When you try to do something like:

 conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
 df1 = df.resample('5Min', how=conversion)

It takes an absurd amount of time (20-30 minutes). How can I speed up this process?

pandas 0.18, Python 2.7

asked May 01 '16 19:05

hernanavella

1 Answer

Resampling seems to be quite fast on a dataset of shape (4000000, 4):

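import numpy as np
import pandas as pd
conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
# Regular 1-minute index ('min' is the minute alias; 'T' is deprecated in recent pandas)
idx = pd.date_range('1/1/2010', periods=4000000, freq='min')
df = pd.DataFrame(np.random.rand(4000000, 4), columns = ["Open", "High", "Low", "Close"], index = idx)
%timeit df.resample("5Min").agg(conversion)
1 loop, best of 3: 253 ms per loop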
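
To make the aggregation itself concrete, here is a minimal sketch on ten rows of made-up data (the values and the names `small` and `bars` are illustrative, not from the question). It shows what the `conversion` mapping does to each 5-minute bin:

```python
import numpy as np
import pandas as pd

# Ten 1-minute rows with easy-to-check values (illustrative data only).
idx = pd.date_range("2010-01-01", periods=10, freq="min")
values = np.arange(10, dtype=float)
small = pd.DataFrame(
    {"Open": values, "High": values + 1, "Low": values - 1, "Close": values + 0.5},
    index=idx,
)

conversion = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}

# Each 5-minute bin keeps the first Open, the highest High,
# the lowest Low and the last Close of the rows that fall into it.
bars = small.resample("5min").agg(conversion)
print(bars)
```

The ten input rows collapse into two 5-minute bars, one starting at 00:00 and one at 00:05.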

With an irregular index and some NaNs:

# Four 10-million-minute ranges, 40 years apart, to sample timestamps from
idx1 = pd.date_range('1/1/1900', periods=10000000, freq='Min')
idx2 = pd.date_range('1/1/1940', periods=10000000, freq='Min')
idx3 = pd.date_range('1/1/1980', periods=10000000, freq='Min')
idx4 = pd.date_range('1/1/2020', periods=10000000, freq='Min')
# Draw 1 million timestamps from each range (duplicates are possible), then shuffle them
idx = np.array([np.random.choice(idx1, 1000000), np.random.choice(idx2, 1000000), np.random.choice(idx3, 1000000),
                np.random.choice(idx4, 1000000)]).flatten()
np.random.shuffle(idx)
df = pd.DataFrame(np.random.randint(100, size=(4000000, 4)), columns = ["Open", "High", "Low", "Close"], index = idx)
# Set values at randomly chosen timestamps to NaN in each column
df.loc[np.random.choice(idx, 100000), "Open"] = np.nan
df.loc[np.random.choice(idx, 50000), "High"] = np.nan
df.loc[np.random.choice(idx, 500000), "Low"] = np.nan
df.loc[np.random.choice(idx, 20000), "Close"] = np.nan
%timeit df.resample("5Min").agg(conversion)
1 loop, best of 3: 5.06 s per loop

So it seems that something other than resample itself is taking the time in your case.
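
The question does not show what the real data looks like, so the cause can only be guessed at. As a rough sketch, the hypothetical helper below (`prepare_for_resample` is not a pandas function) checks three things that are often worth ruling out before resampling: an index that is not a `DatetimeIndex`, columns that were read in as strings rather than numbers, and an unsorted index. Whether any of these applies here is an assumption.

```python
import pandas as pd

def prepare_for_resample(frame):
    """Hypothetical helper: coerce a frame into a form resample handles well."""
    out = frame.copy()
    # resample needs a datetime-like index; convert a string index if necessary.
    if not isinstance(out.index, pd.DatetimeIndex):
        out.index = pd.to_datetime(out.index)
    # Non-numeric columns (for example prices read as strings) typically
    # aggregate much more slowly than float columns, so convert them.
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], errors="coerce")
    # A sorted index keeps the time bins contiguous.
    return out.sort_index()

# Illustrative input: string prices, string timestamps, out of order.
raw = pd.DataFrame(
    {"Open": ["1.0", "2.0", "3.0"], "Close": ["1.5", "2.5", "3.5"]},
    index=["2010-01-01 00:02", "2010-01-01 00:00", "2010-01-01 00:01"],
)
clean = prepare_for_resample(raw)
print(clean.dtypes)
print(clean.index)
```

Printing `df.dtypes` and `type(df.index)` on the real data before resampling is a quick way to see whether any of this applies.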

answered Oct 27 '22 05:10

ayhan

Tags: python, pandas, dataframe, python-2.7, resampling
