I am using Python's deque() to implement a simple circular buffer:
from collections import deque
import numpy as np

test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)
mybuffer = deque(np.zeros(20).reshape((10, 2)))

for i in test_sequence:
    mybuffer.popleft()
    mybuffer.append(i)
    do_something_on(mybuffer)  # placeholder for the actual processing
I was wondering if there's a simple way of obtaining the same thing in pandas using a Series (or DataFrame). In other words, how can I efficiently add a single row at the end and remove a single row at the beginning of a Series or DataFrame?
Edit: I tried this:
import pandas as pd

myPandasBuffer = pd.DataFrame(columns=('A', 'B'), data=np.zeros(20).reshape((10, 2)))
newpoint = pd.DataFrame(columns=('A', 'B'), data=np.array([[1, 1]]))

for i in test_sequence:
    newpoint[['A', 'B']] = i
    myPandasBuffer = pd.concat([myPandasBuffer.iloc[1:], newpoint], ignore_index=True)
    do_something_on(myPandasBuffer)
But it's painfully slower than the deque() method.
For indexed access a list is O(1) everywhere, so for random access a plain list is the better choice; that is not what deques were designed for. Deques, which are implemented as a doubly-linked list of fixed-size blocks, instead offer O(1) appends and pops at both the right and the left end.
Python's deque was the first data type added to the collections module, back in Python 2.4.
The deque class from the collections module has no peek method, but the same result can be obtained with square brackets: the first element is available as [0] and the last as [-1].
The built-in queue module provides FIFO (first in, first out) queues intended mainly for communication between threads. A deque is double-ended, so depending on which end you append to and pop from, it can act as either a FIFO queue or a LIFO stack; for queue-like behaviour within a single thread, a deque is usually a better fit than a list.
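For illustration, here is a minimal sketch (the variable name buf is purely illustrative) of the index-based peek and of a fixed-length deque acting as a FIFO buffer:

from collections import deque

# A deque created with maxlen behaves as a fixed-size FIFO buffer:
# appending to a full deque silently discards the element at the opposite end.
buf = deque([0, 1, 2, 3, 4], maxlen=5)
buf.append(5)      # 0 is dropped automatically, no explicit popleft() needed

# There is no peek() method, but indexing gives the same result.
print(buf[0])      # oldest element -> 1
print(buf[-1])     # newest element -> 5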
As noted by dorvak, pandas is not designed for queue-like behaviour.
Below I've replicated the simple insert function from deque in pandas dataframes, numpy arrays, and also in hdf5 using the h5py module.
The timeit function reveals (unsurprisingly) that the collections module is much faster, followed by numpy and then pandas.
from collections import deque
import pandas as pd
import numpy as np
import h5py
def insert_deque(test_sequence, buffer_deque):
    # Drop the oldest item and append the new one.
    for item in test_sequence:
        buffer_deque.popleft()
        buffer_deque.append(item)
    return buffer_deque

def insert_df(test_sequence, buffer_df):
    # Shift all rows up by one, then overwrite the last row.
    for item in test_sequence:
        buffer_df.iloc[0:-1, :] = buffer_df.iloc[1:, :].values
        buffer_df.iloc[-1] = item
    return buffer_df

def insert_arraylike(test_sequence, buffer_arr):
    # Same shift-and-overwrite, for numpy arrays and h5py datasets.
    for item in test_sequence:
        buffer_arr[:-1] = buffer_arr[1:]
        buffer_arr[-1] = item
    return buffer_arr
test_sequence = np.array(list(range(100))*2).reshape(100,2)
# create buffer arrays
nested_list = [[0]*2]*5
buffer_deque = deque(nested_list)
buffer_df = pd.DataFrame(nested_list, columns=('A','B'))
buffer_arr = np.array(nested_list)
# calculate speed of each process in ipython
print("deque : ")
%timeit insert_deque(test_sequence, buffer_deque)
print("pandas : ")
%timeit insert_df(test_sequence, buffer_df)
print("numpy array : ")
%timeit insert_arraylike(test_sequence, buffer_arr)
print("hdf5 with h5py : ")
with h5py.File("h5py_test.h5", "w") as f:
    f["buffer_hdf5"] = np.array(nested_list)
    %timeit insert_arraylike(test_sequence, f["buffer_hdf5"])
The %timeit results:
deque : 34.1 µs per loop
pandas : 48 ms per loop
numpy array : 187 µs per loop
hdf5 with h5py : 31.7 ms per loop
Notes:
My pandas slicing method was only slightly faster than the concat method listed in the question.
The hdf5 format (via h5py) did not show any advantages. I also don't see any advantages of HDFStore, as suggested by Andy.
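As a side note (this is only a sketch and was not part of the timings above): if pandas is only needed for whatever do_something_on does with the buffer, one option is to keep the buffer itself as a fixed-length deque and build a DataFrame from it only when a pandas view is actually required. Constructing the frame is not free, so this pays off only when the conversion happens far less often than the inserts.

from collections import deque
import numpy as np
import pandas as pd

test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)

# maxlen makes the deque a fixed-size FIFO: append() discards the oldest row.
buffer_deque = deque(np.zeros(20).reshape((10, 2)), maxlen=10)

for row in test_sequence:
    buffer_deque.append(row)

# Convert to a DataFrame only when a pandas view of the buffer is needed.
frame = pd.DataFrame(list(buffer_deque), columns=('A', 'B'))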