deque in python pandas

I am using Python's deque() to implement a simple circular buffer:

from collections import deque
import numpy as np

test_sequence = np.array(list(range(100))*2).reshape(100,2)
mybuffer = deque(np.zeros(20).reshape((10, 2)))

for i in test_sequence:
    mybuffer.popleft()
    mybuffer.append(i)

    do_something_on(mybuffer)

I was wondering if there's a simple way of obtaining the same thing in Pandas using a Series (or DataFrame). In other words, how can I efficiently add a single row at the end and remove a single row at the beginning of a Series or DataFrame?

Edit: I tried this:

import pandas as pd

myPandasBuffer = pd.DataFrame(columns=('A','B'), data=np.zeros(20).reshape((10, 2)))
newpoint = pd.DataFrame(columns=('A','B'), data=np.array([[1,1]]))

for i in test_sequence:
    newpoint[['A','B']] = i
    # .ix was removed in pandas 1.0; .iloc[1:] drops the first (oldest) row
    myPandasBuffer = pd.concat([myPandasBuffer.iloc[1:], newpoint], ignore_index=True)

    do_something_on(myPandasBuffer)

But it's painfully slower than the deque() method.

Asked by Fra, Nov 20 '13 at 08:11




1 Answer

As noted by dorvak, pandas is not designed for queue-like behaviour.

Below I've replicated the simple deque insert (drop the oldest item, append the newest) with a pandas DataFrame, a numpy array, and an hdf5 dataset using the h5py module.

IPython's %timeit reveals (unsurprisingly) that the deque from the collections module is much faster, followed by numpy; h5py and pandas are slower by orders of magnitude.

from collections import deque
import pandas as pd
import numpy as np
import h5py

def insert_deque(test_sequence, buffer_deque):
    # drop the oldest item on the left, append the newest on the right
    for item in test_sequence:
        buffer_deque.popleft()
        buffer_deque.append(item)
    return buffer_deque

def insert_df(test_sequence, buffer_df):
    # shift every row up by one, then overwrite the last row
    for item in test_sequence:
        buffer_df.iloc[0:-1,:] = buffer_df.iloc[1:,:].values
        buffer_df.iloc[-1] = item
    return buffer_df

def insert_arraylike(test_sequence, buffer_arr):
    # same shift-and-overwrite, for anything that supports numpy-style slicing
    for item in test_sequence:
        buffer_arr[:-1] = buffer_arr[1:]
        buffer_arr[-1] = item
    return buffer_arr

test_sequence = np.array(list(range(100))*2).reshape(100,2)

# create buffer arrays
nested_list = [[0]*2]*5
buffer_deque = deque(nested_list)
buffer_df = pd.DataFrame(nested_list, columns=('A','B'))
buffer_arr = np.array(nested_list)

# calculate speed of each process in ipython
print("deque : ")
%timeit insert_deque(test_sequence, buffer_deque)
print("pandas : ")
%timeit insert_df(test_sequence, buffer_df)
print("numpy array : ")
%timeit insert_arraylike(test_sequence, buffer_arr)
print("hdf5 with h5py : ")
with h5py.File("h5py_test.h5", "w") as f:
    f["buffer_hdf5"] = np.array(nested_list)
    %timeit insert_arraylike(test_sequence, f["buffer_hdf5"])
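
The %timeit lines above only run inside IPython. In a plain Python script, a roughly equivalent measurement could be sketched with the standard timeit module. This is only an illustration for the deque case; n_runs is an assumed repeat count, not something taken from the measurements in this answer.

```python
import timeit
from collections import deque

import numpy as np

def insert_deque(test_sequence, buffer_deque):
    # drop the oldest item, append the newest
    for item in test_sequence:
        buffer_deque.popleft()
        buffer_deque.append(item)
    return buffer_deque

test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)
buffer_deque = deque([[0, 0] for _ in range(5)])

n_runs = 1000  # assumed repeat count, chosen only for illustration
total_s = timeit.timeit(
    lambda: insert_deque(test_sequence, buffer_deque), number=n_runs
)
print("deque : %.1f us per loop" % (total_s / n_runs * 1e6))
```

Absolute numbers will differ by machine and library version, so only the relative ordering is worth comparing.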

The %timeit results:

deque : 34.1 µs per loop

pandas : 48 ms per loop

numpy array : 187 µs per loop

hdf5 with h5py : 31.7 ms per loop
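
A side note on the numpy number: insert_arraylike copies every row on each insert. A different technique (not the one timed above) is a ring buffer that overwrites the oldest row in place and only reorders the rows when they are read. The sketch below is a minimal, untimed illustration; RingBuffer, append and ordered are hypothetical names, not part of numpy.

```python
import numpy as np

class RingBuffer:
    """Hypothetical helper: fixed-size row buffer that overwrites its oldest row."""

    def __init__(self, n_rows, n_cols):
        self.data = np.zeros((n_rows, n_cols))
        self.oldest = 0  # index of the oldest row, which is overwritten next

    def append(self, row):
        # overwrite the oldest row instead of shifting all rows
        self.data[self.oldest] = row
        self.oldest = (self.oldest + 1) % len(self.data)

    def ordered(self):
        # return a new array with rows sorted from oldest to newest
        return np.roll(self.data, -self.oldest, axis=0)

test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)

ring = RingBuffer(5, 2)
for row in test_sequence:
    ring.append(row)

window = ring.ordered()  # the 5 most recent rows, oldest first
```

Whether this is actually faster depends on how often ordered() is called, since np.roll copies the whole buffer.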

Notes:

My pandas slicing method was only slightly faster than the concat method listed in the question.

The hdf5 format (via h5py) did not show any advantages. I also don't see any advantages of HDFStore, as suggested by Andy.
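
Given these timings, one possible workaround is to keep the rolling window in a deque and build a DataFrame only when pandas functionality is actually needed. The sketch below assumes that do_something_on does not need a DataFrame on every iteration; the names window and snapshot are made up for illustration. Passing maxlen to deque makes it discard the oldest row automatically, so the explicit popleft() is no longer required.

```python
from collections import deque

import numpy as np
import pandas as pd

test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)

# maxlen=10: appending to a full deque silently drops the oldest row
window = deque(np.zeros((10, 2)), maxlen=10)

for row in test_sequence:
    window.append(row)

# convert to a DataFrame only at the point where pandas is needed
snapshot = pd.DataFrame(np.vstack(list(window)), columns=["A", "B"])
```

Building the DataFrame still costs a copy, so this only pays off when the conversion happens much less often than the appends.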

Answered by Mark Teese, Oct 13 '22 at 21:10