Efficient concatenation of lists in a pandas Series

I have the following series:

>>> import pandas as pd
>>> s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])
>>> s
0    [a, b]
1    [c, d]
2    [f, g]
dtype: object

What is the easiest (preferably vectorized) way to concatenate all the lists in the series, so that I get:

l = ['a', 'b', 'c', 'd', 'f', 'g']

Thanks!

Alejandro Simkievich asked Nov 05 '15 22:11



2 Answers

A nested list comprehension should be much faster than s.sum(), which concatenates the lists by repeated addition.

>>> [element for list_ in s for element in list_]
['a', 'b', 'c', 'd', 'f', 'g']

>>> %timeit -n 100000 [element for list_ in s for element in list_]
100000 loops, best of 3: 5.2 µs per loop

>>> %timeit -n 100000 s.sum()
100000 loops, best of 3: 50.7 µs per loop

Iterating directly over the underlying array (s.values) is even faster.

>>> %timeit -n 100000 [element for list_ in s.values for element in list_]
100000 loops, best of 3: 2.77 µs per loop
Alexander answered Oct 15 '22 18:10


I haven't timed or tested these options, but there is also the newer pandas method Series.explode (added in pandas 0.25) and numpy.concatenate.
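A minimal sketch of how those two options might be used on the series from the question. It is untimed, and the variable names via_explode and via_numpy are just for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

# Series.explode gives every list element its own row (repeating the
# original index label); tolist() turns the result into a plain list.
via_explode = s.explode().tolist()

# numpy.concatenate joins the inner lists into a single 1-D array,
# which tolist() converts back to a Python list.
via_numpy = np.concatenate(s.tolist()).tolist()

print(via_explode)
print(via_numpy)
```

One difference to keep in mind: explode turns an empty list into a NaN row, so the two results can differ if the series contains empty lists.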

Alex Hall answered Oct 15 '22 17:10