I have the following series:
s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])
>>> s
0 [a, b]
1 [c, d]
2 [f, g]
dtype: object
What is the easiest (preferably vectorized) way to concatenate all the lists in the series, so that I get:
l = ['a', 'b', 'c', 'd', 'f', 'g']
Thanks!
In this benchmark, concatenating multiple DataFrames with pandas.concat is 50 times faster than repeatedly calling DataFrame.append. With repeated appends, a new DataFrame is created at each iteration, and the underlying data is copied each time. (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.)
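As a rough illustration of the pattern described above, the sketch below collects the pieces in a list and concatenates them once. The `chunks` list and its column `x` are made-up stand-ins for whatever a real loop would produce.

```python
import pandas as pd

# Hypothetical chunks standing in for DataFrames produced inside a loop
chunks = [pd.DataFrame({"x": [i, i + 1]}) for i in range(3)]

# A single concat call builds the result in one pass;
# ignore_index=True renumbers the rows from 0
combined = pd.concat(chunks, ignore_index=True)

print(combined["x"].tolist())
```

Building the list first and calling concat once avoids copying the accumulated data on every iteration.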
Series.combine() combines two Series into one by applying a function to each pair of elements. The index of the result is the union of the two indexes, so the output has the same shape as the caller when both Series share the same index.
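A minimal sketch of Series.combine, assuming two small example Series (`s1` and `s2` are illustrative names, not from the original) and using the built-in `max` as the combining function:

```python
import pandas as pd

s1 = pd.Series([1, 5, 3])
s2 = pd.Series([4, 2, 6])

# The function receives one element from each Series at a time
larger = s1.combine(s2, max)

print(larger.tolist())
```

Because both Series share the default index 0..2, the result has the same length as the caller.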
The concat function concatenates DataFrames along rows or columns; we can think of it as stacking multiple DataFrames. The merge function combines DataFrames based on values in shared columns. merge offers more flexibility than concat because it allows combinations based on a condition.
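To make the difference concrete, here is a small sketch with two made-up frames (`left`, `right`, and the column names are assumptions for illustration only):

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "c"], "y": [3, 4]})

# concat stacks the rows; columns missing from one frame are filled with NaN
stacked = pd.concat([left, right], ignore_index=True)

# merge matches rows on the shared "key" column (inner join by default)
merged = left.merge(right, on="key")

print(stacked.shape)
print(merged)
```

Here concat keeps all four rows, while merge keeps only the row whose key appears in both frames.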
A nested list comprehension should be much faster than s.sum().
>>> [element for list_ in s for element in list_]
['a', 'b', 'c', 'd', 'f', 'g']
>>> %timeit -n 100000 [element for list_ in s for element in list_]
100000 loops, best of 3: 5.2 µs per loop
>>> %timeit -n 100000 s.sum()
100000 loops, best of 3: 50.7 µs per loop
Iterating over the underlying values of the series (s.values) is even faster.
>>> %timeit -n 100000 [element for list_ in s.values for element in list_]
100000 loops, best of 3: 2.77 µs per loop
I haven't timed or tested these options, but there is also the newer pandas method Series.explode, as well as numpy.concatenate.
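For reference, a minimal sketch of how those two options could be applied to the series from the question. No timings are claimed here, and Series.explode requires pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

# Series.explode puts each list element on its own row
flat_explode = s.explode().tolist()

# numpy.concatenate joins the lists into a single 1-D array
flat_numpy = np.concatenate(s.tolist()).tolist()

print(flat_explode)
print(flat_numpy)
```

Both produce a plain Python list after the final `.tolist()` call; note that explode repeats the original index labels on the intermediate Series.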