efficient concatenation of lists in pandas series

Tags: python, list, concatenation, pandas

I have the following series:

>>> import pandas as pd
>>> s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])
>>> s
0    [a, b]
1    [c, d]
2    [f, g]
dtype: object

What is the easiest (preferably vectorized) way to concatenate all the lists in the series so that I get:

l = ['a', 'b', 'c', 'd', 'f', 'g']

Thanks!

asked Nov 05 '15 22:11

Alejandro Simkievich

2 Answers

A nested list comprehension should be much faster than s.sum():

>>> [element for list_ in s for element in list_]
['a', 'b', 'c', 'd', 'f', 'g']

>>> %timeit -n 100000 [element for list_ in s for element in list_]
100000 loops, best of 3: 5.2 µs per loop

>>> %timeit -n 100000 s.sum()
100000 loops, best of 3: 50.7 µs per loop
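The gap between these two timings is plausibly explained by how s.sum() handles an object Series of lists: it adds the lists together pairwise with +, and each + builds a new list, so the work grows roughly quadratically with the number of lists. The sketch below imitates that with functools.reduce (an assumption about what s.sum() does internally, not pandas' actual code) and contrasts it with the nested comprehension and with itertools.chain.from_iterable, a standard-library alternative that, like the comprehension, copies each element only once.

```python
import functools
import itertools
import operator

import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

# Imitation of repeated pairwise list concatenation (assumed to be roughly
# what s.sum() does for an object Series); every + allocates a new list.
pairwise = functools.reduce(operator.add, s, [])

# Nested list comprehension from the answer: each element is visited once.
nested = [element for list_ in s for element in list_]

# Standard-library alternative: lazily chains the inner lists, then
# materializes them into a single list.
chained = list(itertools.chain.from_iterable(s))

print(pairwise == nested == chained)  # True
```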

Iterating directly over the Series' underlying NumPy array (s.values) instead of over the Series itself is faster still:

>>> %timeit -n 100000 [element for list_ in s.values for element in list_]
100000 loops, best of 3: 2.77 µs per loop
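%timeit is an IPython magic, so it will not run in a plain Python script. A rough equivalent using the standard timeit module is sketched below. The series size (n_rows) and the repeat/number settings are arbitrary choices for illustration, and absolute timings depend on the machine and the pandas/NumPy versions, so no numbers are shown here.

```python
import itertools
import timeit

import pandas as pd

# Hypothetical benchmark size; a larger series makes the differences clearer.
n_rows = 1_000
s = pd.Series([['a', 'b'] for _ in range(n_rows)])

candidates = {
    "comprehension over s": lambda: [e for list_ in s for e in list_],
    "comprehension over s.values": lambda: [e for list_ in s.values for e in list_],
    "itertools.chain over s.values": lambda: list(itertools.chain.from_iterable(s.values)),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats, each running the callable 100 times.
    best = min(timeit.repeat(func, repeat=3, number=100))
    results[name] = best
    print(f"{name}: {best / 100 * 1e6:.1f} µs per call")
```

Taking the minimum of several repeats, as %timeit's "best of 3" did, reduces the influence of background noise on the comparison.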

answered Oct 15 '22 18:10

Alexander

I haven't timed or tested these options, but there is also the newer pandas method Series.explode (available since pandas 0.25), as well as numpy.concatenate.
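A minimal sketch of the two options mentioned, assuming pandas 0.25 or later (where Series.explode was introduced) and that every element of the series is a list. Neither variant has been timed here.

```python
import numpy as np
import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

# Series.explode puts each list element on its own row, repeating the
# original index label; tolist() then discards the index.
# Note: an empty list becomes a NaN row rather than being dropped.
via_explode = s.explode().tolist()

# np.concatenate joins the inner lists (each converted to an array) into one
# 1-D array; tolist() converts the result back to plain Python strings.
via_concatenate = np.concatenate(s.to_numpy()).tolist()

print(via_explode)      # ['a', 'b', 'c', 'd', 'f', 'g']
print(via_concatenate)  # ['a', 'b', 'c', 'd', 'f', 'g']
```

Series.explode keeps the result inside pandas, which is convenient if further Series operations follow; np.concatenate produces a NumPy array, which may suit numeric data better than lists of strings.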

answered Oct 15 '22 17:10

Alex Hall
