Pandas Series of lists to one series

Tags: python, string, list, pandas, series

I have a Pandas Series of lists of strings:

0    [slim, waist, man]
1     [slim, waistline]
2               [santa]

As you can see, the lists vary by length. I want an efficient way to collapse this into one series:

0         slim
1        waist
2          man
3         slim
4    waistline
5        santa

I know I can break up the lists using

series_name.split(' ')

But I am having a hard time putting those strings back into one list.

Thanks!

459

asked Jun 17 '15 07:06

Max

2 Answers

Here's a simple method using only pandas functions:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then

s.apply(pd.Series).stack().reset_index(drop=True)

gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.

0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa

If this is what you want, just omit .reset_index(drop=True) from the chain.
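As an alternative to the apply/stack chain above, the lists can also be flattened in plain Python before building the Series. This is only a sketch using itertools.chain from the standard library, not part of the original answer; the variable name flat is made up for illustration.

```python
import itertools

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# Concatenate all the lists into one iterable, then build a new Series
# from it; the new Series gets a fresh default index (0, 1, 2, ...).
flat = pd.Series(list(itertools.chain.from_iterable(s)))
print(flat)
```

This skips the intermediate wide DataFrame that apply(pd.Series) creates, which may help on long Series, though that is worth measuring on your own data.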

134

answered Sep 19 '22 04:09

mcwitt

pandas 0.25.0 added an explode method for Series and DataFrames. Older versions do not have it.

It helps build the result you need.

For example, given this Series:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

you can call

s.explode()

to get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
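Note that explode repeats the original index labels (0, 0, 0, 1, 1, 2), while the question asks for a 0 to 5 index. A minimal sketch of one way to renumber the result, assuming pandas 0.25.0 or later (the name result is illustrative):

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# explode() keeps each element's original row label;
# reset_index(drop=True) replaces those labels with 0..n-1.
result = s.explode().reset_index(drop=True)
print(result)
```

On pandas 1.1.0 and later, s.explode(ignore_index=True) should give the same result in a single call.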

For a DataFrame:

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']
    ]),
    'a': 1
})

This gives the following DataFrame:

                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1

Applying explode to the s column:

df.explode('s')

gives this result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1

If your Series contains empty lists:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])

then explode will produce a NaN value for each empty list:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
3          NaN

If this is not desired, chain a dropna call:

s.explode().dropna()

to get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
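One caveat with dropna is that it also removes any NaN values that were genuinely inside the lists. If that matters, a possible alternative is to filter out the empty lists before exploding. This is a sketch rather than part of the original answer; non_empty and result are illustrative names.

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    [],
])

# Keep only the rows whose list has at least one element, then explode.
non_empty = s[s.map(len) > 0]
result = non_empty.explode()
print(result)
```

This assumes every element of the Series is a list (or another object with a length); a scalar such as NaN in the Series itself would make len fail.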

DataFrames also have a dropna method:

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
        []
    ]),
    'a': 1
})

Running explode without dropna:

df.explode('s')

results in:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1

With dropna:

df.explode('s').dropna(subset=['s'])

Result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1

answered Sep 19 '22 04:09

Roman Kotov
