 

Pandas Series of lists to one series

I have a Pandas Series of lists of strings:

0    [slim, waist, man]
1     [slim, waistline]
2               [santa]

As you can see, the lists vary in length. I want an efficient way to collapse this into one series:

0         slim
1        waist
2          man
3         slim
4    waistline
5        santa

I know I can break up the lists using

series_name.str.split(' ')

But I am having a hard time putting those strings back into one list.
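Splitting strings in a Series goes through the .str accessor (a plain Series has no split method), and the result is already a Series of lists like the one above. A minimal sketch, assuming the lists came from space-separated strings; the sample strings and the names raw and lists are hypothetical:

```python
import pandas as pd

# Hypothetical raw data: one space-separated string per row. The question
# does not show where its lists come from, so this is only an assumption.
raw = pd.Series(['slim waist man', 'slim waistline', 'santa'])

# Series.str.split splits each string and returns a Series of lists.
lists = raw.str.split(' ')
print(lists.tolist())
```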

Thanks!

asked Jun 17 '15 07:06 by Max




2 Answers

Here's a simple method using only pandas functions:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then

s.apply(pd.Series).stack().reset_index(drop=True) 

gives the desired output. In some cases you might want to keep the original index and add a second level to index the nested elements, e.g.

0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa

If this is what you want, just omit .reset_index(drop=True) from the chain.
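A runnable sketch of the apply/stack chain above, printing both the flat result and the two-level index. The intermediate names wide, nested and flat are only for illustration. The explicit dropna() is an added safeguard, not part of the original answer: older pandas stack() drops the NaN padding on its own, but newer releases may keep it.

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

# apply(pd.Series) turns each list into one row of a DataFrame; shorter
# lists are padded with NaN so every row has the same number of columns.
wide = s.apply(pd.Series)

# stack() moves the columns into a second index level. dropna() removes the
# NaN padding in case this pandas version's stack() does not drop it.
nested = wide.stack().dropna()

# Discarding the two-level index gives the flat 0..5 index from the question.
flat = nested.reset_index(drop=True)

print(flat.tolist())
print(nested.index.tolist())
```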

answered Sep 19 '22 04:09 by mcwitt


pandas 0.25.0 introduced a new method, explode, for both Series and DataFrame. Older versions do not have this method.

It makes building the result you need straightforward.
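If code has to run on pandas versions both before and after 0.25.0, one option is to test for the method and fall back to the apply/stack idiom. This is a sketch; flatten_lists is a hypothetical helper name, and its final dropna() also discards any genuine NaN elements inside the lists:

```python
import pandas as pd


def flatten_lists(s: pd.Series) -> pd.Series:
    """Flatten a Series of lists into one Series with a fresh 0..n-1 index.

    Hypothetical helper: uses Series.explode where it exists (pandas >= 0.25)
    and falls back to the apply/stack idiom on older versions.
    """
    if hasattr(s, 'explode'):
        # explode keeps the original index; empty lists become NaN.
        flat = s.explode()
    else:
        # Pre-0.25 fallback: spread the lists into columns, then stack them.
        flat = s.apply(pd.Series).stack()
    # Drop NaN (from empty lists or padding) and renumber the rows.
    return flat.dropna().reset_index(drop=True)
```

Usage: for the three-list series from the question, flatten_lists(s).tolist() returns ['slim', 'waist', 'man', 'slim', 'waistline', 'santa'].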

For example, given this series:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then you can use

s.explode() 

to get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
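Note that explode keeps the original row labels, so the index above is 0, 0, 0, 1, 1, 2 rather than the 0 to 5 numbering asked for in the question. A short sketch of renumbering; the ignore_index argument assumes pandas 1.1.0 or newer, and the variable names are illustrative:

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

exploded = s.explode()
# explode repeats each row's original label for every element of its list.
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 2]

# Discard the old labels to get a fresh 0..5 index.
renumbered = exploded.reset_index(drop=True)

# On pandas >= 1.1.0, explode can do the renumbering itself.
renumbered_too = s.explode(ignore_index=True)
```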

For a DataFrame:

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']
    ]),
    'a': 1
})

This gives the following DataFrame:

                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1

Applying explode to the s column:

df.explode('s') 

gives this result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
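The other columns (here a) are repeated once per list element, and so are the original row labels. A sketch that also renumbers the rows, in case the duplicated labels are unwanted (out is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']
    ]),
    'a': 1
})

# Each list element gets its own row; column 'a' is copied alongside it.
out = df.explode('s')

# Replace the repeated labels (0, 0, 0, 1, 1, 2) with a fresh 0..5 index.
out = out.reset_index(drop=True)
print(out)
```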

If your series contains empty lists:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])

Then running explode will introduce NaN values for empty lists, like this:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
3          NaN

If this is not desired, chain a dropna call:

s.explode().dropna() 

To get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
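One caveat: dropna() removes every missing value, including a None or NaN that was a real element of one of the lists. If that matters, filtering out empty lists before exploding avoids it. A sketch, where the last row ['sock', None] is hypothetical data added only to show the difference:

```python
import pandas as pd

# The question's data plus an empty list and a hypothetical list that
# contains a genuine missing element (None).
s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    [],
    ['sock', None]])

# dropna() after explode removes the NaN produced by the empty list, but it
# also removes the None that was a real element of the last list.
via_dropna = s.explode().dropna()

# Keeping only non-empty lists before exploding leaves genuine missing
# elements in place.
via_filter = s[s.map(len) > 0].explode()
```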

DataFrames also have a dropna method:

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
        []
    ]),
    'a': 1
})

Running explode without dropna:

df.explode('s') 

results in:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1

With dropna:

df.explode('s').dropna(subset=['s']) 

Result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
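The subset=['s'] argument matters when other columns can also be missing: a plain dropna() drops a row if any column is NaN. A sketch with a hypothetical extra column b that is missing in the 'santa' row:

```python
import numpy as np
import pandas as pd

# Same frame as above plus a hypothetical column 'b' with a missing value
# in the 'santa' row.
df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
        []
    ]),
    'a': 1,
    'b': [10.0, 20.0, np.nan, 40.0]
})

exploded = df.explode('s')

# subset=['s'] drops only the row produced by the empty list.
only_s = exploded.dropna(subset=['s'])

# Without subset, the NaN in 'b' drops the 'santa' row as well.
any_col = exploded.dropna()
```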
answered Sep 19 '22 04:09 by Roman Kotov