I have a Pandas Series of lists of strings:
0    [slim, waist, man]
1     [slim, waistline]
2               [santa]
As you can see, the lists vary by length. I want an efficient way to collapse this into one series
0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
I know I can break up the lists using
series_name.str.split(' ')
But I am having a hard time putting those strings back into one list.
Thanks!
Here's a simple method using only pandas functions:
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])
Then
s.apply(pd.Series).stack().reset_index(drop=True)
gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.
0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa
If this is what you want, just omit .reset_index(drop=True)
from the chain.
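To see why this chain works, it can help to look at the intermediate objects. The following is only a sketch, not part of the original answer; the names wide, nested, and flat are made up for illustration. It also calls dropna() explicitly after stack(), on the assumption that the installed pandas version may not discard the NaN padding by itself (the reworked stack in recent pandas releases keeps missing values).

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

# Step 1: each list becomes a row; shorter lists are padded with NaN.
wide = s.apply(pd.Series)

# Step 2: stack() moves the columns into a second index level.
# dropna() removes the padding in case stack() kept it.
nested = wide.stack().dropna()

# Step 3: discard the two-level index to get a plain 0..n-1 index.
flat = nested.reset_index(drop=True)

print(wide.shape)      # (3, 3)
print(flat.tolist())
```

Stopping at nested gives the two-level variant described above; flat is the collapsed series.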
pandas 0.25.0 introduced a new method, explode, for Series and DataFrames. Older versions do not have this method.
It helps to build the result you need.
For example, suppose you have this series:
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])
Then you can use
s.explode()
To get this result:
0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
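Note that explode repeats the original index label for every element that came from the same list. If you want the fresh 0 to 5 index shown in the question, one option (a sketch, not taken from the answer itself) is to reset the index afterwards. As far as I know, pandas 1.1.0 and later also accept ignore_index=True in explode for the same effect; the variable name flat is just for illustration.

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

# explode() keeps the source row label (0, 0, 0, 1, 1, 2);
# reset_index(drop=True) replaces it with 0..n-1.
flat = s.explode().reset_index(drop=True)

print(flat.tolist())
print(list(flat.index))
```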
In the case of a DataFrame:
df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']
    ]),
    'a': 1
})
You will have this DataFrame:
                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1
Applying explode to the s column:
df.explode('s')
Will give you this result:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
If your series contains empty lists:
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])
Then running explode
will introduce NaN values for empty lists, like this:
0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
3          NaN
If this is not desired, you can chain a dropna call:
s.explode().dropna()
To get this result:
0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
DataFrames also have a dropna method:
df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
        []
    ]),
    'a': 1
})
Running explode
without dropna:
df.explode('s')
Will result in:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1
with dropna:
df.explode('s').dropna(subset=['s'])
Result:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
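The subset argument matters when other columns can contain missing values too. A small sketch with made-up data (the NaN in column a and the names only_s and any_col are illustrative, not from the example above) shows the difference between restricting dropna to the exploded column and calling it with no arguments:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    's': [['slim', 'waist'], ['santa'], []],
    'a': [1.0, np.nan, 3.0],   # hypothetical: one missing value in 'a'
})

exploded = df.explode('s')

# Drops only the row produced by the empty list.
only_s = exploded.dropna(subset=['s'])

# Drops every row with a NaN in any column, including the 'santa' row.
any_col = exploded.dropna()

print(only_s['s'].tolist())   # ['slim', 'waist', 'santa']
print(any_col['s'].tolist())  # ['slim', 'waist']
```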