 

Split lists within dataframe column into multiple columns [duplicate]

Tags:

python

pandas

I have a Pandas DataFrame column in which each cell holds a list of lists. Something like this:

df
     col1
0    [[1,2], [2,3]]
1    [[a,b], [4,5], [x,y]] 
2    [[6,7]]

I want to split each outer list across multiple columns, so the output should look something like:

    col1    col2     col3
0   [1,2]   [2,3]   
1   [a,b]   [4,5]    [x,y]
2   [6,7]

Please help me with this. Thanks in advance.

Ronnie asked May 22 '18 08:05


People also ask

How do I separate data in one column into multiple columns in Python?

Series.str.split() breaks single-column string values into multiple columns based on a specified separator or delimiter. It works like Python's built-in str.split() method, but it is applied element-wise to every string in a Series; passing expand=True returns a DataFrame with one column per piece instead of a Series of lists.
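As a minimal sketch of the delimiter case (the column name `raw`, its values, and the comma separator are assumptions for illustration, not taken from the question), `Series.str.split` with `expand=True` returns one column per piece and pads shorter rows with missing values:

```python
import pandas as pd

# Hypothetical column of comma-separated strings (not from the question)
df = pd.DataFrame({"raw": ["1,2", "a,b,c", "6"]})

# expand=True returns a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values
parts = df["raw"].str.split(",", expand=True)

# Rename the integer labels 0, 1, 2 to col1, col2, col3
parts.columns = [f"col{i + 1}" for i in range(parts.shape[1])]
print(parts)
```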

How do you split a column in a list in Python?

Pandas provides Series.str.split() to split strings around a given separator/delimiter. The result can be kept as a Series of lists, or it can be used to create a multi-column DataFrame from a single column of delimited strings.
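Without `expand`, the same call keeps each row as a Python list. The sketch below (again on hypothetical data with a `-` separator) shows that form and two ways to get columns out of it: the `.str[i]` accessor, which indexes into each list, and the DataFrame constructor that the second answer below relies on.

```python
import pandas as pd

# Hypothetical strings with a "-" separator (illustrative only)
s = pd.Series(["x-1", "y-2", "z-3"])

# Default (expand=False): a Series whose elements are Python lists
lists = s.str.split("-")
print(lists.tolist())  # [['x', '1'], ['y', '2'], ['z', '3']]

# .str[i] picks the i-th element of each list
first = lists.str[0]

# Or build all columns at once from the list of lists, keeping the original index
wide = pd.DataFrame(lists.tolist(), index=s.index, columns=["letter", "number"])
print(wide)
```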

How do I explode a column in pandas?

DataFrame.explode() transforms each element of a list-like into its own row, replicating the index values. The lists in the specified column(s) are exploded into rows, and the index is duplicated for those rows. It raises a ValueError if the columns of the frame are not unique.
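Applied to the question's data, explode() gives the row-wise counterpart of the column-wise split asked for here: each inner list becomes its own row and the original row label repeats. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"col1": [[[1, 2], [2, 3]],
                            [["a", "b"], [4, 5], ["x", "y"]],
                            [[6, 7]]]})

# One row per inner list; the index (0, 0, 1, 1, 1, 2) records the source row
long = df.explode("col1")
print(long)
```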


2 Answers

You can use pd.Series.apply, passing the pd.Series constructor as the function:

import pandas as pd

df = pd.DataFrame({'col1': [[[1, 2], [2, 3]],
                            [['a', 'b'], [4, 5], ['x', 'y']],
                            [[6, 7]]]})

# Each outer list becomes a Series; apply stacks them into columns 0, 1, 2
res = df['col1'].apply(pd.Series)

print(res)

        0       1       2
0  [1, 2]  [2, 3]     NaN
1  [a, b]  [4, 5]  [x, y]
2  [6, 7]     NaN     NaN
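The question asks for columns named col1, col2, col3 rather than 0, 1, 2. A small self-contained sketch that repeats the expansion above and then renames the integer labels (the naming scheme is taken from the question's desired output):

```python
import pandas as pd

df = pd.DataFrame({'col1': [[[1, 2], [2, 3]],
                            [['a', 'b'], [4, 5], ['x', 'y']],
                            [[6, 7]]]})

# Expand as in the answer, then map the integer labels 0, 1, 2 to col1, col2, col3
res = df['col1'].apply(pd.Series).rename(columns=lambda i: f'col{i + 1}')
print(res)
```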
jpp answered Nov 15 '22 00:11


If performance is important, I think you need the DataFrame constructor:

# Build all columns in one constructor call from the list of lists
df = pd.DataFrame(df['col1'].values.tolist())
print(df)
        0       1       2
0  [1, 2]  [2, 3]    None
1  [a, b]  [4, 5]  [x, y]
2  [6, 7]    None    None
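To check the performance claim on your own data, a rough timing sketch could look like the one below. The 3,000-row input, built by repeating the question's three rows, is an assumption; absolute timings depend on your machine and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical larger input: the question's three rows repeated 1,000 times
base = [[[1, 2], [2, 3]], [['a', 'b'], [4, 5], ['x', 'y']], [[6, 7]]]
df = pd.DataFrame({'col1': base * 1_000})


def with_apply():
    # Builds one pd.Series per row, which is typically the slow part
    return df['col1'].apply(pd.Series)


def with_constructor():
    # Hands every row to the DataFrame constructor in a single call
    return pd.DataFrame(df['col1'].tolist())


for fn in (with_apply, with_constructor):
    seconds = timeit.timeit(fn, number=3)
    print(f'{fn.__name__}: {seconds:.3f} s for 3 runs')
```

Note that the two results differ only in how gaps are filled: apply leaves NaN, while the constructor leaves None.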

If the column contains missing values (NaN), remove them first with dropna:

df = pd.DataFrame(df['col1'].dropna().values.tolist())
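One detail worth knowing: the constructor gives the new frame a fresh 0..n-1 index, so after dropna the rows no longer carry their original labels. A sketch that keeps the surviving labels by passing index= (the None row and the part_ column prefix are assumptions added to illustrate the effect):

```python
import pandas as pd

# Hypothetical data: the middle row is missing
df = pd.DataFrame({'col1': [[[1, 2], [2, 3]],
                            None,
                            [[6, 7]]]})

s = df['col1'].dropna()
# Reuse the surviving row labels (0 and 2) instead of a fresh 0, 1 index
res = pd.DataFrame(s.tolist(), index=s.index)

# Optionally attach the new columns back to the original frame by index
out = df.join(res.add_prefix('part_'))
print(out)
```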
jezrael answered Nov 14 '22 23:11