I would like to take a Pandas DataFrame named df
which has an ID column and a list column. Each value in the list column is a list containing a variable number of tuples, and all the tuples have the same length. It looks like this:
ID  list
1   [(0,1,2,3),(1,2,3,4),(2,3,4,NaN)]
2   [(NaN,1,2,3),(9,2,3,4)]
3   [(NaN,1,2,3),(9,2,3,4),(A,b,9,c),($,*,k,0)]
And I would like to unpack each list into columns 'A','B','C','D' representing the fixed positions in each tuple.
The result should look like:
ID  A    B  C  D
1   0    1  2  3
1   1    2  3  4
1   2    3  4  NaN
2   NaN  1  2  3
2   9    2  3  4
3   NaN  1  2  3
3   9    2  3  4
3   A    b  9  c
3   $    *  k  0
I have tried df.apply(pd.Series(list)) but it fails because the length of the list differs from row to row. Do I somehow need to unpack to columns and transpose by ID?
To split a column of tuples in a Pandas data frame, we can use the column's tolist method. We create the df data frame with the pd.DataFrame class and a dictionary. Then we create a new data frame from df by using df['b'].
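As a rough illustration of the tolist approach described above, here is a minimal sketch. The frame contents and the output column names b1 and b2 are made up for the example; only the column name 'b' comes from the text.

```python
import pandas as pd

# Hypothetical frame: column 'b' holds tuples of equal length.
df = pd.DataFrame({"a": [1, 2], "b": [(1, 2), (3, 4)]})

# tolist() turns the column into a plain list of tuples, which the
# DataFrame constructor expands into one column per tuple position.
# The names 'b1' and 'b2' are illustrative, not from the original text.
split = pd.DataFrame(df["b"].tolist(), index=df.index, columns=["b1", "b2"])
print(split)
```

Passing index=df.index keeps the new frame aligned with the original rows, so it can be joined back onto df if needed.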
You can get the unique values of one or more columns of a pandas DataFrame using unique() or Series.unique(). Series.unique() returns the unique values of a single column, and the other is used to get them from multiple columns.
split(): Pandas provides a method to split a string around a passed separator/delimiter. The result can be stored as a list in a Series, or it can be used to create multiple DataFrame columns from a single delimited string.
In [38]: (df.groupby('ID')['list']
    ...:    .apply(lambda x: pd.DataFrame(x.iloc[0], columns=['A', 'B', 'C', 'D']))
    ...:    .reset_index())
Out[38]:
   ID  level_1    A  B  C    D
0   1        0    0  1  2    3
1   1        1    1  2  3    4
2   1        2    2  3  4  NaN
3   2        0  NaN  1  2    3
4   2        1    9  2  3    4
5   3        0  NaN  1  2    3
6   3        1    9  2  3    4
7   3        2    A  b  9    c
8   3        3    $  *  k    0
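An alternative that may be worth trying, assuming pandas 0.25 or later (where DataFrame.explode is available), is to explode the list column first and then expand the tuples. This is only a sketch: the sample frame below is reconstructed from the question, with strings standing in for the values A, b, c, $, * and k.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question; the string values are placeholders.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "list": [
        [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, np.nan)],
        [(np.nan, 1, 2, 3), (9, 2, 3, 4)],
        [(np.nan, 1, 2, 3), (9, 2, 3, 4), ("A", "b", 9, "c"), ("$", "*", "k", 0)],
    ],
})

# One row per tuple; the ID is repeated for every element of its list.
exploded = df.explode("list").reset_index(drop=True)

# Expand each tuple into fixed-position columns.
parts = pd.DataFrame(exploded["list"].tolist(), columns=["A", "B", "C", "D"])

# Put the repeated IDs next to the unpacked columns.
result = pd.concat([exploded[["ID"]], parts], axis=1)
print(result)
```

Unlike the groupby version above, this does not produce a level_1 column, and it does not rely on each ID appearing in exactly one row.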