My data-structure is:
ds = [{
    "name": "groupA",
    "subGroups": [123, 456]
},
{
    "name": "groupB",
    "subGroups": ["aaa", "bbb", "ccc"]
}]
This gives the following dataframe:
import pandas as pd
df = pd.DataFrame(ds)
name subGroups
0 groupA [123, 456]
1 groupB [aaa, bbb, ccc]
I want:
name subGroupsFlattend
0 groupA 123
1 groupA 456
2 groupB aaa
3 groupB bbb
4 groupB ccc
Any ideas?
One way to flatten a pandas DataFrame is through the NumPy package. NumPy arrays have a flatten() method (ndarray.flatten()) that performs this task. First convert the DataFrame to a NumPy array with the to_numpy() method, then call flatten() on the result. Note that this collapses a 2-D array into one dimension; it does not unpack lists stored inside individual cells.
Flatten a list of lists using itertools.chain(): this approach is ideal for transforming a 2-D list into a single flat list, because it treats consecutive sequences as a single sequence, iterating through each iterable passed as an argument in turn.
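As a minimal sketch of the itertools approach described above, the snippet below flattens a plain nested list. The variable names (nested, flat) are only illustrative, and chain.from_iterable is used here as the variant of chain() that accepts a single iterable of lists.

```python
from itertools import chain

# A 2-D list with the same values as the subGroups column in the question
nested = [[123, 456], ["aaa", "bbb", "ccc"]]

# chain.from_iterable walks each inner list in turn and yields its items,
# so the result is one flat sequence in the original order
flat = list(chain.from_iterable(nested))

print(flat)
```

This only flattens the values themselves; it does not keep track of which row each value came from.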
Call it as flatten_col(input, 'B', 'B') in your example. The benefit of this method is that it copies all the other columns along as well (unlike some other solutions).
ndarray.flatten() returns a copy of the array collapsed into one dimension. Its order parameter controls whether to flatten in C (row-major) order, Fortran (column-major) order, or to preserve the C/Fortran ordering of the array a. The default is 'C'.
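The to_numpy() and flatten() behaviour described above can be sketched as follows. The small numeric DataFrame (columns x and y) is a made-up example rather than the data from the question, since flatten() works on array dimensions and would leave list-valued cells intact.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 numeric frame, used only to illustrate flatten()
frame = pd.DataFrame({"x": [1, 3], "y": [2, 4]})
a = frame.to_numpy()              # 2-D array: [[1, 2], [3, 4]]

row_major = a.flatten()           # default order='C': row by row
col_major = a.flatten(order="F")  # Fortran order: column by column

# flatten() returns a copy, so changing the result leaves `a` untouched
copy_check = a.flatten()
copy_check[0] = 99

print(row_major, col_major, a[0, 0])
```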
Use explode (available in pandas 0.25 and later):
df = df.explode('subGroups')
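For completeness, here is a self-contained sketch of the explode approach applied to the data from the question. The rename and reset_index steps are additions, assumed here only to match the column name and the 0 to 4 index shown in the desired output; explode on its own keeps the original row labels.

```python
import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

flat = (
    df.explode("subGroups")                                # one row per list element
      .rename(columns={"subGroups": "subGroupsFlattend"})  # column name from the question
      .reset_index(drop=True)                              # replace 0,0,1,1,1 with 0..4
)

print(flat)
```

All other columns are repeated automatically for each exploded element, so nothing has to be rebuilt by hand.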
You can get your desired output with the following:
pd.DataFrame({'name':df.name.repeat(df.subGroups.str.len()),'subGroup':df.subGroups.sum()})
Out[364]:
name subGroup
0 groupA 123
0 groupA 456
1 groupB aaa
1 groupB bbb
1 groupB ccc
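A possible variation on the repeat-based answer above, sketched under the assumption that explode is unavailable or that the lists are long: itertools.chain replaces df.subGroups.sum(), which concatenates the lists one pair at a time. The to_numpy() call is an extra step that drops the repeated index so the result is numbered 0 to 4.

```python
from itertools import chain

import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

# Number of elements in each row's list: 2 and 3
lengths = df["subGroups"].str.len()

flat = pd.DataFrame({
    # Repeat each name once per element in its list
    "name": df["name"].repeat(lengths).to_numpy(),
    # Concatenate all lists into one flat list, in row order
    "subGroup": list(chain.from_iterable(df["subGroups"])),
})

print(flat)
```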