I have a dataframe that appears as follows:
   issue_key        date   pkey                         component  case_count
0       1060  2018-03-08   PROJ  console,configuration,management           8
1       1464  2018-04-24  PROJ2                          protocol           1
2        611  2017-03-31   PROJ                              None           2
3       2057  2018-10-30   PROJ                       ha, console           0
I need to split the rows with multiple values in the component column into one row per component.
When done, the dataframe should appear as follows:
   issue_key        date   pkey      component  case_count
0       1060  2018-03-08   PROJ        console           8
1       1060  2018-03-08   PROJ  configuration           8
2       1060  2018-03-08   PROJ     management           8
3       1464  2018-04-24  PROJ2       protocol           1
4        611  2017-03-31   PROJ           None           2
5       2057  2018-10-30   PROJ             ha           0
6       2057  2018-10-30   PROJ        console           0
Any suggestions on how best to do this?
To split a cell into multiple rows in a pandas DataFrame, we can call apply with a lambda function that calls str.split on each string value x. We then call explode to fill new rows with the split values.
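A minimal sketch of the apply/split/explode approach described above. The sample frame is rebuilt from the question; treating the missing component as the literal string "None" and stripping whitespace around each item are assumptions, not something the question states. DataFrame.explode requires pandas 0.25 or later.

```python
import pandas as pd

# Sample data mirroring the question; "None" as a plain string is an assumption.
df = pd.DataFrame({
    "issue_key": [1060, 1464, 611, 2057],
    "date": ["2018-03-08", "2018-04-24", "2017-03-31", "2018-10-30"],
    "pkey": ["PROJ", "PROJ2", "PROJ", "PROJ"],
    "component": ["console,configuration,management", "protocol", "None", "ha, console"],
    "case_count": [8, 1, 2, 0],
})

# Split each string into a list, stripping stray whitespace around the items.
df["component"] = df["component"].apply(
    lambda x: [part.strip() for part in x.split(",")]
)

# explode() emits one row per list element; reset_index renumbers the rows.
result = df.explode("component").reset_index(drop=True)
print(result)
```

The other columns are repeated for every exploded row, so no index juggling is needed to carry them along.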
We can use the pandas Series.str.split() function to split strings around a given separator or delimiter, optionally spreading the pieces across multiple columns. It is similar to the Python string split() method, but it applies to every value in a DataFrame column at once.
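As a small illustration of Series.str.split(), the sketch below shows its two output shapes. The sample strings are made up for the example.

```python
import pandas as pd

# Illustrative values only.
s = pd.Series(["console,configuration,management", "protocol", "ha,console"])

# Default: each element becomes a list of substrings.
as_lists = s.str.split(",")

# expand=True spreads the pieces across columns and pads
# shorter rows with missing values.
as_columns = s.str.split(",", expand=True)

print(as_lists)
print(as_columns)
```

The list form is the one to feed into explode or stack when the goal is one row per value; the expanded form suits cases where each piece should become its own column.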
To split text in a column into multiple rows with pandas, we can use the str.split method on a column of the df data frame.
Let's say dd is your data frame. You can do:
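import pandas as pd

# split each string into a list; the pattern also drops spaces after commas
dd['component'] = dd['component'].str.split(r',\s*')

# expand each list into a pd.Series, stack the values into rows,
# then drop the helper index level and restore the column name
dd = (dd
      .set_index(['issue_key', 'date', 'pkey', 'case_count'])['component']
      .apply(pd.Series)
      .stack()
      .reset_index()
      .drop('level_4', axis=1)
      .rename(columns={0: 'component'}))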
   issue_key        date   pkey  case_count      component
0       1060  2018-03-08   PROJ           8        console
1       1060  2018-03-08   PROJ           8  configuration
2       1060  2018-03-08   PROJ           8     management
3       1464  2018-04-24  PROJ2           1       protocol
4        611  2017-03-31   PROJ           2           None
5       2057  2018-10-30   PROJ           0             ha
6       2057  2018-10-30   PROJ           0        console