Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe: how do I split one row into multiple rows by multi-value column? [duplicate]

Tags:

python

pandas

I have a dataframe that appears as follows:

   issue_key date     pkey          component              case_count
0  1060  2018-03-08  PROJ  console,configuration,management    8   
1  1464  2018-04-24  PROJ2 protocol                            1   
2  611   2017-03-31  PROJ  None                                2
3  2057  2018-10-30  PROJ  ha, console                         0

I need to split the rows with multiple values in the component column into one row per component.

When done, the dataframe should appear as follows:

   issue_key date     pkey          component              case_count
0  1060  2018-03-08  PROJ  console                           8
1  1060  2018-03-08  PROJ  configuration                     8
2  1060  2018-03-08  PROJ  management                        8   
3  1464  2018-04-24  PROJ2 protocol                          1   
4  611   2017-03-31  PROJ  None                              2
5  2057  2018-10-30  PROJ  ha                                0
6  2057  2018-10-30  PROJ  console                           0

Any suggestions on how best to do this?

like image 523
Eric Avatar asked Dec 19 '18 23:12

Eric


People also ask

How do I split a row into multiple rows in pandas DataFrame?

To split cell into multiple rows in a Python Pandas dataframe, we can use the apply method. to call apply with a lambda function that calls str. split to split the x string value. And then we call explode to fill new rows with the split values.

How do I split a single row into multiple columns in Python?

We can use the pandas Series. str. split() function to break up strings in multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire Dataframe column.

How do I split a column into multiple rows in pandas?

To split text in a column into multiple rows with Python Pandas, we can use the str. split method. to create the df data frame. Then we call str.


1 Answers

Let's say dd is your data frame. You can do:

# convert to list
dd['component'] = dd['component'].str.split(',')

# convert list of pd.Series then stack it
dd = (dd
 .set_index(['issue_key','date','pkey','case_count'])['component']
 .apply(pd.Series)
 .stack()
 .reset_index()
 .drop('level_4', axis=1)
 .rename(columns={0:'component'}))

       issue_key        date   pkey  case_count      component
0       1060  2018-03-08   PROJ           8        console
1       1060  2018-03-08   PROJ           8  configuration
2       1060  2018-03-08   PROJ           8     management
3       1464  2018-04-24  PROJ2           1       protocol
4        611  2017-03-31   PROJ           2           None
5       2057  2018-10-30   PROJ           0             ha
6       2057  2018-10-30   PROJ           0        console
like image 171
YOLO Avatar answered Nov 11 '22 12:11

YOLO