Pandas DataFrame: how do I split one row into multiple rows by a multi-value column? [duplicate]

Tags: python, pandas

I have a DataFrame that looks like this:

   issue_key        date   pkey                         component  case_count
0       1060  2018-03-08   PROJ  console,configuration,management           8
1       1464  2018-04-24  PROJ2                          protocol           1
2        611  2017-03-31   PROJ                              None           2
3       2057  2018-10-30   PROJ                       ha, console           0
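A reproducible version of this input might look like the following sketch. It assumes the None is a genuine missing value rather than the string "None", and that date holds plain strings; the printed table alone does not settle either point.

```python
import pandas as pd

# Sample input rebuilt from the table above.
# Assumption: None is a real missing value, and dates are plain strings.
df = pd.DataFrame({
    'issue_key': [1060, 1464, 611, 2057],
    'date': ['2018-03-08', '2018-04-24', '2017-03-31', '2018-10-30'],
    'pkey': ['PROJ', 'PROJ2', 'PROJ', 'PROJ'],
    'component': ['console,configuration,management', 'protocol', None, 'ha, console'],
    'case_count': [8, 1, 2, 0],
})
print(df)
```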

I need to split each row that has multiple values in the component column into one row per component.

The result should look like this:

   issue_key        date   pkey      component  case_count
0       1060  2018-03-08   PROJ        console           8
1       1060  2018-03-08   PROJ  configuration           8
2       1060  2018-03-08   PROJ     management           8
3       1464  2018-04-24  PROJ2       protocol           1
4        611  2017-03-31   PROJ           None           2
5       2057  2018-10-30   PROJ             ha           0
6       2057  2018-10-30   PROJ        console           0
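One detail in the expected output: the input value "ha, console" has a space after the comma, yet the result shows a bare "console". A minimal sketch of why a plain ',' split is not quite enough, using a whitespace-tolerant regex as one possible fix (the two-value Series here is just an illustrative sample):

```python
import pandas as pd

s = pd.Series(['console,configuration,management', 'ha, console'])

# A literal ',' split keeps the leading space in ' console'
naive = s.str.split(',')
# A multi-character pattern is treated as a regex; this one also swallows
# any whitespace around each comma
clean = s.str.split(r'\s*,\s*')

print(naive.tolist())  # [['console', 'configuration', 'management'], ['ha', ' console']]
print(clean.tolist())  # [['console', 'configuration', 'management'], ['ha', 'console']]
```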

Any suggestions on how best to do this?


asked Dec 19 '18 23:12

Eric

1 Answer

Assuming dd is your DataFrame, you can do:

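import pandas as pd

# split each comma-separated string into a list;
# the regex also strips the space in values like "ha, console"
dd['component'] = dd['component'].str.split(r'\s*,\s*')

# expand each list into columns (one pd.Series per row), then stack them into rows
dd = (dd
      .set_index(['issue_key', 'date', 'pkey', 'case_count'])['component']
      .apply(pd.Series)
      .stack()
      .reset_index()
      .drop('level_4', axis=1)
      .rename(columns={0: 'component'}))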

   issue_key        date   pkey  case_count      component
0       1060  2018-03-08   PROJ           8        console
1       1060  2018-03-08   PROJ           8  configuration
2       1060  2018-03-08   PROJ           8     management
3       1464  2018-04-24  PROJ2           1       protocol
4        611  2017-03-31   PROJ           2           None
5       2057  2018-10-30   PROJ           0             ha
6       2057  2018-10-30   PROJ           0        console
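On pandas 0.25 or later, DataFrame.explode can do the same job with less ceremony. The sketch below rebuilds the sample data and assumes the None is a real missing value. One practical difference: stack() drops missing values by default, so with the approach above a row whose component is a real missing value (rather than the string "None") can disappear, whereas explode leaves such a row in place. It also keeps the original column order.

```python
import pandas as pd

# Sample input; assumption: None is a real missing value, not the string "None"
dd = pd.DataFrame({
    'issue_key': [1060, 1464, 611, 2057],
    'date': ['2018-03-08', '2018-04-24', '2017-03-31', '2018-10-30'],
    'pkey': ['PROJ', 'PROJ2', 'PROJ', 'PROJ'],
    'component': ['console,configuration,management', 'protocol', None, 'ha, console'],
    'case_count': [8, 1, 2, 0],
})

# Split on commas (ignoring surrounding spaces), give each list element
# its own row, then renumber the repeated index as 0..n-1
out = (dd
       .assign(component=dd['component'].str.split(r'\s*,\s*'))
       .explode('component')
       .reset_index(drop=True))
print(out)
```

Missing values pass through explode unchanged, so the 611 row survives with an empty component.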


answered Nov 11 '22 12:11

YOLO
