
Python: How to split each string into new row with some string concatenation

This is my df which consists of 3 columns. I roughly know how to split strings into a new line using stack and unstack. However, I'm wondering how I can retain the "prefix" (which might not always be the same length) when splitting the string.

Edit: Currently I am working with the Pandas version 0.23.0 without the explode function.

Before:

Col1   Col2              Col3
1       QQ12345-01/02/03  x
2       QQ123456-01/02    y
3       QQ12345-01/02/03  z

After:

Col1   Col2              Col3
1      QQ12345-01        x
1      QQ12345-02        x
1      QQ12345-03        x
2      QQ123456-01       y
2      QQ123456-02       y
3      QQ12345-01        z
3      QQ12345-02        z
3      QQ12345-03        z

Currently, I can only manage to split on '/'; my code is below. I'd appreciate any help with this.

column_list = df.loc[:, df.columns != 'Col2'].columns.tolist()
df.set_index(column_list).stack().str.split('/', expand=True).stack().unstack(-2).reset_index(-1, drop=True).reset_index()
Gabriel Choo asked Apr 01 '26 22:04



1 Answer

Quoting the question: "Edit: Currently I am working with the Pandas version 0.23.0 without the explode function."

Okay, since explode is not available, let's combine some string split/join with melt. DataFrame.melt has been available as a method since pandas 0.20, so this solution should work for you on 0.23.0.

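result = (
    df[['Col1', 'Col3']]
    .join(
        df['Col2'].str.split('-')
        # Prefix every '/'-separated suffix, then join the pieces with ','
        .apply(lambda x: ','.join(f'{x[0]}-{item}' for item in x[1].split('/')))
        # One column per piece; shorter rows are padded with None
        .str.split(',', expand=True)
    )
    .melt(id_vars=['Col1', 'Col3'], value_name='value')  # wide -> long
    .dropna()  # drop the rows that came from the None padding
    .rename(columns={'value': 'Col2'})
    .sort_values(by='Col3')
)[['Col1', 'Col2', 'Col3']]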

EXPLANATION:

Instead of splitting the string on /, first split it on - to separate the prefix from the suffixes. Then split the second part on /, prepend the prefix to each piece, and join the results with , into a single string. Calling split on , with expand=True then produces n columns for n values, padding shorter rows with None. melt gathers those n columns into a single column, dropna removes the rows created by the padding, and sort_values on Col3 is there only to match the expected output in the question.
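To make the individual steps easier to follow, here is a minimal sketch that runs them one at a time. The sample frame is rebuilt from the "Before" table in the question, and the intermediate names (parts, joined, wide, long_df) are only illustrative; they do not appear in the original answer.

```python
import pandas as pd

# Sample frame reconstructed from the "Before" table in the question.
df = pd.DataFrame({
    'Col1': [1, 2, 3],
    'Col2': ['QQ12345-01/02/03', 'QQ123456-01/02', 'QQ12345-01/02/03'],
    'Col3': ['x', 'y', 'z'],
})

# Step 1: split each value on '-' into [prefix, suffixes].
parts = df['Col2'].str.split('-')

# Step 2: prepend the prefix to every '/'-separated suffix and join with ','.
joined = parts.apply(
    lambda x: ','.join(f'{x[0]}-{item}' for item in x[1].split('/'))
)

# Step 3: split on ',' with expand=True, giving one column per suffix.
# Rows with fewer suffixes are padded with missing values.
wide = joined.str.split(',', expand=True)

# Step 4: melt the wide columns into a single 'value' column
# and drop the rows that came from the padding.
long_df = (
    df[['Col1', 'Col3']]
    .join(wide)
    .melt(id_vars=['Col1', 'Col3'], value_name='value')
    .dropna()
)

print(joined.tolist())
print(wide)
print(long_df)
```

Note that this assumes the prefix itself never contains a hyphen. If it can, splitting only on the last hyphen with str.rsplit('-', n=1) would probably be safer.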

OUTPUT:

   Col1         Col2 Col3
0     1   QQ12345-01    x
3     1   QQ12345-02    x
6     1   QQ12345-03    x
1     2  QQ123456-01    y
4     2  QQ123456-02    y
2     3   QQ12345-01    z
5     3   QQ12345-02    z
8     3   QQ12345-03    z
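If sorting by Col3 is not acceptable (for example, when Col3 is not already in order), one possible alternative is to build the long rows directly in plain Python, which keeps the original row order and also needs no explode. This is only a sketch and not part of the solution above: expand_rows is a hypothetical helper name, the three column names are hard-coded from the question, and it assumes every Col2 value contains at least one hyphen.

```python
import pandas as pd

# Sample frame reconstructed from the "Before" table in the question.
df = pd.DataFrame({
    'Col1': [1, 2, 3],
    'Col2': ['QQ12345-01/02/03', 'QQ123456-01/02', 'QQ12345-01/02/03'],
    'Col3': ['x', 'y', 'z'],
})


def expand_rows(frame):
    """Yield one (Col1, Col2, Col3) tuple per '/'-separated suffix.

    Hypothetical helper for illustration; assumes each Col2 value
    looks like '<prefix>-<suffix>/<suffix>/...'.
    """
    for c1, c2, c3 in zip(frame['Col1'], frame['Col2'], frame['Col3']):
        # Split on the last '-' so the prefix can have any length.
        prefix, _, suffixes = c2.rpartition('-')
        for suffix in suffixes.split('/'):
            yield c1, f'{prefix}-{suffix}', c3


out = pd.DataFrame(list(expand_rows(df)), columns=['Col1', 'Col2', 'Col3'])
print(out)
```

The trade-off is that this loops in Python rather than using vectorized string methods, so it may be slower on large frames.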
ThePyGuy answered Apr 04 '26 10:04



