Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep only characters in between two hyphens from a dataframe cell which has comma separated list

I have a database for eg.

Name    Subset
Apple   -AI-,-BI-A,-XC-,ZX-
Bat     -po-,-IJ-,-IA-B

and want to convert it to:

Name    Subset
Apple   AI,BI,XC,ZX
Bat     po,IJ,IA

i.e removing the first hyphen and removing the second hyphen and charachters following it until next comma appears.

like image 442
spd Avatar asked Dec 11 '25 23:12

spd


2 Answers

Use Series.str.split, Series.explode and Series.groupby.agg:

In [2193]: df['Subset'] = df.Subset.str.split(',').explode().str.split('-').str[:2].apply(''.join).groupby(level=0).agg(','.join)

In [2194]: df
Out[2194]: 
    Name       Subset
0  Apple  AI,BI,XC,ZX
1    Bat     po,IJ,IA
like image 173
Mayank Porwal Avatar answered Dec 14 '25 11:12

Mayank Porwal


IIUC, use extractall with the -([^-]+)- regex to get only the values between dashes. Then GroupBy.agg with ','.join to concatenate the values:

df['Subset'] = (df['Subset'].str.extractall(r'-([^-]+)-')[0]
                .groupby(level=0).agg(','.join)
                )

output:

    Name    Subset
0  Apple  AI,BI,XC
1    Bat  po,IJ,IA

To handle the case where the left dash can be missing, use [-,]+([^-,]+)-:

df['Subset'] = (df['Subset'].str.extractall(r'[-,]+([^-,]+)-')[0]
                .groupby(level=0).agg(','.join))

output:

    Name       Subset
0  Apple  AI,BI,XC,ZX
1    Bat     po,IJ,I
like image 31
mozway Avatar answered Dec 14 '25 12:12

mozway



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!