I have a database for eg.
Name Subset
Apple -AI-,-BI-A,-XC-,ZX-
Bat -po-,-IJ-,-IA-B
and want to convert it to:
Name Subset
Apple AI,BI,XC,ZX
Bat po,IJ,IA
i.e removing the first hyphen and removing the second hyphen and charachters following it until next comma appears.
Use Series.str.split, Series.explode and Series.groupby.agg:
In [2193]: df['Subset'] = df.Subset.str.split(',').explode().str.split('-').str[:2].apply(''.join).groupby(level=0).agg(','.join)
In [2194]: df
Out[2194]:
Name Subset
0 Apple AI,BI,XC,ZX
1 Bat po,IJ,IA
IIUC, use extractall with the -([^-]+)- regex to get only the values between dashes. Then GroupBy.agg with ','.join to concatenate the values:
df['Subset'] = (df['Subset'].str.extractall(r'-([^-]+)-')[0]
.groupby(level=0).agg(','.join)
)
output:
Name Subset
0 Apple AI,BI,XC
1 Bat po,IJ,IA
To handle the case where the left dash can be missing, use [-,]+([^-,]+)-:
df['Subset'] = (df['Subset'].str.extractall(r'[-,]+([^-,]+)-')[0]
.groupby(level=0).agg(','.join))
output:
Name Subset
0 Apple AI,BI,XC,ZX
1 Bat po,IJ,I
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With