Replace string values of column if contained in parentheses

I have the following dataframe as an example:

import pandas as pd

test = pd.DataFrame({'type': ['fruit-of the-loom (sometimes-never)', 'yes', 'ok (not-possible) I will try', 'vegetable', 'poultry', 'poultry'],
                     'item': ['apple', 'orange', 'spinach', 'potato', 'chicken', 'turkey']})

I found many posts from people wanting to remove parentheses from strings, or similar situations, but in my case I would like to keep each string exactly as it is, except that any hyphen inside the parentheses should be replaced with a space.

Does anyone have a suggestion on how I could achieve this?

str.lstrip() would take care of the hyphen if it were leading and str.rstrip() if it were trailing, but here the hyphen sits in the middle of the string, and I can't think of a way to apply them.

In this case, the ideal outcome for the values in this hypothetical column would be:

'fruit-of the-loom (sometimes never)',
'yes',
'ok (not possible) I will try',
'vegetable',
'poultry',
'poultry'

asked Jan 20 '26 05:01 by bls


2 Answers

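One way is to use str.replace with a pattern that matches whatever is between parentheses, and pass a lambda as the repl parameter that calls replace on the matched text. Note that regex=True is required for a callable repl (it is no longer the default in pandas 2.0 and later), and a raw string avoids invalid-escape warnings:

print(test['type'].str.replace(pat=r'\((.*?)\)',
                               repl=lambda x: x.group(0).replace('-', ' '),
                               regex=True))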
0    fruit-of the-loom (sometimes never)
1                                    yes
2           ok (not possible) I will try
3                              vegetable
4                                poultry
5                                poultry
Name: type, dtype: object

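In pat, \( and \) match literal parentheses, and (.*?) lazily matches the shortest run of characters between them. The lambda receives the match object, and x.group(0) is the whole match, parentheses included.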
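To see what the callable replacement does in isolation, here is a minimal sketch using only the standard re module on plain strings. The helper name dehyphenate_parens is hypothetical and is not part of the answer above; it just applies the same pattern and lambda through re.sub.

```python
import re

# A literal "(", then the shortest possible run of characters, then a literal ")".
PAREN_SPAN = re.compile(r'\((.*?)\)')

def dehyphenate_parens(text):
    """Replace hyphens with spaces, but only inside parenthesised spans."""
    # m.group(0) is the whole match, including the parentheses themselves.
    return PAREN_SPAN.sub(lambda m: m.group(0).replace('-', ' '), text)

values = ['fruit-of the-loom (sometimes-never)', 'yes',
          'ok (not-possible) I will try', 'vegetable']
for value in values:
    print(dehyphenate_parens(value))
# fruit-of the-loom (sometimes never)
# yes
# ok (not possible) I will try
# vegetable
```

Hyphens outside the parentheses are never touched, because the lambda only ever sees the matched span.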

answered Jan 21 '26 17:01 by Ben.T


test.type = (test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
             .sum(1)
             .combine_first(test.type))

Explanation:

  • Extract three regex groups: everything from the start through the opening parenthesis up to the hyphen, everything after the hyphen through the closing parenthesis, and any remaining text
  • Concatenate them together again with sum
  • Where the result is NaN (no match), use the values from the original (combine_first)
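As a rough illustration of the first step, the sketch below runs the same pattern through the standard re module on single strings and prints the three captured groups. The helper name split_groups is hypothetical; it only shows what the groups contain and does not reproduce the pandas pipeline.

```python
import re

# Same three-group pattern as in the answer, written as a raw string.
PATTERN = re.compile(r'(.*?\(.*?)-(.*?\))(.*)')

def split_groups(text):
    """Return the three captured groups, or None when the pattern does not match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(split_groups('ok (not-possible) I will try'))
# ('ok (not', 'possible)', ' I will try')
print(split_groups('fruit-of the-loom (sometimes-never)'))
# ('fruit-of the-loom (sometimes', 'never)', '')
print(split_groups('vegetable'))
# None
```

The hyphen itself sits between the first and second group and is not captured, which is why joining the groups drops it.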

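This way the hyphen is dropped, not replaced by a space. If you need a space, you could concatenate the extracted groups explicitly instead of using sum. Rows with no match stay NaN, so combine_first fills them from the original:

groups = test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
# Put the space where the hyphen was; rows without a match remain NaN.
test.type = (groups[0] + ' ' + groups[1] + groups[2]).combine_first(test.type)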

Either way, this only handles a single hyphen inside a single set of parentheses; it won't work for more than one set of parentheses.
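The limitation can be seen on a plain string with the standard re module. In this sketch, join_groups is a hypothetical helper that mimics the extract-and-join idea for one string, and the sample input is made up for illustration. For contrast, the last lines use re.sub with a callable replacement, which is a different technique from the one in this answer.

```python
import re

THREE_GROUPS = re.compile(r'(.*?\(.*?)-(.*?\))(.*)')

def join_groups(text, sep=' '):
    """Rebuild the string from the three groups, putting sep where the hyphen was."""
    m = THREE_GROUPS.match(text)
    if m is None:
        return text  # pattern did not match: keep the original string
    before, inside, rest = m.groups()
    return before + sep + inside + rest

sample = 'a (b-c-d) e (f-g)'

# Only the first hyphen in the first set of parentheses is handled.
print(join_groups(sample))
# a (b c-d) e (f-g)

# A callable replacement rewrites every parenthesised span independently.
print(re.sub(r'\(.*?\)', lambda m: m.group(0).replace('-', ' '), sample))
# a (b c d) e (f g)
```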

answered Jan 21 '26 17:01 by Josh Friedlander