
Remove words that appear in other column, Pandas

What is the procedure to remove a word from a string in one column when that word occurs in another column?

For example:

Sr       A              B                            C
1      jack        jack and jill                 and jill
2      run         you should run,               you should ,
3      fly         you shouldnt fly,there        you shouldnt ,there

As shown, I want column C to be B minus the contents of A. Note the third example, where fly is followed by a comma, so punctuation should also be taken into account (in case the code relies on detecting a space around the word).
Column A can also contain two words, and these need to be removed as well.
I need an expression in Pandas, something like:

df.apply(lambda x: x["C"].replace(r"\b"+x["A"]+r"\b", "").strip(), axis=1)
Hypothetical Ninja asked Feb 14 '23 09:02


1 Answer

How does this look?

In [24]: df
Out[24]: 
   Sr     A                       B
0   1  jack           jack and jill
1   2   run         you should run,
2   3   fly  you shouldnt fly,there

[3 rows x 3 columns]

In [25]: df.apply(lambda row: row.B.strip(row.A), axis=1)
Out[25]: 
0                 and jill
1          you should run,
2    ou shouldnt fly,there
dtype: object
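
The output above shows that str.strip(chars) trims any of the given characters from both ends of the string rather than removing a word, so the rows with index 1 and 2 are not what the question asks for. A minimal sketch of a regex-based alternative follows, assuming that every whitespace-separated word in A should be removed from B wherever it appears as a whole word. The helper name remove_words is made up for illustration.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sr": [1, 2, 3],
    "A": ["jack", "run", "fly"],
    "B": ["jack and jill", "you should run,", "you shouldnt fly,there"],
})


def remove_words(row):
    """Return B with every whole-word occurrence of the words in A removed."""
    # Assumption: A may hold several words and each one is removed separately.
    # re.escape keeps characters in A from being read as regex syntax.
    words = [re.escape(word) for word in row["A"].split()]
    if not words:
        return row["B"].strip()
    # \b anchors the match at word boundaries, so "fly" matches in "fly,there"
    # but would not match inside a longer word such as "butterfly".
    pattern = r"\b(?:" + "|".join(words) + r")\b"
    return re.sub(pattern, "", row["B"]).strip()


df["C"] = df.apply(remove_words, axis=1)
```

On the sample frame this gives "and jill", "you should ," and "you shouldnt ,there", matching column C in the question. Note that \b only matches next to letters, digits or underscores, so words in A that begin or end with punctuation would need a different boundary check.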
TomAugspurger answered Feb 16 '23 03:02