I have a column in a dataframe like this:
npt2$name
# [1] "Andreas Groll, M.D."
# [2] ""
# [3] "Pan-Chyr Yang, PHD"
# [4] "Suh-Fang Jeng, Sc.D"
# [5] "Mostafa K Mohamed Fontanet Arnaud"
# [6] "Thomas Jozefiak, M.D."
# [7] "Medical Monitor"
# [8] "Qi Zhu, MD"
# [9] "Holly Posner"
# [10] "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD"
# [11] "Lance A Mynderse, M.D."
# [12] "Lawrence Currie, MD"
I tried gsub but with no luck. After doing toupper(x), I need to replace all instances of 'MD', 'M.D.' or 'PHD' with nothing.
Is there a nice short trick to do it?
In fact, I would be interested to see it done on a single string, and how differently it is done in one command on the whole vector.
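Either of these, where test is the character vector to clean (for example the upper-cased name column):
gsub("MD|M\\.D\\.|PHD", "", test) # remove only the listed degree strings
gsub("\\,.+$", "", test) # remove the first comma and everything after it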
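Since the question asks how a single string differs from the whole column, here is a minimal sketch. It assumes a small sample of the names from the question stored in a vector called test (the variable names one, one_clean and cleaned are illustrative only). Because gsub() is vectorised, the call is the same in both cases; the extra step that trims the leftover comma is an addition, not part of the answer above.

```r
# Single string: gsub() accepts a length-one character vector
one <- toupper("Andreas Groll, M.D.")
one_clean <- gsub("MD|M\\.D\\.|PHD", "", one)
one_clean
# [1] "ANDREAS GROLL, "

# Whole vector: exactly the same call, applied element by element
test <- toupper(c("Andreas Groll, M.D.", "", "Pan-Chyr Yang, PHD",
                  "Qi Zhu, MD", "Holly Posner"))
cleaned <- gsub("MD|M\\.D\\.|PHD", "", test)

# Optional extra step: drop the trailing comma and whitespace left behind
cleaned <- gsub("[,[:space:]]+$", "", cleaned)
cleaned
# [1] "ANDREAS GROLL" ""              "PAN-CHYR YANG" "QI ZHU"        "HOLLY POSNER"
```

Note that the plain "MD" alternative would also match those two letters inside a longer word, so it is worth checking the result on the real data.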
Both Matt Parker above and Tommy below have raised the question of whether 'M.R.C.P.', 'PhD', 'D.Phil.', 'Ph.D.' or other British or Continental designations of doctorate-level degrees should also be sought out and removed. Perhaps @user56 can advise what the intent was.
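If more spellings do need to be covered, one possible approach is a case-insensitive pattern with optional dots, which also avoids the toupper() step. This is only a sketch: strip_degrees is a hypothetical helper name, and the list of degrees (MD, PhD, ScD) is an assumption that would need extending for designations such as 'M.R.C.P.' or 'D.Phil.'.

```r
# Hypothetical helper: remove ", MD", ", M.D.", ", PhD", ", Ph.D.", ", Sc.D" etc.
# \\b keeps the pattern from matching the same letters inside a longer word.
strip_degrees <- function(x) {
  gsub(",?\\s*\\b(M\\.?D|Ph\\.?D|Sc\\.?D)\\b\\.?", "", x,
       ignore.case = TRUE, perl = TRUE)
}

x <- c("Andreas Groll, M.D.", "Pan-Chyr Yang, PHD", "Suh-Fang Jeng, Sc.D",
       "Medical Monitor", "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD")
res <- strip_degrees(x)
res
# [1] "Andreas Groll"  "Pan-Chyr Yang"  "Suh-Fang Jeng"  "Medical Monitor"
# [5] "Peter S Sebel, MB BS Chantal Kerssens"
```

Degrees not listed in the pattern, such as 'MB BS' in the last element, are left untouched.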