I have a data frame that looks like this, but with 6k rows:
AWC, LocationID
333, *Yukon
485, *Lewis Rich
76, *Kodiak
666, Kodiak
54, *Rays
I would like to remove the asterisks from the LocationID values, if that's possible, and just keep the original name, so *Yukon -> Yukon. If that's not possible, could you help me with a way to rename a column value? I'm new to R.
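The stringr package has some very handy functions for vectorized string manipulation. In the following code I replace the * with ''. Note that * is a regex metacharacter, so it has to be escaped to be matched literally. In an R string the escape is written as a double backslash \\ instead of the usual single backslash \, because the backslash itself must be escaped inside the string literal. Also note that str_replace only replaces the first match in each value; use str_replace_all if a value can contain more than one asterisk.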
library(stringr)
LocationID <- c('*Yukon','*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays')
AWC <- c(333, 485, 76, 666, 54)
df <- data.frame(LocationID, AWC)
df$location_clean <- stringr::str_replace(df$LocationID, '\\*', '')
Resulting in:
LocationID AWC location_clean
1 *Yukon 333 Yukon
2 *Lewis Rich 485 Lewis Rich
3 *Kodiak 76 Kodiak
4 Kodiak 666 Kodiak
5 *Rays 54 Rays
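If the escaping feels error-prone, base R offers two alternatives that need no extra package. This is only a sketch on the same sample values, and the variable names clean_fixed and clean_anchored are made up for illustration: fixed = TRUE makes sub treat the pattern as a plain string, while the anchored pattern ^\\* removes an asterisk only when it is the first character.

```r
LocationID <- c('*Yukon', '*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays')

# fixed = TRUE: the pattern is taken literally, so '*' needs no escaping
clean_fixed <- sub('*', '', LocationID, fixed = TRUE)

# Anchored regex: strips an asterisk only at the very start of the value
clean_anchored <- sub('^\\*', '', LocationID)

print(clean_fixed)
print(clean_anchored)
```

On this sample both give the same result, because every asterisk is at the start of the value. The anchored version is the safer choice if an asterisk could legitimately appear later in a name.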
The same can be achieved using the mutate verb from dplyr (part of the tidyverse), which in my opinion is more readable. To illustrate, I create a dataset called DT with a focus on the LocationID column to mimic the problem at hand.
library(tidyverse)
DT <- data.frame('AWC'= c(333, 485, 76, 666, 54),
'LocationID'= c('*Yukon','*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays'))
head(DT)
AWC LocationID
1 333 *Yukon
2 485 *Lewis Rich
3 76 *Kodiak
4 666 Kodiak
5 54 *Rays
In what follows, mutate alters the column content and gsub does the desired substitution (of * with ""), which keeps the data-cleaning flow easy to follow.
DT <- DT %>% mutate(LocationID = gsub("\\*", "", LocationID))
head(DT)
AWC LocationID
1 333 Yukon
2 485 Lewis Rich
3 76 Kodiak
4 666 Kodiak
5 54 Rays
NOTE that \\ is placed before * as the escape character, because an unescaped * is a regex quantifier rather than a literal asterisk.
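To make the difference between sub and gsub concrete, here is a small sketch in base R. The vector x and its values are invented for illustration and are not from the question: sub removes only the first asterisk in each value, while gsub removes all of them. A character class, [*], is shown as an alternative to the backslash escape.

```r
# Hypothetical values, one of them with several asterisks
x <- c('*Yukon', '**Kodiak*')

first_only  <- sub('\\*', '', x)   # removes the first match per value
all_matches <- gsub('\\*', '', x)  # removes every match per value
char_class  <- gsub('[*]', '', x)  # same as above, without a backslash

print(first_only)
print(all_matches)
print(char_class)
```

For the data in the question, where each value has at most one asterisk, sub and gsub behave identically.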