
How to remove a character (asterisk) from column values in R?

Tags:

variables

r

I have a data frame that looks like this, but with 6k rows:

AWC, LocationID
333, *Yukon
485, *Lewis Rich
76, *Kodiak
666, Kodiak
54, *Rays

I would like to remove the asterisks from the LocationID values, if that's possible, and just keep the original name, so that *Yukon becomes Yukon. If that's not possible, could you help me with a way to rename a column value? I'm new to R.

Juliet R asked Dec 02 '22 13:12


2 Answers

The stringr package has some very handy functions for vectorized string manipulation.

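In the following code I replace the * with ''. Note that * is a special character in regular expressions, so to match it literally it has to be escaped with a backslash, and inside an R string that backslash is itself written as a double backslash \\ instead of the usual single backslash \. Also note that str_replace only replaces the first match in each string; use str_replace_all if a value can contain more than one asterisk.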

library(stringr)

# Sample data mimicking the question
LocationID <- c('*Yukon', '*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays')
AWC <- c(333, 485, 76, 666, 54)
df <- data.frame(LocationID, AWC)

# Remove the first literal asterisk from each value
df$location_clean <- stringr::str_replace(df$LocationID, '\\*', '')

Resulting in:

   LocationID AWC location_clean
1      *Yukon 333          Yukon
2 *Lewis Rich 485     Lewis Rich
3     *Kodiak  76         Kodiak
4      Kodiak 666         Kodiak
5       *Rays  54           Rays
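
As a possible variation on the code above, here is a minimal sketch of three related stringr calls: fixed() to match the asterisk literally without any backslashes, an anchored pattern that only touches a leading asterisk, and str_replace_all for values with several asterisks. The variable names clean_fixed, clean_leading and clean_all, and the value '*Lewis *Rich', are made up for illustration and are not from the question.

```r
library(stringr)

# Same sample values as in the answer above
LocationID <- c("*Yukon", "*Lewis Rich", "*Kodiak", "Kodiak", "*Rays")

# fixed() treats the pattern as a literal string, so no escaping is needed
clean_fixed <- str_replace(LocationID, fixed("*"), "")

# "^\\*" only matches an asterisk at the very start of a value
clean_leading <- str_replace(LocationID, "^\\*", "")

# str_replace_all() removes every asterisk, not just the first one
# (hypothetical value with two asterisks)
clean_all <- str_replace_all("*Lewis *Rich", fixed("*"), "")

print(clean_fixed)
print(clean_leading)
print(clean_all)
```

Which variant fits best probably depends on the data: the anchored pattern is the most conservative choice if an asterisk elsewhere in a name should be kept.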

Guilherme Marthe answered Dec 19 '22 05:12


This can also be achieved with the mutate verb from dplyr (loaded as part of the tidyverse), which in my opinion is more readable. To illustrate, I create a dataset called DT that mimics the problem at hand, with a focus on the LocationID column.

library(tidyverse)
DT <- data.frame('AWC'= c(333, 485, 76, 666, 54), 
                 'LocationID'= c('*Yukon','*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays'))

head(DT)
  AWC  LocationID
1 333      *Yukon
2 485 *Lewis Rich
3  76     *Kodiak
4 666      Kodiak
5  54       *Rays

In what follows, mutate alters the column content and gsub does the desired substitution (of * with ""), which keeps the data cleaning flow easy to follow.

DT <- DT %>% mutate(LocationID = gsub("\\*", "", LocationID))
head(DT)
  AWC LocationID
1 333      Yukon
2 485 Lewis Rich
3  76     Kodiak
4 666     Kodiak
5  54       Rays

NOTE that * is a regex metacharacter, so it is escaped with a backslash, which is written as \\ inside an R string.
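
If the double backslash from the note above feels awkward, a minimal base R sketch of two ways around it could look like this. The names clean_literal and clean_class are made up for illustration; the sample values are the ones from the question.

```r
# Sample values from the question
LocationID <- c("*Yukon", "*Lewis Rich", "*Kodiak", "Kodiak", "*Rays")

# fixed = TRUE makes gsub() match "*" as a literal string, not as a regex
clean_literal <- gsub("*", "", LocationID, fixed = TRUE)

# Inside a character class the asterisk loses its special meaning
clean_class <- gsub("[*]", "", LocationID)

print(clean_literal)
print(clean_class)
```

Either call can be dropped into the mutate step in place of gsub("\\*", "", LocationID).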

odunayo12 answered Dec 19 '22 05:12