
Replacing data values based on grep result in R

Tags: r

I have a data frame. One of the columns has values like:

WIND
WINDS
HIGH WIND
etc

among other values. I want to replace every value that contains some variation of "WIND" with "WIND". I know how to find the values that need replacing:

grep("WIND", df$col1)

but not how to replace those values. Thanks.

user1754606 asked Feb 28 '14 16:02



2 Answers

You can subset the original column with grepl and assign the replacement to the matching elements:

df$col1[grepl("WIND", df$col1)] <- "WIND"
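
For anyone who wants to try this end to end, here is a minimal sketch. The data frame contents are made up for illustration; only the names df and col1 come from the question. It also assumes the column may be a factor (the default for strings in data.frame() before R 4.0), in which case converting it to character first avoids NA values when "WIND" is not already one of the levels.

```r
# Hypothetical data; only the names df and col1 come from the question.
df <- data.frame(
  col1 = c("WIND", "WINDS", "HIGH WIND", "HAIL", NA),
  stringsAsFactors = TRUE  # mimic the pre-4.0 default, where strings become factors
)

# Assigning a string that is not an existing factor level produces NA
# (with a warning), so work on a character column instead.
df$col1 <- as.character(df$col1)

# grepl() returns one TRUE/FALSE per element (FALSE for NA),
# so the result can be used directly as a logical index.
is_wind <- grepl("WIND", df$col1)
df$col1[is_wind] <- "WIND"

print(df$col1)
```

Values that do not contain "WIND", including NA, are left untouched.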
Steve Reno answered Sep 30 '22 01:09


UPDATE: a bit of a brainfart on my part. agrep doesn't actually add anything over grep here, so you can simply replace agrep with grep in the code below. agrep does help if some of your words have roots that vary slightly but that you still want to match.

Here is an approach using agrep:

> wind.vec
[1] "WINDS"      "HIGH WIND"  "WINDY"      "VERY WINDY"
> wind.vec[agrep("WIND", wind.vec)] <- "WIND"
> wind.vec
[1] "WIND" "WIND" "WIND" "WIND"

The nice thing about agrep is that it matches approximately, so values that differ slightly from "WIND" are replaced as well. Note that I'm doing this with a vector, but you can easily extend it to a data frame by replacing wind.vec with my.data.frame$my.wind.col.

agrep returns the indices of the elements that match approximately, which lets me use the [<- replacement operator to overwrite those elements with "WIND".
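
To make the difference between the two functions concrete, here is a small sketch comparing grep and agrep on a vector that contains a misspelling. The values are invented for illustration ("HIGH WIMD" is a deliberate typo), and the call relies on agrep's default max.distance, which for a four-letter pattern should allow roughly one edit.

```r
# Hypothetical values; "HIGH WIMD" is a deliberate one-letter typo.
wind.vec <- c("WINDS", "HIGH WIMD", "WINDY", "SNOW")

# grep() only finds elements that contain "WIND" exactly.
exact <- grep("WIND", wind.vec)

# agrep() also accepts elements within a small edit distance of the pattern.
approx <- agrep("WIND", wind.vec)

# Replace the approximately matching elements in a copy of the vector.
fixed.vec <- wind.vec
fixed.vec[approx] <- "WIND"

print(exact)
print(approx)
print(fixed.vec)
```

Because approximate matching can also pick up unrelated words that happen to be one edit away from the pattern, it is worth inspecting wind.vec[approx] before overwriting anything.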

BrodieG answered Sep 30 '22 02:09