Changing all values in one column of a filtered data.frame in R

Tags: r, dplyr, stringr

I have a very messy data frame, with one column with values that are understandable to humans but not to computers, a bit like the one below.

df<-data.frame("id"=c(1:10), 
           "colour"=c("re d", ", red", "re-d","green", "gre, en", ", gre-en",  "blu e", "green", ", blue", "bl ue"))

I can filter the df with str_detect

df %>% filter(str_detect(tolower(colour), pattern = "gr")) 

But I want to set all the filtered rows to the same value so that I can wrangle the data.
Any suggestions?
I tried to separate with a pattern but was unsuccessful.

EDIT: Not all periods and spaces are unnecessary in the data frame that I am working with. Let's say that the correct way of writing green in the made-up df is "gr. een".

EDIT2:
Wanted result with faked spelling of colours just to get an idea:

id     colour
1      r. ed
2      r. ed
3      r. ed
4      gr. een
5      gr. een
6      gr. een
7      blu. e
8      gr. een
9      blu. e
10     blu. e
asked Dec 21 '18 11:12 by Mactilda


1 Answer

You can use mgsub from the textclean package to replace several patterns with their corresponding replacements in one call:

df<-data.frame("id"=c(1:10), 
               "colour"=c("re d", ", red", "re-d","green", "gre, en", 
                          ", gre-en",  "blu e", "green", ", blue", "bl ue"))

library(textclean)

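# fixed = FALSE makes mgsub treat the patterns as regular expressions.
# ".*gr.*" is listed before ".*re.*" because "green" also contains "re".
df$colour = mgsub(df$colour, 
                  pattern = c(".*gr.*", ".*re.*", ".*bl.*"), 
                  replacement = c("gr. een", "r. ed", "blu. e"), fixed = FALSE)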

df

#     id  colour
# 1   1   r. ed
# 2   2   r. ed
# 3   3   r. ed
# 4   4 gr. een
# 5   5 gr. een
# 6   6 gr. een
# 7   7  blu. e
# 8   8 gr. een
# 9   9  blu. e
# 10 10  blu. e
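
Since the question already uses dplyr and stringr, the same idea can probably be written without textclean. The following is only a sketch under that assumption: it combines mutate() with case_when() and str_detect(), and the name df_clean is made up for illustration. The conditions are checked top to bottom, so "gr" comes before "re" for the same reason as above.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  id = 1:10,
  colour = c("re d", ", red", "re-d", "green", "gre, en",
             ", gre-en", "blu e", "green", ", blue", "bl ue"),
  stringsAsFactors = FALSE
)

# df_clean is a hypothetical name; the original df is left untouched.
df_clean <- df %>%
  mutate(
    colour = case_when(
      # The first matching condition wins, so "gr" is tested before "re"
      # ("green" contains both).
      str_detect(tolower(colour), "gr") ~ "gr. een",
      str_detect(tolower(colour), "bl") ~ "blu. e",
      str_detect(tolower(colour), "re") ~ "r. ed",
      TRUE ~ colour  # values matching none of the patterns are kept as-is
    )
  )

df_clean
```

The TRUE ~ colour fallback means rows that match none of the patterns keep their original value, which makes it easy to spot spellings that still need a rule.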
answered Sep 23 '22 10:09 by AntoniosK