
Replace a value in a data frame based on a conditional (`if`) statement

Tags: r, recode

In the R data frame created below, I would like to replace every occurrence of B in nm with b.

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12])
colnames(junk) <- c("nm", "val")

This gives:

   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   f
7   C   g
8   D   h
9   A   i
10  B   j
11  C   k
12  D   l

My initial attempt used a for loop with an if statement, like so:

for(i in junk$nm) if(i %in% "B") junk$nm <- "b"

but, as I am sure you can see, this replaces ALL of the values of junk$nm with b. I understand why it does this, but I can't work out how to replace only those cases of junk$nm where the original value was B.

NOTE: I managed to solve the problem with gsub, but in the interest of learning R I would still like to know how to get my original approach to work (if that is possible).

DQdlM asked Apr 28 '11 19:04


8 Answers

It is easier to convert nm to character and then make the change:

junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"

EDIT: If you do need to keep nm as a factor, add this at the end:

junk$nm <- as.factor(junk$nm)
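The round trip above can be sketched end to end as follows. This is only an illustration: it rebuilds the question's data with stringsAsFactors = TRUE passed explicitly (since R 4.0.0, data.frame() no longer converts strings to factors by default) and names the columns directly instead of calling colnames().

```r
# Rebuild the question's data with nm as a factor. The argument is explicit
# because the default became stringsAsFactors = FALSE in R 4.0.0.
junk <- data.frame(nm = rep(LETTERS[1:4], 3),
                   val = letters[1:12],
                   stringsAsFactors = TRUE)

junk$nm <- as.character(junk$nm)   # factor -> character
junk$nm[junk$nm == "B"] <- "b"     # logical index selects only the "B" rows
junk$nm <- as.factor(junk$nm)      # optional: convert back to a factor

table(junk$nm)                     # counts per value after the change
```

Note that as.factor() rebuilds the levels from the values present, so their order may differ from the original factor depending on the locale's sort order.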
diliop answered Oct 03 '22 15:10


Another useful way to replace values:

library(plyr)
junk$nm <- revalue(junk$nm, c("B"="b"))
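If plyr is not available, a similar old-to-new mapping can be sketched in base R with a named character vector. The name lookup is purely illustrative, and the sketch assumes nm is a character column rather than a factor.

```r
# Question's data, built with nm as a character column (an assumption here).
junk <- data.frame(nm = rep(LETTERS[1:4], 3),
                   val = letters[1:12],
                   stringsAsFactors = FALSE)

lookup <- c(B = "b")                 # names are old values, elements are new values
hit <- junk$nm %in% names(lookup)    # rows whose value has a mapping
junk$nm[hit] <- unname(lookup[junk$nm[hit]])

junk$nm
```

Further pairs can be added to lookup (for example c(B = "b", D = "d")) without changing the other lines.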
Oriol Prat answered Oct 03 '22 15:10


Short answer is:

junk$nm[junk$nm %in% "B"] <- "b"

Take a look at the "Index vectors" section of An Introduction to R (if you haven't read it yet).


EDIT: As noted in the comments, this solution works for character vectors, so it fails on your data, where nm is a factor.

For a factor, the best way is to change the level:

levels(junk$nm)[levels(junk$nm)=="B"] <- "b"
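The difference between the two approaches can be seen in a small sketch. It assumes nm is a factor (forced here with stringsAsFactors = TRUE); the variable broken is an illustrative copy used only to show the failure.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3),
                   val = letters[1:12],
                   stringsAsFactors = TRUE)

# Assigning a value that is not an existing level yields NA, with a warning.
broken <- junk$nm
broken[broken %in% "B"] <- "b"   # warning: invalid factor level, NA generated
sum(is.na(broken))               # the three former "B" entries are now NA

# Renaming the level changes every "B" at once and keeps the factor intact.
levels(junk$nm)[levels(junk$nm) == "B"] <- "b"
levels(junk$nm)
```

Renaming the level touches only the levels attribute, so it does not depend on how many rows hold that value.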
Marek answered Oct 03 '22 16:10


Because the data you show are factors, things are a little more complicated. @diliop's answer approaches the problem by converting nm to a character variable; getting back to the original factor then requires a further step.

An alternative is to manipulate the levels of the factor in place.

> lev <- with(junk, levels(nm))
> lev[lev == "B"] <- "b"
> junk2 <- within(junk, levels(nm) <- lev)
> junk2
   nm val
1   A   a
2   b   b
3   C   c
4   D   d
5   A   e
6   b   f
7   C   g
8   D   h
9   A   i
10  b   j
11  C   k
12  D   l

That is quite simple and I often forget that there is a replacement function for levels().

Edit: As noted by @Seth in the comments, this can be done in a one-liner, without loss of clarity:

within(junk, levels(nm)[levels(nm) == "B"] <- "b")
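One detail worth keeping in mind is that within() returns a modified copy and leaves its input untouched, so the result has to be assigned. A minimal sketch, assuming nm is a factor and reusing the name junk2 from above:

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3),
                   val = letters[1:12],
                   stringsAsFactors = TRUE)

# within() evaluates the expression on a copy of junk and returns that copy.
junk2 <- within(junk, levels(nm)[levels(nm) == "B"] <- "b")

levels(junk$nm)    # original data frame is unchanged
levels(junk2$nm)   # the copy carries the renamed level
```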
Gavin Simpson answered Oct 03 '22 16:10


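You can do this in one command with which(). Note that this works as written only when nm is a character vector; if nm is a factor, "b" must already be one of its levels, otherwise the assignment produces NA with an "invalid factor level" warning: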

junk$nm[which(junk$nm=="B")]<-"b"
user1021713 answered Oct 03 '22 15:10


If you are working with character variables (note that stringsAsFactors is FALSE here), you can use replace():

junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12], stringsAsFactors = FALSE)

junk$nm <- replace(junk$nm, junk$nm == "B", "b")
junk
#    nm val
# 1   A   a
# 2   b   b
# 3   C   c
# 4   D   d
# ...
loki answered Oct 03 '22 15:10


You have created a factor variable in nm, so you either need to avoid doing so or add an additional level to the factor attributes. You should also avoid using <- in the arguments to data.frame().

Option 1:

junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12], stringsAsFactors = FALSE)
junk$nm[junk$nm == "B"] <- "b"

Option 2:

levels(junk$nm) <- c(levels(junk$nm), "b")
junk$nm[junk$nm == "B"] <- "b"
junk
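With Option 2 the old level "B" remains in the factor even though no row uses it any more. If that matters, droplevels() can remove it afterwards; the following is a sketch that assumes nm is a factor.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3),
                   val = letters[1:12],
                   stringsAsFactors = TRUE)

levels(junk$nm) <- c(levels(junk$nm), "b")  # make "b" a legal value
junk$nm[junk$nm == "B"] <- "b"              # now assigns without producing NA

levels(junk$nm)                  # "B" is still listed, although unused
junk$nm <- droplevels(junk$nm)   # drop levels that no longer occur
levels(junk$nm)
```

Because "b" was appended, it ends up as the last level rather than taking the position "B" had.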
IRTFM answered Oct 03 '22 15:10


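You can also use ifelse, which is easy to read. Note that this example writes "b" into val for the rows where nm is "B", and assumes val is a character column: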

junk$val <- ifelse(junk$nm == "B", "b", junk$val)

If you still want to use a for loop, the correct way is to index by row:

for(i in 1:nrow(junk)){
  if(junk[i, "nm"] == "B"){
    junk[i, "val"] <- "b"
  }
}

> junk
   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   b
7   C   g
8   D   h
9   A   i
10  B   b
11  C   k
12  D   l
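Applied to the question as asked (changing nm itself rather than val), the same row-indexing idea might look like the sketch below. It assumes nm is a character column, and it uses seq_len() instead of 1:nrow() so that an empty data frame is simply skipped; check is an illustrative name for a vectorised comparison.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3),
                   val = letters[1:12],
                   stringsAsFactors = FALSE)

# The original loop assigned to the whole column (junk$nm <- "b");
# indexing with i restricts each assignment to a single element.
for (i in seq_len(nrow(junk))) {
  if (junk$nm[i] == "B") {
    junk$nm[i] <- "b"
  }
}

# Vectorised equivalent, used here to confirm the loop's result.
check <- rep(LETTERS[1:4], 3)
check[check == "B"] <- "b"
identical(junk$nm, check)
```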
AnilGoyal answered Oct 03 '22 17:10