
How do I replace values within a data frame with a string in R?

Tags: string, replace, r, na

short version: How do I replace values within a data frame with a string found within another data frame?

longer version: I'm a biologist working with many species of bees. I have a data set with many thousands of bees. Each row has a unique bee ID # along with all the relevant info about that specimen (date of capture, GPS location, etc). The species information for each bee has not been entered because it takes a long time to ID them. When IDing, I end up with boxes of hundreds of bees, all of the same species. I enter these into a separate data frame. I am trying to write code that will update the original data file with species information (family, genus, species, sex, etc) as I ID the bees. Currently, in the original data file, the species info is blank and is interpreted as NA within R. I want to have R find all unique bee ID #'s and fill in the species info, but I am having trouble figuring out how to replace the NA values with a string (e.g. "Andrenidae").

Here is a simple example of what I am trying to do:

rawData<-data.frame(beeID=c(1:20),family=rep(NA,20))
speciesInfo<-data.frame(beeID=seq(1,20,3),family=rep("Andrenidae",7))

rawData[rawData$beeID == 4,"family"]  <- speciesInfo[speciesInfo$beeID == 4,"family"]

So, I am replacing things as I want, but with a number rather than the family name (a string). What I would eventually like to do is write a little loop to add in all the species info, e.g.:

for (i in speciesInfo$beeID){
  rawData[rawData$beeID == i,"family"]  <- speciesInfo[speciesInfo$beeID == i,"family"]
}

Thanks in advance for any advice!

Cheers,

Zak

EDIT:

I just noticed that the first two methods below add a new column each time, which would cause problems if I needed to add species info multiple times (which I typically do). For example:

rawData<-data.frame(beeID=c(1:20),family=rep(NA,20))
Andrenidae<-data.frame(beeID=seq(1,20,3),family=rep("Andrenidae",7))
Halictidae<-data.frame(beeID=seq(1,20,3)+1,family=rep("Halictidae",7))

# using join
library(plyr)
rawData <- join(rawData, Andrenidae, by = "beeID", type = "left")
rawData <- join(rawData, Halictidae, by = "beeID", type = "left")

# using merge
rawData <- merge(x=rawData,y=Andrenidae,by='beeID',all.x=T,all.y=F)
rawData <- merge(x=rawData,y=Halictidae,by='beeID',all.x=T,all.y=F)

Is there a way either to collapse the columns so that I have one unified data frame, or to update rawData in place rather than adding a new column each time? Thanks in advance!

asked Sep 11 '12 13:09 by Arturito



1 Answer

Here is a function I think will work for you. It uses match to find the row index in rawData of each ID in your annotation data frame, and then replaces the values in rawData at those rows.

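replaceID <- function(to, from, mergeBy, values) {
  # Row positions in `to` whose key matches each row of `from`
  x <- match(from[, mergeBy], to[, mergeBy])
  # Convert to character first so factor codes (numbers) are not inserted
  to[, values][x] <- as.character(from[, values])
  return(to)
}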
> rawData <- replaceID(rawData,Halictidae,"beeID","family")
> rawData
   beeID     family
1      1       <NA>
2      2 Halictidae
3      3       <NA>
4      4       <NA>
5      5 Halictidae
6      6       <NA>
7      7       <NA>
8      8 Halictidae
9      9       <NA>
10    10       <NA>
11    11 Halictidae
12    12       <NA>
13    13       <NA>
14    14 Halictidae
15    15       <NA>
16    16       <NA>
17    17 Halictidae
18    18       <NA>
19    19       <NA>
20    20 Halictidae
answered Oct 25 '22 19:10 by Matt Shirley
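
For the case raised in the question's EDIT, where species info arrives in several batches, the same function can be applied once per batch. The sketch below is only an illustration: it repeats the replaceID definition so that it runs on its own, and it sets stringsAsFactors = TRUE explicitly as an assumption, to mimic the default of R versions before 4.0 where string columns became factors (the likely reason a number showed up instead of the family name).

```r
# replaceID as defined in the answer, repeated here so the example is self-contained
replaceID <- function(to, from, mergeBy, values) {
  x <- match(from[, mergeBy], to[, mergeBy])
  to[, values][x] <- as.character(from[, values])
  to
}

rawData <- data.frame(beeID = 1:20, family = rep(NA, 20))

# stringsAsFactors = TRUE is an assumption, used to reproduce the factor case
Andrenidae <- data.frame(beeID = seq(1, 20, 3),
                         family = rep("Andrenidae", 7),
                         stringsAsFactors = TRUE)
Halictidae <- data.frame(beeID = seq(1, 20, 3) + 1,
                         family = rep("Halictidae", 7),
                         stringsAsFactors = TRUE)

# Update rawData in place, one batch at a time; no extra columns are added
for (info in list(Andrenidae, Halictidae)) {
  rawData <- replaceID(rawData, info, "beeID", "family")
}
```

After the loop, rawData still has only the beeID and family columns, and bees that have not been identified yet keep NA. Note that the function assumes every ID in the batch exists in rawData; an ID that is missing makes match return NA, and the assignment will then fail.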