
Replacing vector values in R based on a list (hash)

Tags:

r

I have a dataframe, one column of which is names. In a later phase of analysis, I will need to merge with other data by this name column, and there are a few names which vary by source. I'd like to clean up my names using a hash (map) of names->cleaned names. I've found several references to using R lists as hashes (e.g., this question on SE), but I can't figure out how to extract values for keys in a vector only as they occur. So for example,

> players=data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8))
> xref = c("Bob"="Robert", "Fred Jr." = "Fred")
> players$names
[1] Joe  John Bob 
Levels: Bob Joe John

Whereas players$names gives a vector of names from the original frame, I need the same vector, only with any values that occur in xref replaced with their equivalent (lookup) values; my desired result is the vector Joe John Robert.

The closest I've come is:

> players$names %in% names(xref)
[1] FALSE FALSE  TRUE

Which correctly indicates that only "Bob" in players$names exists in the "keys" (names) of xref, but I can't figure out how to extract the value for that name and combine it with the other names in the vector that don't belong to xref as needed.

Note: in case it's not completely clear, I'm pretty new to R, so if I'm approaching this in the wrong fashion, I'm happy to be corrected. My core issue is essentially as stated: I need to clean up some incoming data within R by replacing some incoming values with known replacements and keeping all other values. Further, the map of original->replacement should be stored as data (like xref), not as code.

Jason Clark asked Feb 13 '23 13:02


2 Answers

Updated answer: ifelse

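When xref is a named character vector rather than a list, ifelse gives a more direct solution. Indexing xref by name returns NA for any name it does not contain, and those positions keep their original value. The names column must be character (hence stringsAsFactors = FALSE); indexing with a factor would use its integer codes instead of its labels.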

players <- data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8), stringsAsFactors = FALSE)
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")

players$clean <- ifelse(is.na(xref[players$names]), players$names, xref[players$names])

players

Result

  names scores  clean
1   Joe    9.8    Joe
2  John    9.9   John
3   Bob    8.8 Robert
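
The same lookup can be wrapped in a small reusable function. The following is only a sketch: recode_names is a hypothetical helper name (not part of base R or of the answer above), and it swaps ifelse for match(), which avoids indexing xref twice and also accepts factor input.

```r
# Hypothetical helper (an assumption for illustration): replace values found
# in `map`, a named character vector, and keep everything else unchanged.
recode_names <- function(x, map) {
  x <- as.character(x)            # also accepts factor input
  idx <- match(x, names(map))     # position in the map, NA when absent
  found <- !is.na(idx)
  x[found] <- unname(map[idx[found]])
  x
}

xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")
recode_names(c("Joe", "John", "Bob"), xref)
# [1] "Joe"    "John"   "Robert"
```

Because the map stays a plain named vector, it can still be loaded from a file or another data frame rather than written as code.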

Previous answer: sapply

If xref is a list, sapply can perform the conditional lookup one element at a time:

players <- data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8))

xref <- list("Bob" = "Robert", "Fred Jr." = "Fred")

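# as.character() guards against factor input, where xref[x] would index by integer code;
# xref[[x]] extracts the stored string instead of a one-element list
players$clean <- sapply(as.character(players$names), function(x) if (x %in% names(xref)) xref[[x]] else x, USE.NAMES = FALSE)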

players

Result

> players
  names scores  clean
1   Joe    9.8    Joe
2  John    9.9   John
3   Bob    8.8 Robert
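
If every element of the list is a single string, another option is to flatten the list with unlist() and then use ordinary vectorised indexing. This is a minimal sketch under that assumption; the names xref_list, xref_vec, nms, hit and clean are illustrative only.

```r
xref_list <- list("Bob" = "Robert", "Fred Jr." = "Fred")

# Assumes each list element is a length-one character value, so unlist()
# yields a named character vector with the same keys.
xref_vec <- unlist(xref_list)

nms <- c("Joe", "John", "Bob")
hit <- nms %in% names(xref_vec)      # which names have a replacement
clean <- nms
clean[hit] <- xref_vec[nms[hit]]     # look up only the names that are present
clean
# [1] "Joe"    "John"   "Robert"
```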
Damian answered Feb 16 '23 02:02


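If names is a factor (the data.frame default before R 4.0.0; in later versions, pass stringsAsFactors = TRUE or convert the column with factor()), you can replace the factor levels with the desired text. Here's an example which loops through xref and does the replacement: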

for (n in names(xref)) {
  # xref[[n]] works whether xref is a named vector or a list
  levels(players$names)[levels(players$names) == n] <- xref[[n]]
}

players
##    names scores
## 1    Joe    9.8
## 2   John    9.9
## 3 Robert    8.8
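
The loop can also be written without iteration by editing the level vector in one step. This is a sketch that assumes xref is a named character vector and that names is a factor; the temporary names lv and hit are illustrative.

```r
players <- data.frame(names = c("Joe", "John", "Bob"),
                      scores = c(9.8, 9.9, 8.8),
                      stringsAsFactors = TRUE)   # make names a factor explicitly
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")

lv <- levels(players$names)      # "Bob" "Joe" "John"
hit <- lv %in% names(xref)       # levels that have a replacement
lv[hit] <- xref[lv[hit]]         # swap in the cleaned names
levels(players$names) <- lv

as.character(players$names)
# [1] "Joe"    "John"   "Robert"
```

Keys in xref that never occur in the data (here "Fred Jr.") are simply ignored.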
Matthew Lundberg answered Feb 16 '23 03:02