I have a dataframe, one column of which is names. In a later phase of analysis, I will need to merge with other data by this name column, and there are a few names which vary by source. I'd like to clean up my names using a hash (map) of names->cleaned names. I've found several references to using R lists as hashes (e.g., this question on SE), but I can't figure out how to extract values for keys in a vector only as they occur. So for example,
> players=data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8))
> xref = c("Bob"="Robert", "Fred Jr." = "Fred")
> players$names
[1] Joe John Bob
Levels: Bob Joe John
Whereas players$names gives a vector of names from the original frame, I need the same vector, only with any values that occur in xref replaced with their equivalent (lookup) values; my desired result is the vector Joe John Robert.
The closest I've come is:
> players$names %in% names(xref)
[1] FALSE FALSE TRUE
This correctly indicates that only "Bob" in players$names exists in the "keys" (names) of xref, but I can't figure out how to extract the value for that name and combine it with the other names in the vector that don't belong to xref, as needed.
Note: in case it's not completely clear, I'm pretty new to R, so if I'm approaching this in the wrong fashion, I'm happy to be corrected. My core issue is essentially as stated: I need to clean up some incoming data within R by replacing some incoming values with known replacements and keeping all other values. Further, the map of original->replacement should be stored as data (like xref), not as code.
ifelse is an even more straightforward solution, in the case that xref is a named vector and not a list.
players <- data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8), stringsAsFactors = FALSE)
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")
players$clean <- ifelse(is.na(xref[players$names]), players$names, xref[players$names])
players
Result
names scores clean
1 Joe 9.8 Joe
2 John 9.9 John
3 Bob 8.8 Robert
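The same lookup can be wrapped in a small helper that also copes with factor input, since indexing a named vector with a factor uses the integer codes rather than the labels. This is only a sketch; the function name clean_names is made up for illustration and is not part of base R.

```r
# Hypothetical helper: replace values found in xref, keep everything else.
clean_names <- function(x, xref) {
  x <- as.character(x)            # a factor would otherwise index by integer code
  hit <- x %in% names(xref)       # TRUE where a replacement exists
  x[hit] <- unname(xref[x[hit]])  # look up the replacements by name
  x
}

players <- data.frame(names = c("Joe", "John", "Bob"),
                      scores = c(9.8, 9.9, 8.8),
                      stringsAsFactors = FALSE)
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")
players$clean <- clean_names(players$names, xref)
```

Because the helper only touches the positions where a key matches, names that are missing from xref pass through unchanged and no NA handling is needed.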
If xref is a list, then the sapply function can be used to do conditional look-ups:
players <- data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8), stringsAsFactors = FALSE)
xref <- list("Bob" = "Robert", "Fred Jr." = "Fred")
# xref[[x]] extracts the value itself; xref[x] would return a one-element list
players$clean <- sapply(players$names, function(x) if (x %in% names(xref)) xref[[x]] else x, USE.NAMES = FALSE)
players
Result
> players
names scores clean
1 Joe 9.8 Joe
2 John 9.9 John
3 Bob 8.8 Robert
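If every value in the list is a single string, as in xref above, another option is to flatten the list with unlist and then do a vectorised lookup instead of calling a function per element. A rough sketch, where the variable names xref_list, xref_vec and nm are just placeholders:

```r
xref_list <- list("Bob" = "Robert", "Fred Jr." = "Fred")
xref_vec <- unlist(xref_list)          # named character vector

nm <- c("Joe", "John", "Bob")
hit <- nm %in% names(xref_vec)         # which names have a replacement
nm[hit] <- unname(xref_vec[nm[hit]])   # replace only those
```

This assumes each key maps to exactly one value; unlist would rename the entries if a list element held more than one string.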
You can replace the factor levels with the desired text. Here's an example which loops through xref and does the replacement:
# assumes players$names is a factor (the data.frame default before R 4.0)
for (n in names(xref)) {
  levels(players$names)[levels(players$names) == n] <- xref[n]
}
players
## names scores
## 1 Joe 9.8
## 2 John 9.9
## 3 Robert 8.8
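The loop can also be written as a single vectorised update of the levels. This is a sketch of the same idea rather than a different method, and it builds the factor explicitly so it behaves the same regardless of the stringsAsFactors default:

```r
players <- data.frame(names = factor(c("Joe", "John", "Bob")),
                      scores = c(9.8, 9.9, 8.8))
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")

lv <- levels(players$names)        # current level labels
hit <- lv %in% names(xref)         # levels that have a replacement
lv[hit] <- unname(xref[lv[hit]])   # swap in the cleaned labels
levels(players$names) <- lv        # relabel the factor in place
```

Keys in xref that never occur among the levels (here "Fred Jr.") are simply ignored.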