My data consists of two variables, an id and a corresponding name. The name is one of two things: either the id itself or a string of letters.
If an id has a non-numeric name, I need to replace any numeric names for that id with this value.
Data example:
df <- data.frame(id = c("100", "100", "101", "102", "103", "104", "104", "105", "100", "106"),
                 name = c("100", "A", "B", "C", "D", "104", "E", "F", "100", "106"),
                 correct_name = c("A", "A", "B", "C", "D", "E", "E", "F", "A", "106"),
                 stringsAsFactors = FALSE)
The third column gives the desired result.
I've been messing around with %in%, duplicated and group_by, but have been unable to get anywhere.
EDIT: I missed a crucial part: there can be ids for which no character name exists. I've updated the example, sorry!
EDIT
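Since you have mentioned that some ids have no character name to use as a replacement, we can modify the ave option to check for that condition and replace the values all in one call.
df$name <- with(df, ave(name, id, FUN = function(x) {
  # TRUE for names that contain digits
  inds <- grepl("[0-9]+", x)
  if (any(!inds))
    # replace numeric names with the first non-numeric name in the group
    replace(x, inds, x[which.max(!inds)])
  else
    # no non-numeric name for this id, so keep the values as they are
    x
}))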
df
#    id name correct_name
#1  100    A            A
#2  100    A            A
#3  101    B            B
#4  102    C            C
#5  103    D            D
#6  104    E            E
#7  104    E            E
#8  105    F            F
#9  100    A            A
#10 106  106          106
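As a side note, the pattern "[0-9]+" matches any name that contains a digit, so a letter name such as "A1" would also be treated as numeric. Below is a minimal sketch of the same ave idea, assuming a name should count as numeric only when it consists entirely of digits. fill_numeric_names is a hypothetical helper name used for illustration, not part of any package.

```r
# Example data from the question (includes id "106", which has no letter name)
df <- data.frame(
  id = c("100", "100", "101", "102", "103", "104", "104", "105", "100", "106"),
  name = c("100", "A", "B", "C", "D", "104", "E", "F", "100", "106"),
  correct_name = c("A", "A", "B", "C", "D", "E", "E", "F", "A", "106"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per id, replace all-digit names with the first letter name
fill_numeric_names <- function(name, id) {
  ave(name, id, FUN = function(x) {
    is_num <- grepl("^[0-9]+$", x)  # assumption: numeric means digits only
    if (any(!is_num)) {
      replace(x, is_num, x[which.max(!is_num)])  # first non-numeric name in the group
    } else {
      x  # this id has no letter name, so leave it unchanged
    }
  })
}

fixed <- fill_numeric_names(df$name, df$id)
identical(fixed, df$correct_name)  # compare against the desired column
```

Wrapping the logic in a function keeps the original name column intact, which makes it easier to compare the result with correct_name before overwriting anything.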
Original Answer
Assuming every id has exactly one unique non-numeric name, we can do a double replace with dplyr: first change the names that contain a number to NA, then replace those NAs with the first non-NA value in the group.
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(name = replace(name, grepl("[0-9]+", name), NA),
         name = replace(name, is.na(name), name[!is.na(name)][1]))
#  id    name  correct_name
#  <chr> <chr> <chr>
#1 100   A     A
#2 100   A     A
#3 101   B     B
#4 102   C     C
#5 103   D     D
#6 104   E     E
#7 104   E     E
#8 105   F     F
#9 100   A     A
The same logic works with base R ave:
# Replace the names that contain numbers with NA
df$name[grepl("[0-9]+", df$name)] <- NA
# Change the NAs to the first non-NA value in the group
df$name <- with(df, ave(name, id, FUN = function(x) x[!is.na(x)][1]))
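The group-wise replacement can also be sketched as a lookup with match instead of ave. This is only an illustration under two assumptions: each id has at most one distinct letter name, and a numeric name consists only of digits. The column name fixed and the helper variables is_num, letter_rows and lookup are made up for this example.

```r
# Example data from the question (includes id "106", which has no letter name)
df <- data.frame(
  id = c("100", "100", "101", "102", "103", "104", "104", "105", "100", "106"),
  name = c("100", "A", "B", "C", "D", "104", "E", "F", "100", "106"),
  correct_name = c("A", "A", "B", "C", "D", "E", "E", "F", "A", "106"),
  stringsAsFactors = FALSE
)

# Rows whose name is made up of digits only (assumption about what "numeric" means)
is_num <- grepl("^[0-9]+$", df$name)

# Rows that carry a letter name; these act as the lookup table
letter_rows <- df[!is_num, ]

# For every row, find the first letter name recorded for its id (NA if there is none)
lookup <- letter_rows$name[match(df$id, letter_rows$id)]

# Fall back to the existing name when the id has no letter name at all
df$fixed <- ifelse(is.na(lookup), df$name, lookup)

identical(df$fixed, df$correct_name)  # compare against the desired column
```

Because ids without a letter name fall back to their existing value, this variant does not produce NA for a case like id "106".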
Another option is to use tidyr's fill in both directions:
library(tidyverse)
df %>%
  mutate(name = replace(name, grepl("[0-9]+", name), NA)) %>%
  group_by(id) %>%
  fill(name) %>% # default direction is "down"
  fill(name, .direction = "up")
#  id    name  correct_name
#  <chr> <chr> <chr>
#1 100   A     A
#2 100   A     A
#3 100   A     A
#4 101   B     B
#5 102   C     C
#6 103   D     D
#7 104   E     E
#8 104   E     E
#9 105   F     F
PS: I added stringsAsFactors = FALSE to your data.frame call so that the columns are character rather than factor.
A solution with dplyr that uses ifelse plus grepl with the pattern set to "\\d+" (i.e. digits).
Edit: it's possible to have just one mutate:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(namenew = ifelse(
    grepl("\\d+", name),           # match for digits in the string
    name[!grepl("\\d+", name)][1], # if TRUE, substitute with the first non-digit name in the group
    name                           # if FALSE, keep it
  ))
#    id name correct_name namenew
# 1 100  100            A       A
# 2 100    A            A       A
# 3 101    B            B       B
# 4 102    C            C       C
# 5 103    D            D       D
# 6 104  104            E       E
# 7 104    E            E       E
# 8 105    F            F       F
# 9 100  100            A       A
The two-step version below may make it clearer what is happening than the single mutate above. (It is similar to @Ronak Shah's answer.)
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(namenew = ifelse(
    grepl("\\d+", name),
    NA,
    name
  )) %>%
  mutate(namenew = ifelse(
    is.na(namenew),
    namenew[!is.na(namenew)][1],
    namenew
  ))
#    id name correct_name namenew
# 1 100  100            A       A
# 2 100    A            A       A
# 3 101    B            B       B
# 4 102    C            C       C
# 5 103    D            D       D
# 6 104  104            E       E
# 7 104    E            E       E
# 8 105    F            F       F
# 9 100  100            A       A
Data (stringsAsFactors = FALSE is important):
df <- data.frame(id = c("100", "100", "101", "102", "103", "104", "104", "105", "100"),
                 name = c("100", "A", "B", "C", "D", "104", "E", "F", "100"),
                 correct_name = c("A", "A", "B", "C", "D", "E", "E", "F", "A"),
                 stringsAsFactors = FALSE)