After combining several columns with tidyr::unite()
, NAs from missing data remain in my character vector, which I do not want.
I have a series of medical diagnoses per row (1 per column) and would like to benchmark searching for a series of codes via. %in%
and grepl()
.
There is an open issue on Github on this problem, is there any movement - or work arounds? I would like to keep the vector comma-separated.
Here is a representative example:
library(dplyr)
library(tidyr)
df <- data_frame(a = paste0("A.", rep(1, 3)), b = " ", c = c("C.1", "C.3", " "), d = "D.4", e = "E.5")
cols <- letters[2:4]
df[, cols] <- gsub(" ", NA_character_, as.matrix(df[, cols]))
tidyr::unite(df, new, cols, sep = ",")
Current output:
# # A tibble: 3 x 3
# a new e
# <chr> <chr> <chr>
# 1 A.1 NA,C.1,D.4 E.5
# 2 A.1 NA,C.3,D.4 E.5
# 3 A.1 NA,NA,D.4 E.5
Desired output:
# # A tibble: 3 x 3
# a new e
# <chr> <chr> <chr>
# 1 A.1 C.1,D.4 E.5
# 2 A.1 C.3,D.4 E.5
# 3 A.1 D.4 E.5
In the new tidyr
, you can now use na.rm
parameter to remove NA
values.
library(tidyr)
library(dplyr)
df %>% unite(new, cols, sep = ",", na.rm = TRUE)
# a new e
# <chr> <chr> <chr>
#1 A.1 C.1,D.4 E.5
#2 A.1 C.3,D.4 E.5
#3 A.1 D.4 E.5
However, NA
s would not be removed if have columns are factors. We need to change them to character before using unite
.
df %>%
mutate_all(as.character) %>%
unite(new, cols, sep = ",", na.rm = TRUE)
You could also use base R apply
method for the same.
apply(df[cols], 1, function(x) toString(na.omit(x)))
#[1] "C.1, D.4" "C.3, D.4" "D.4"
data
df <- data_frame(
a = c("A.1", "A.1", "A.1"),
b = c(NA_character_, NA_character_, NA_character_),
c = c("C.1", "C.3", NA),
d = c("D.4", "D.4", "D.4"),
e = c("E.5", "E.5", "E.5")
)
cols <- letters[2:4]
You could use regex to remove the NAs after they are created:
library(dplyr)
library(tidyr)
df <- data_frame(a = paste0("A.", rep(1, 3)),
b = " ",
c = c("C.1", "C.3", " "),
d = "D.4", e = "E.5")
cols <- letters[2:4]
df[, cols] <- gsub(" ", NA_character_, as.matrix(df[, cols]))
tidyr::unite(df, new, cols, sep = ",") %>%
dplyr::mutate(new = stringr::str_replace_all(new, 'NA,?', '')) # New line
Output:
# A tibble: 3 x 3
a new e
<chr> <chr> <chr>
1 A.1 C.1,D.4 E.5
2 A.1 C.3,D.4 E.5
3 A.1 D.4 E.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With