I have a data frame that looks like this:
> sample
# A tibble: 6 x 10
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Supplier
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <chr>
1 1 2 3 4 8 NA NA NA NA orioles
2 1 2 3 4 9 13 NA NA NA nationals
3 1 2 3 5 10 14 16 18 NA dodgers
4 1 2 3 5 10 14 17 19 NA cardinals
5 1 2 3 6 11 NA NA NA NA giants
6 1 2 3 7 12 15 NA NA NA padres
What I'd like to do is concatenate the Supplier column with any Level column if all the values between them are NA
. Another way I was thinking about this was that if the column to the right of the Level column is NA
then to concatenate that column with the supplier column.
I was thinking a for loop but I haven't figured out how to implement the logic. The logic I was thinking is something like:
for (level in levels) {
if is.na(level n + 1) {
paste0(level, Supplier)
}
else {
level}
}
I could also do a bunch of mutate
calls like this but it seems super repetitive and unnecessary:
sample %>%
mutate(
Level_5 = ifelse(
is.na(Level_6),
paste0(Supplier, "<br>", Level_5),
Level_5)
)
Here's the dput of the data:
structure(list(Level_1 = c(1, 1, 1, 1, 1, 1), Level_2 = c(2,
2, 2, 2, 2, 2), Level_3 = c(3, 3, 3, 3, 3, 3), Level_4 = c(4,
4, 5, 5, 6, 7), Level_5 = c(8, 9, 10, 10, 11, 12), Level_6 = c(NA,
13, 14, 14, NA, 15), Level_7 = c(NA, NA, 16, 17, NA, NA), Level_8 = c(NA,
NA, 18, 19, NA, NA), Level_9 = c(NA, NA, NA, NA, NA, NA), Supplier = c("orioles",
"nationals", "dodgers", "cardinals", "giants", "padres")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
To be honest, I'm not 100% sure about your desired output. Using dplyr
and tidyr
:
library(tidyr)
library(dplyr)
sample %>%
pivot_longer(cols=starts_with("Level_"), names_prefix="Level_", names_to="level") %>%
drop_na() %>%
group_by(Supplier) %>%
mutate(new_val=ifelse(level==max(level), paste0(Supplier, "<br>", value), value)) %>%
select(-value) %>%
pivot_wider(names_from=level, names_prefix="Level_", values_from=new_val)
returns
# A tibble: 6 x 9
# Groups: Supplier [6]
Supplier Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 orioles 1 2 3 4 orioles<br>8 NA NA NA
2 nationals 1 2 3 4 9 nationals<br>13 NA NA
3 dodgers 1 2 3 5 10 14 16 dodgers<br>18
4 cardinals 1 2 3 5 10 14 17 cardinals<br>19
5 giants 1 2 3 6 giants<br>11 NA NA NA
6 padres 1 2 3 7 12 padres<br>15 NA NA
I lost the Level_9 column since it contained only NA
. You can easily add it again.
Final Update
I realized my mistake on trying to find the max value in every row and replace it with desired concatenated string. So I came up with another solution which only replaces the last non-NA
value (it can also be not the max values of the row), given all values are not numeric. So here is my final solution:
library(dplyr)
library(stringr)
library(purrr)
df %>%
pmap_dfr(., ~ {x <- c(...)[-10][!is.na(c(...)[-10])];
ind <- which(c(...) == x[length(x)]);
replace(c(...), ind[length(ind)], str_c(..10, x[length(x)], sep = "_"))}
)
# A tibble: 6 x 10
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Supplier
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 2 3 4 orioles_8 NA NA NA NA orioles
2 1 2 3 4 9 nationals_13 NA NA NA nationals
3 1 2 3 5 10 14 16 dodgers_18 NA dodgers
4 1 2 3 5 10 14 17 cardinals_19 NA cardinals
5 1 2 3 6 giants_11 NA NA NA NA giants
6 1 2 3 7 12 padres_15 NA NA NA padres
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With