Let's say I have a data frame with several rows like the following:
df <- data.frame(a = c(NA, 20, NA),
                 date1 = c("2016-03-01", "2016-02-01", "2016-02-01"),
                 b = c(50, NA, NA),
                 date2 = c("2016-02-01", "2016-03-01", "2016-03-01"),
                 c = c(10, 10, 10),
                 date3 = c("2016-01-01", "2016-01-01", "2016-01-01"))
For each row, I want to get the latest value that is not NA among a, b, and c, according to the dates (so I look at date1, date2, or date3 respectively and pick the most recent one). Basically, date1 gives the date corresponding to the value a, date2 gives the date corresponding to the value b, and date3 gives the date corresponding to the value c.
If date1 > date2 & date1 > date3, I want to take the value a. However, if the value a is NA (which is the case in the first row of my example), I compare date2 and date3. In my example, date2 > date3, and since the value b is not NA but 50, I take 50 as my final result.
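To make the rule concrete, here is a rough base R sketch of what I mean for the first row only (just an illustration of the rule, not a solution for the whole data frame):
vals  <- c(a = NA, b = 50, c = 10)                             # values of the first row
dates <- as.Date(c("2016-03-01", "2016-02-01", "2016-01-01"))  # their dates
ord <- order(dates, decreasing = TRUE)                         # most recent date first
vals[ord][which(!is.na(vals[ord]))[1]]                         # first non-NA in that order: 50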
Now I want to do this for all the rows in my data frame. Since I am using dplyr, I tried to use the case_when function together with the rank function (in my example, I look at the best-ranked date and then at the linked value; if it is NA, I look at the 2nd best-ranked date, and so on). However, I can't just write, as I'd like to:
df <- df %>%
  mutate(result = case_when(is.na(a) & is.na(b) & is.na(c) ~ NA_integer_,
                            rev(rank(date1, date2, date3))[1] == 3 & !is.na(a) ~ a,
                            rev(rank(date1, date2, date3))[2] == 3 & !is.na(b) ~ b,
                            rev(rank(date1, date2, date3))[3] == 3 & !is.na(a) ~ c,
                            rev(rank(date1, date2, date3))[1] == 2 & !is.na(a) ~ a,
                            rev(rank(date1, date2, date3))[2] == 2 & !is.na(b) ~ b,
                            rev(rank(date1, date2, date3))[3] == 2 & !is.na(a) ~ c,
                            rev(rank(date1, date2, date3))[1] == 1 & !is.na(a) ~ a,
                            rev(rank(date1, date2, date3))[2] == 1 & !is.na(b) ~ b,
                            rev(rank(date1, date2, date3))[3] == 1 & !is.na(a) ~ c))
This fails because the rank function needs a single vector as its argument (and I can't pass c(date1, date2, date3) either, because that would rank the whole combined vector rather than give me a rank within each row).
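To illustrate the problem, the closest I can get with rank is something like the line below, which ranks all nine dates of the three columns together instead of ranking the three dates within each row:
rank(c(df$date1, df$date2, df$date3))   # one global ranking of all 9 dates, not 3 per row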
In my example, the result (res) I would like to have would be:
a date1 b date2 c date3 result
NA 2016-03-01 50 2016-02-01 10 2016-01-01 50
20 2016-02-01 NA 2016-03-01 10 2016-01-01 20
NA 2016-02-01 NA 2016-03-01 10 2016-01-01 10
Does anyone have an idea, or even an entirely different approach to this problem?
I suggest converting to long-format and computing the relevant values. If you want, you can then add the results to your original data.frame. Here's how you could do that using data.table:
library(data.table)
setDT(df) # convert to data.table object
df[, row := .I] # add a row-id
dflong <- melt(df, id = "row", measure = patterns("^date", "^(a|b|c)"),
               na.rm = TRUE)                           # convert to long format
setorder(dflong, value1) # reorder by date value
dflong <- unique(dflong, by = "row", fromLast = TRUE) # get the latest dates
df[dflong, result := i.value2, on = "row"] # add result to original data
df
# a date1 b date2 c date3 row result
#1: NA 2016-03-01 50 2016-02-01 10 2016-01-01 1 50
#2: 20 2016-02-01 NA 2016-03-01 10 2016-01-01 2 20
#3: NA 2016-02-01 NA 2016-03-01 10 2016-01-01 3 10
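If you don't want to keep the helper row column in the final output, you can drop it again afterwards:
df[, row := NULL]   # remove the helper column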
This ought to handle it. First we put the data in tidy form (one row per Value/Date pair, along with a row_num to identify which original row the tidy row belongs to). Then we filter out the NAs, group by row_num, and keep the row with the most recent Date.
library(dplyr)
library(tidyr)

df %>%
  mutate(row_num = row_number()) %>%
  unite(a, a, date1) %>%
  unite(b, b, date2) %>%
  unite(c, c, date3) %>%
  gather(key, value, -row_num) %>%
  select(-key) %>%
  separate(value, into = c("Value", "Date"), sep = "_") %>%
  mutate(Date = as.Date(Date)) %>%
  filter(Value != "NA") %>%
  group_by(row_num) %>%
  top_n(1, Date) %>%
  ungroup()
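Note that Value comes out of separate() as character ("50", "20", "10"). Assuming the pipeline above is assigned to an object called picked (a name used here only for illustration), you could convert it back to numeric and attach it to the original data along these lines:
df %>%
  mutate(row_num = row_number()) %>%
  left_join(picked %>% transmute(row_num, result = as.numeric(Value)),
            by = "row_num") %>%
  select(-row_num)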
Here is one way to do it...
df$result <- apply(df, 1, function(x){
  dates  <- as.Date(x[seq(2, length(x), 2)])   # the even-numbered columns: the dates
  values <- x[seq(1, length(x), 2)]            # the odd-numbered columns: the values
  return(values[!is.na(values)][which.max(dates[!is.na(values)])])
})
df
a date1 b date2 c date3 result
1 NA 2016-03-01 50 2016-02-01 10 2016-01-01 50
2 20 2016-02-01 NA 2016-03-01 10 2016-01-01 20
3 NA 2016-02-01 NA 2016-03-01 10 2016-01-01 10
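A small caveat: apply() converts each row to character, so result comes back as a character column. If you need it numeric, you can convert it afterwards:
df$result <- as.numeric(df$result)   # apply() returned the values as character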
Here's one more approach:
df$row <- 1:nrow(df)
gather(df, key, date_val, date1, date2, date3, -row) %>%
  select(-key) %>%
  gather(key, val, a, b, c) %>%
  filter(!is.na(val)) %>%
  group_by(row) %>%
  mutate(max_date = max(date_val)) %>%
  filter(date_val == max_date) %>%
  summarise(result = max(val)) %>%
  left_join(df, by = "row") %>%
  select(-row)
# A tibble: 3 × 7
result a date1 b date2 c date3
<dbl> <dbl> <fctr> <dbl> <fctr> <dbl> <fctr>
1 50 NA 2016-03-01 50 2016-02-01 10 2016-01-01
2 20 20 2016-02-01 NA 2016-03-01 10 2016-01-01
3 10 NA 2016-02-01 NA 2016-03-01 10 2016-01-01
Another base R alternative:
df$id <- 1:nrow(df)

# stack the value columns (1, 3, 5) and the date columns (2, 4, 6) into long format
d2 <- reshape(df, varying = list(seq(1, by = 2, len = (ncol(df) - 1)/2),
                                 seq(2, by = 2, len = (ncol(df) - 1)/2)),
              direction = "long")

# within each id, put the most recent date first
d2 <- with(d2, d2[order(-id, date1, decreasing = TRUE), ])

# for each id, take the first non-NA value, i.e. the value with the latest date
cbind(df, res = tapply(d2$a[!is.na(d2$a)], d2$id[!is.na(d2$a)], `[`, 1))
# a date1 b date2 c date3 id res
# 1 NA 2016-03-01 50 2016-02-01 10 2016-01-01 1 50
# 2 20 2016-02-01 NA 2016-03-01 10 2016-01-01 2 20
# 3 NA 2016-02-01 NA 2016-03-01 10 2016-01-01 3 10