Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get latest record but if the latest is empty then get the last latest record

Tags:

r

dplyr

I'm having issue trying to convert rows into columns and then getting only the latest record that has a entry (using timestamp). Here is my data set:

enter image description here

df <- data.frame(id = c("123||wa", "123||wa", "123||wa", "223||sa", "223||sa", "223||sa", "123||wa"),
             questions = c("dish", "car", "house", "dish", "house", "car", "dish"),
             answers = c(" ", "bmw", "yes", "pizza", "yes", "audi","ravioli" ), 
             timestamp = c("24JUN2018:12:24:16", "07JAN2015:15:22:54", "24MAR2018:12:24:16", "24MAR2018:12:24:16",
                           "04AUG2014:12:40:30", "03JUL2014:15:38:11", "03JUL2014:15:38:11"))

The desired output is:

enter image description here

So far I have the below code which does most of the job.

library(lubridate)
library(dplyr)

df %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  summarise(last = last(answers)) %>%
  spread(questions, last)

The only thing missing is that it always pulls out the latest record even if it's empty. How can I change it to only include the latest entry record and not to pull blanks. Thanks

The output produced by the above code is:
enter image description here

like image 874
SAJ Avatar asked Mar 05 '23 11:03

SAJ


2 Answers

Another method using slice. As a bonus, you can combine the mutate and arrange steps:

library(lubridate)
library(dplyr)
library(tidyr)

df %>%
  group_by(id, questions) %>%
  arrange(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  slice(which.max(!is.na(answers))) %>%
  select(-timestamp) %>%
  spread(questions, answers)

Output:

# A tibble: 2 x 4
# Groups:   id [2]
  id      car   dish    house
  <fct>   <fct> <fct>   <fct>
1 123||wa bmw   ravioli yes  
2 223||sa audi  pizza   yes 
like image 92
acylam Avatar answered May 02 '23 20:05

acylam


We can filter out the elements that are " " in 'answers' in the beginning

df %>% 
  filter(answers != " ") %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  summarise(last = last(answers)) %>%
  spread(questions, last)
# A tibble: 2 x 4
# Groups:   id [2]
#  id      car   dish    house
#  <fct>   <fct> <fct>   <fct>
#1 123||wa bmw   ravioli yes  
#2 223||sa audi  pizza   yes  
like image 40
akrun Avatar answered May 02 '23 18:05

akrun