Get latest record but if the latest is empty then get the last latest record

Question

I'm having issue trying to convert rows into columns and then getting only the latest record that has a entry (using timestamp). Here is my data set:

enter image description here

df <- data.frame(id = c("123||wa", "123||wa", "123||wa", "223||sa", "223||sa", "223||sa", "123||wa"),
             questions = c("dish", "car", "house", "dish", "house", "car", "dish"),
             answers = c(" ", "bmw", "yes", "pizza", "yes", "audi","ravioli" ), 
             timestamp = c("24JUN2018:12:24:16", "07JAN2015:15:22:54", "24MAR2018:12:24:16", "24MAR2018:12:24:16",
                           "04AUG2014:12:40:30", "03JUL2014:15:38:11", "03JUL2014:15:38:11"))

The desired output is:

enter image description here

So far I have the below code which does most of the job.

library(lubridate)
library(dplyr)

df %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  summarise(last = last(answers)) %>%
  spread(questions, last)

The only thing missing is that it always pulls out the latest record even if it's empty. How can I change it to only include the latest entry record and not to pull blanks. Thanks

The output produced by the above code is:
enter image description here

acylam · Accepted Answer

Another method using slice. As a bonus, you can combine the mutate and arrange steps:

library(lubridate)
library(dplyr)
library(tidyr)

df %>%
  group_by(id, questions) %>%
  arrange(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  slice(which.max(!is.na(answers))) %>%
  select(-timestamp) %>%
  spread(questions, answers)

Output:

# A tibble: 2 x 4
# Groups:   id [2]
  id      car   dish    house
  <fct>   <fct> <fct>   <fct>
1 123||wa bmw   ravioli yes  
2 223||sa audi  pizza   yes

akrun · Answer

We can filter out the elements that are " " in 'answers' in the beginning

df %>% 
  filter(answers != " ") %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  summarise(last = last(answers)) %>%
  spread(questions, last)
# A tibble: 2 x 4
# Groups:   id [2]
#  id      car   dish    house
#  <fct>   <fct> <fct>   <fct>
#1 123||wa bmw   ravioli yes  
#2 223||sa audi  pizza   yes

Get latest record but if the latest is empty then get the last latest record

Tags:

r

dplyr

SAJ

2 Answers

acylam

akrun

Recent Activity

Donate For Us

Get latest record but if the latest is empty then get the last latest record

Tags:

r

dplyr

SAJ

2 Answers

acylam

akrun

Related questions

Recent Activity

Donate For Us