
Slow dplyr query in R

Tags:

r

dplyr

I've got a bit of code in R:

library(dplyr)

df_temp <- df %>%
   group_by(policy_number, policy_year) %>% 
   summarise(term_start_date  = last(term_start_date),
             term_end_date    = last(term_end_date),
             on_cover_after   = last(on_cover_after),
             termination_code = last(termination_code),
             termination_date = last(termination_date))

The main table df is about 700,000 rows by 130 columns. Grouped by policy_number and policy_year there are about 300,000 (policy_number/policy_year) groupings.

4 of the 5 columns that I've referred to in last() are dates.

This query takes about 3 minutes to run, which is a nuisance because the rest of my code runs quite briskly. I'm hoping to speed it up. Is there anything I could try that might help please?

(Ideally I would supply a reprex, but I'm not sure how to do that here.)

Thank you.
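On the reprex: one option is to simulate a smaller table with the same key structure and time the query on that. A minimal sketch follows. The row counts, value ranges, and dates are all made-up assumptions (nothing here comes from the real data), and the other 125 or so columns are left out:

```r
library(dplyr)

set.seed(42)          # make the simulated data reproducible
n_rows     <- 7000    # assumption: 1/100 of the real ~700,000 rows
n_policies <- 1500    # assumption: hypothetical number of distinct policies

df <- data.frame(
  policy_number    = sample(n_policies, n_rows, replace = TRUE),
  policy_year      = sample(1:2, n_rows, replace = TRUE),
  term_start_date  = as.Date("2015-01-01") + sample(0:1500, n_rows, replace = TRUE),
  term_end_date    = as.Date("2016-01-01") + sample(0:1500, n_rows, replace = TRUE),
  on_cover_after   = as.Date("2015-01-01") + sample(0:1500, n_rows, replace = TRUE),
  termination_code = sample(c("A", "B", NA), n_rows, replace = TRUE),
  termination_date = as.Date("2016-01-01") + sample(0:1500, n_rows, replace = TRUE)
)

# Time the original query on the simulated table
timing <- system.time(
  df_temp <- df %>%
    group_by(policy_number, policy_year) %>%
    summarise(term_start_date  = last(term_start_date),
              term_end_date    = last(term_end_date),
              on_cover_after   = last(on_cover_after),
              termination_code = last(termination_code),
              termination_date = last(termination_date))
)
print(timing)
```

Scaling n_rows and n_policies up towards the real sizes should show whether the slowdown reproduces on simulated data.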

Edit: since I'm always using the last record for a given (policy_number/policy_year) pair, is there some code I could write along the lines of:

df_temp <- df %>%
   group_by(policy_number, policy_year) %>% 
   mutate(counter = 1:n()) %>%
   filter(counter == max(counter)) %>%
   select(term_start_date,
          term_end_date,
          on_cover_after,
          termination_code,
          termination_date)

?
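The idea in the edit, keeping only the last row of each (policy_number, policy_year) pair, can also be written without any per-group work. A minimal sketch using base R's duplicated() with fromLast = TRUE; the toy df is made up for illustration, and whether this is faster on the real table is untested:

```r
# Toy data, made up for illustration. Rows are assumed to already be in the
# order that defines "last" within each (policy_number, policy_year) pair.
df <- data.frame(
  policy_number    = c(1, 1, 1, 2, 2),
  policy_year      = c(1, 1, 2, 1, 1),
  term_start_date  = as.Date("2019-01-01") + 0:4,
  term_end_date    = as.Date("2020-01-01") + 0:4,
  on_cover_after   = as.Date("2019-06-01") + 0:4,
  termination_code = c("A", "B", "C", "D", "E"),
  termination_date = as.Date("2019-09-01") + 0:4
)

key_cols <- c("policy_number", "policy_year")
out_cols <- c("term_start_date", "term_end_date", "on_cover_after",
              "termination_code", "termination_date")

# Scanning from the end, the last row of each key pair is the only one
# that is not flagged as a duplicate
is_last <- !duplicated(df[, key_cols], fromLast = TRUE)
df_temp <- df[is_last, c(key_cols, out_cols)]
```

Unlike summarise(), this keeps the surviving rows in their original order rather than sorting them by group.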

Alan asked Apr 02 '20 20:04


2 Answers

There is a great source here about this. The author makes several good suggestions (see the comments section of his post). I would consider aggregating your data with data.table or, if you stick with dplyr, defining a key. Some relative benchmarks:

[Image: relative benchmark results, from the linked source]
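As a rough illustration of the data.table suggestion (not taken from the linked source), the sketch below sets a key on the two grouping columns and takes the last row of each group. The toy df is made up; only the column names come from the question.

```r
library(data.table)

# Toy data, made up for illustration; deliberately not sorted by the key
df <- data.frame(
  policy_number    = c(2, 1, 1, 2, 1),
  policy_year      = c(1, 1, 2, 1, 1),
  term_start_date  = as.Date("2019-01-01") + 0:4,
  term_end_date    = as.Date("2020-01-01") + 0:4,
  on_cover_after   = as.Date("2019-06-01") + 0:4,
  termination_code = c("A", "B", "C", "D", "E"),
  termination_date = as.Date("2019-09-01") + 0:4
)

dt <- as.data.table(df)

# Sort by the grouping columns and mark them as the key; the sort is stable,
# so rows within a group keep their original relative order
setkey(dt, policy_number, policy_year)

out_cols <- c("term_start_date", "term_end_date", "on_cover_after",
              "termination_code", "termination_date")

# Take the last value of each selected column within each group
dt_temp <- dt[, lapply(.SD, last),
              by = .(policy_number, policy_year),
              .SDcols = out_cols]
```

Restricting .SDcols to the five columns of interest means the other columns of a wide table are never touched inside the grouped step.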

DSH answered Sep 21 '22 14:09


Instead of summarise, use summarise_at:

library(dplyr)
df %>%
   group_by(policy_number, policy_year) %>%
   # apply last() to each of the five columns from the question
   summarise_at(vars(term_start_date, term_end_date,
       on_cover_after, termination_code, termination_date), last)
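In dplyr 1.0.0 and later, scoped verbs such as summarise_at() are superseded by across(). A sketch of the same summary in that style, assuming dplyr >= 1.0.0 and a made-up toy df (only the column names come from the question):

```r
library(dplyr)

# Toy data, made up for illustration
df <- data.frame(
  policy_number    = c(2, 1, 1, 2, 1),
  policy_year      = c(1, 1, 2, 1, 1),
  term_start_date  = as.Date("2019-01-01") + 0:4,
  term_end_date    = as.Date("2020-01-01") + 0:4,
  on_cover_after   = as.Date("2019-06-01") + 0:4,
  termination_code = c("A", "B", "C", "D", "E"),
  termination_date = as.Date("2019-09-01") + 0:4
)

out_cols <- c("term_start_date", "term_end_date", "on_cover_after",
              "termination_code", "termination_date")

df_temp <- df %>%
  group_by(policy_number, policy_year) %>%
  # apply last() to every column named in out_cols, then drop the grouping
  summarise(across(all_of(out_cols), last), .groups = "drop")
```

This changes the syntax only; it is not expected to change the timing by itself.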
akrun answered Sep 23 '22 14:09