substr in dplyr %>% mutate

Tags: r, dplyr

library(dplyr)

pcd <- data.frame(tripNo   = c(618, 618, 610, 610, 610, 619),
                  procDate = as.Date(c('2016-03-02', '2016-03-03', '2016-03-02', '2016-03-03', '2016-03-02', '2016-03-03')),
                  delay    = c(7.45, 12.90, 11.88, 6.66, 12.50, 9.41))

I want to flag inconsistencies in trips processed on two different days, i.e. cases where a delay on the second day is shorter than the last delay on the previous day. Currently I do it this way:

pcd %>%
  arrange(tripNo, procDate, delay) %>% 
  group_by(tripNo) %>% 
  mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
         Alert = ifelse(delayErr, '!', '')) %>%
  select(tripNo, procDate, delay, delayErr, Alert)

  tripNo   procDate delay delayErr Alert
   (dbl)     (date) (dbl)    (lgl) (chr)
1    610 2016-03-02 11.88    FALSE      
2    610 2016-03-02 12.50    FALSE      
3    610 2016-03-03  6.66     TRUE     !
4    618 2016-03-02  7.45    FALSE      
5    618 2016-03-03 12.90    FALSE      
6    619 2016-03-03  9.41    FALSE      
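As a quick sanity check of the flag logic, here is a minimal base-R sketch for a single trip. Assumptions: only trip 610 is used, its rows are already in the order produced by arrange(tripNo, procDate, delay), and dplyr's lag() is emulated with c(NA, head(x, -1)) so the snippet runs without dplyr. It shows why the first row comes out FALSE rather than NA.

```r
# Trip 610 after arrange(tripNo, procDate, delay): delays in processing order
delay <- c(11.88, 12.50, 6.66)

# Base-R stand-in for lag(delay): shift right by one, NA in first position
prev <- c(NA, head(delay, -1))

# For the first row, delay < prev is NA, but FALSE & NA is FALSE in R,
# so the "not the first row" guard turns it into a clean FALSE
first_row <- seq_along(delay) == 1
delayErr <- !first_row & (delay < prev)
delayErr
# [1] FALSE FALSE  TRUE
```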

This works, but my question is about my first attempt, in which I tried to use substr:

pcd %>%
  arrange(tripNo, procDate, delay) %>% 
  group_by(tripNo) %>% 
  mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
         Alert = substr(' !', delayErr + 1, delayErr + 1)) %>%  # <<< This is the only change
  select(tripNo, procDate, delay, delayErr, Alert)

  tripNo   procDate delay delayErr Alert
   (dbl)     (date) (dbl)    (lgl) (chr)
1    610 2016-03-02 11.88    FALSE      
2    610 2016-03-02 12.50    FALSE      
3    610 2016-03-03  6.66     TRUE      
4    618 2016-03-02  7.45    FALSE      
5    618 2016-03-03 12.90    FALSE      
6    619 2016-03-03  9.41    FALSE      

With this code, the Alert column stays empty, even in row 3 where delayErr is TRUE. Could someone explain why the second dplyr query doesn't work?
Thanks!

ap53 asked Mar 12 '23 14:03


2 Answers

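There is already a fully vectorized version of substr, namely substring. While substr returns one result per element of its first argument, substring recycles all of its arguments to the length of the longest one: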

pcd %>%
  arrange(tripNo, procDate, delay) %>% 
  group_by(tripNo) %>% 
  mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
         Alert = substring(' !', delayErr + 1, delayErr + 1)) %>% 
  select(tripNo, procDate, delay, delayErr, Alert)
#   tripNo   procDate delay delayErr Alert
#   (dbl)     (date) (dbl)    (lgl) (chr)
#1    610 2016-03-02 11.88    FALSE      
#2    610 2016-03-02 12.50    FALSE      
#3    610 2016-03-03  6.66     TRUE     !
#4    618 2016-03-02  7.45    FALSE      
#5    618 2016-03-03 12.90    FALSE      
#6    619 2016-03-03  9.41    FALSE      
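To see the difference outside of dplyr, here is a minimal base-R comparison. The position vector is an assumption standing in for delayErr + 1 within trip 610 (FALSE, FALSE, TRUE coerced to 0, 0, 1, plus one). substr returns a single value because x (' !') has length 1, and mutate then recycles that single " " to every row of the group; substring returns one character per position.

```r
# delayErr + 1 for trip 610: FALSE/TRUE are coerced to 0/1, giving 1, 1, 2
pos <- c(FALSE, FALSE, TRUE) + 1

# substr() returns one result per element of x; x = ' !' has length 1,
# so only pos[1] is used and the result is a single " "
a <- substr(' !', pos, pos)
a
# [1] " "

# substring() recycles text, first and last to the longest argument,
# so it returns one character per element of pos
b <- substring(' !', pos, pos)
b
# [1] " " " " "!"
```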
akrun answered Mar 31 '23 16:03


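It's because substr returns one result per element of its first argument: start and stop are recycled to the length of x, so with the length-1 string ' !' only the first element of delayErr + 1 is used. Since the first row of each group is never flagged, that value is 1, substr returns a single ' ', and mutate recycles it to every row of the group. You could make a vectorized version of substr with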

substr2 <- Vectorize(substr)

If you then use substr2 in place of substr in your mutate() call, it should work as expected.
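A minimal base-R check of the Vectorize approach, using the same assumed positions for trip 610 (delayErr + 1 = 1, 1, 2). Passing USE.NAMES = FALSE is an addition to the answer's one-liner: it keeps the result an unnamed character vector, since by default mapply would try to use the character x as names.

```r
# Vectorize() wraps substr() so each (x, start, stop) triple is evaluated
# separately via mapply(), recycling the length-1 x to the length of start/stop
substr2 <- Vectorize(substr, USE.NAMES = FALSE)

# delayErr + 1 for trip 610
pos <- c(FALSE, FALSE, TRUE) + 1
res <- substr2(' !', pos, pos)
res
# [1] " " " " "!"
```

Because Vectorize calls substr once per element, it is slower than substring on large data; substring gives the same result here without the wrapper.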

Erich Studerus answered Mar 31 '23 16:03