 

if statement inside a function used in dplyr::mutate: the condition has length > 1

Tags:

r

dplyr

A lot of people seem to have this issue, but I was not able to find a satisfying answer. If you'll indulge me, I would like to make sure I understand what's happening.

I have dates in various formats in a data frame (also a common issue), so I have built a small function to handle them for me:

dateHandler <- function(inputString){
  if(grepl("-",inputString)==T){
    lubridate::dmy(inputString, tz="GMT")
  }else{
    as.POSIXct(as.numeric(inputString)*60*60*24, origin="1899-12-30", tz="GMT")
  }
}

When using it on one element it works fine:

myExample <-c("18-Mar-11","42433")

> dateHandler(myExample[1])
[1] "2011-03-18 GMT"
> dateHandler(myExample[2])
[1] "2016-03-04 GMT"

However, when I use it on a whole column, it does not work:

myDf <- as.data.frame(myExample)
> myDf <- myDf %>% 
+   dplyr::mutate(dateClean=dateHandler(myExample))
Warning messages:
1: In if (grepl("-", inputString) == T) { :
  the condition has length > 1 and only the first element will be used
2:  1 failed to parse. 

From reading the forum, my current understanding is that R passes a vector containing all the elements of myDf$myExample to the function, which is not built to handle vectors of length > 1. If that is correct, the next step is to understand what to do from there. Many people recommend using ifelse rather than if, but I do not understand how this would help me. I also read that ifelse returns something in the same format as its input, which does not work for me in this case.
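To make that concrete, here is a small base-R check (a sketch; `cond`, `msg`, `stamps` and `res` are just illustrative names). It shows that grepl() returns one logical per element, so if() receives a length-2 condition. Older R versions only warned and used the first element (TRUE for "18-Mar-11"), which means lubridate::dmy() was run on the whole vector and "42433" failed to parse, hence the second warning. Since R 4.2.0 this is an error instead. It also shows why ifelse() is awkward here: it takes its attributes from the test, not from the yes/no values, so POSIXct results come back as plain numbers.

```r
myExample <- c("18-Mar-11", "42433")

# grepl() is vectorized: one TRUE/FALSE per element, not a single value
cond <- grepl("-", myExample)
print(cond)

# `if` needs a length-1 condition. Older R warned and used cond[1] only;
# R 4.2.0 and later raise an error. Catch either so the sketch runs anywhere.
msg <- tryCatch(
  if (cond) "dmy branch" else "numeric branch",
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(msg)

# ifelse() keeps the attributes of `test` (a plain logical here), not those
# of `yes`/`no`, so the POSIXct class is lost and seconds since 1970 remain
stamps <- as.POSIXct(c("2011-03-18", "2016-03-04"), tz = "GMT")
res <- ifelse(cond, stamps, stamps)
print(class(res))
```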

Thank you in advance for answering this question for the 10000th time.

Nicolas

asked Nov 20 '25 00:11 by naro


1 Answer

You have two options from here. One is to apply your current function to each element in turn using lapply, as in:

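# lapply() returns a list of length-1 POSIXct values; do.call(c, ...) combines them into one POSIXct vector
myDf$dateClean <- do.call(c, lapply(myDf$myExample, dateHandler))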
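A base-R sketch of this element-wise route, mainly to see what type each approach returns. The helper dateHandlerBase() is hypothetical: it mirrors dateHandler() but swaps lubridate::dmy() for as.POSIXct(format = "%d-%b-%y") so it runs without packages, and %b assumes English month abbreviations (C or English locale). lapply() on its own gives a list (which would become a list column), sapply() simplifies through unlist() and loses the POSIXct class, while do.call(c, ...) dispatches to the POSIXct method of c() and keeps it.

```r
# Hypothetical base-R stand-in for dateHandler(); lubridate::dmy() is replaced
# by as.POSIXct(format = "%d-%b-%y"), which assumes English month names
dateHandlerBase <- function(inputString) {
  if (grepl("-", inputString)) {
    as.POSIXct(inputString, format = "%d-%b-%y", tz = "GMT")
  } else {
    # Excel serial day number -> seconds after the 1899-12-30 origin
    as.POSIXct(as.numeric(inputString) * 60 * 60 * 24,
               origin = "1899-12-30", tz = "GMT")
  }
}

myDf <- data.frame(myExample = c("18-Mar-11", "42433"), stringsAsFactors = FALSE)

# one call per element: each call sees a length-1 string, so `if` is fine
asList <- lapply(myDf$myExample, dateHandlerBase)
print(class(asList))

# sapply() simplifies via unlist(), which drops the POSIXct class
asNumbers <- sapply(myDf$myExample, dateHandlerBase)
print(class(asNumbers))

# c() dispatches to its POSIXct method and keeps the class
myDf$dateClean <- do.call(c, asList)
print(myDf)
```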

The other option is to build a vectorized function that is designed to take a vector as an input rather than a single data point. Here is a simple example:

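library(dplyr)

dateHandlerVectorized <- function(inputVector){
  # start from an all-NA POSIXct vector (GMT) of the right length
  output <- as.POSIXct(rep(NA, length(inputVector)), tz = "GMT")
  # TRUE where the string is day-month-year text such as "18-Mar-11"
  useLubridate <- grepl("-", inputVector)
  # each parser only receives the elements it can handle
  output[useLubridate] <- lubridate::dmy(inputVector[useLubridate], tz = "GMT")
  # the remaining elements are treated as Excel serial day numbers
  output[!useLubridate] <- as.POSIXct(as.numeric(inputVector[!useLubridate]) * 60 * 60 * 24,
                                      origin = "1899-12-30", tz = "GMT")
  output
}

# inside mutate(), refer to the column by name rather than myDf$myExample
myDf <- myDf %>% dplyr::mutate(dateClean = dateHandlerVectorized(myExample))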
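To check the logical-indexing idea on its own (or without lubridate installed), here is a base-R sketch of the same vectorized approach. dateHandlerVecBase() is a hypothetical name, and as.POSIXct(format = "%d-%b-%y") stands in for lubridate::dmy(), assuming English month abbreviations. The point is that each parser is called once on the subset of elements it understands, so neither branch sees values it cannot parse, and assigning into a preallocated POSIXct vector keeps the class (unlike ifelse()).

```r
# Hypothetical base-R variant of dateHandlerVectorized();
# as.POSIXct(format = "%d-%b-%y") replaces lubridate::dmy() (English month names assumed)
dateHandlerVecBase <- function(inputVector) {
  # preallocate an all-NA POSIXct vector in GMT
  output <- as.POSIXct(rep(NA, length(inputVector)), tz = "GMT")
  # logical index: TRUE for "18-Mar-11"-style strings, FALSE for Excel serials
  isDmy <- grepl("-", inputVector)
  output[isDmy] <- as.POSIXct(inputVector[isDmy], format = "%d-%b-%y", tz = "GMT")
  output[!isDmy] <- as.POSIXct(as.numeric(inputVector[!isDmy]) * 60 * 60 * 24,
                               origin = "1899-12-30", tz = "GMT")
  output
}

x <- c("18-Mar-11", "42433", "01-Jan-20", "43831")
res <- dateHandlerVecBase(x)
print(res)

# same call on a data frame column (base R shown; dplyr::mutate is analogous)
myDf <- data.frame(myExample = x, stringsAsFactors = FALSE)
myDf$dateClean <- dateHandlerVecBase(myDf$myExample)
print(myDf)
```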
answered Nov 21 '25 14:11 by Ian Wesley