Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ifelse with data.table

Here is my data:

BuyDate       SellDate     Number
2015-01-01    NA           1
2015-01-01    2015-01-03   1
2015-01-01    2015-01-03   -1
2016-12-09    NA           -1

I want to create a new column Start, so I can have the following result.

BuyDate       SellDate     Number    Start
2015-01-01    NA           1         2015-01-01
2015-01-01    2015-01-03   1         2015-01-01
2015-01-01    2015-01-03   -1        2015-01-03
2016-12-09    NA           -1        2016-12-09

The code is:

data[,Start:=ifelse(Number=="1",BuyDate,ifelse(is.na(SellDate),BuyDate,SellDate))]

However, I get:

BuyDate       SellDate     Number    Start
2015-01-01    NA           1         1420070400
2015-01-01    2015-01-03   1         1420070400
2015-01-01    2015-01-03   -1        1420243200
2016-12-09    NA           -1        1481241600

How can I solve this?

str(data)
Classes ‘data.table’ and 'data.frame':
 $BuyDate : POSIXct, format: "2015-01-01" "2015-01-01" "2015-01-01" "2016-12-09"
 $SellDate: POSIXct, format: NA "2015-01-03" "2015-01-03" NA
 $Number  : chr  "1" "1" "-1" "-1"
like image 838
Peter Chen Avatar asked May 03 '17 08:05

Peter Chen


2 Answers

It is better not to use ifelse as the 'Date' can get coerced to integer storage values, instead, we assign (:=) 'Start' as the 'SellDate', then specify the logical condition in 'i' for identifying the 'NA' elements in 'Start' or the 1 in 'Number' and assign (:=) the elements in 'BuyDate' that correspond to 'i' to 'Start'

data[, Start := SellDate][Number==1, Start := BuyDate
          ][is.na(Start), Start := BuyDate][]

As @Cath mentioned in the comments, this can be done in two steps

data[, Start := SellDate][(Number==1) | is.na(Start), Start := BuyDate][]
like image 68
akrun Avatar answered Nov 04 '22 15:11

akrun


The Start variable has to be converted to POSIXct:

require(dplyr)
data[, Start:= (ifelse(Number=="1",BuyDate,ifelse(is.na(SellDate),BuyDate,SellDate)) %>% 
         as.POSIXct(origin = "1970-01-01"))]

ADDED:

Following codes run with dplyr. I am not sure why dplyr won't work with the example above.

library(dplyr)
library(data.table)

dates <- as.POSIXct(Sys.Date() + 1:20)
dates2 <- as.POSIXct(Sys.Date() + 21:40)

tmp <- data.table(date = dates, date2 = dates2)
tmp[runif(20)>.8, date2 := NA]
tmp[, date3 := (ifelse(is.na(date2), date, date2) %>% as.POSIXct(origin = "1970-01-01"))]
like image 43
amatsuo_net Avatar answered Nov 04 '22 15:11

amatsuo_net