
R: Why is class Date lost upon subsetting

Tags: r

Here is an easy example. I have a data frame with three dates in it:

Data <- as.data.frame(as.Date(c('1970/01/01', '1970/01/02', '1970/01/03')))
names(Data) <- "date"

Now I add a column consisting of the same entries:

for(i in 1:3){
  Data[i, "date2"] <- Data[i, "date"]
}

Output looks like this:

        date date2
1 1970-01-01     0
2 1970-01-02     1
3 1970-01-03     2

For reasons I don't understand, the class of column date2 is numeric rather than Date, which is the class of column date. Curiously, if you explicitly convert each value with as.Date:

for(i in 1:3){
  Data[i, "date3"] <- as.Date(Data[i, "date"])
}

it doesn't make any difference.

        date date2 date3
1 1970-01-01     0     0
2 1970-01-02     1     1
3 1970-01-03     2     2

The problem seems to lie in the subsetting with []. The same thing happens in more interesting examples, where you have two columns of dates and want to create a third one that picks a date from one of the two depending on some factor.

Of course we can fix everything in retrospect by doing something like:

Data$date4 <- as.Date(Data$date2, origin = "1970-01-01")

but I'm still wondering: why? Why is this happening? Why can't my dates just stay dates when being transferred to another column??

Vincent asked Jul 01 '13 14:07


1 Answer

This is not a final solution, but I think it can help you understand what is going on.

Here is your data:

Data <- data.frame(date = 
                  as.Date(c('2000/01/01', '2012/01/02', '2013/01/03')))

Take these two vectors, one left as plain numeric and the other given the class Date:

> vv <- vector("numeric", 3)
> vv.Date <- vector("numeric", 3)
> class(vv.Date) <- "Date"
> vv
[1] 0 0 0
> vv.Date
[1] "1970-01-01" "1970-01-01" "1970-01-01"  ## zeros print as the origin, 1970-01-01

Now assign to the first element of each vector, as the first iteration of your loop does:

> vv[1] <- Data$date[1]
> vv.Date[1] <- Data$date[1]
> vv
[1] 10957     0     0
> vv.Date
[1] "2000-01-01" "1970-01-01" "1970-01-01"

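As you can see, the vector that already has class Date keeps it, while the plain numeric vector only receives a number. What happens is this: when you assign a value into an element of an existing vector, the result keeps the class of the vector being assigned into, not the class of the value. A Date is stored as the number of days since 1970-01-01, so assigning one into a plain numeric vector leaves just that day count.

To return to your example: when you do this, you are in effect creating a numeric vector (like vv) and then trying to assign dates to it: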

for(i in 1:3){
  Data[i, "date3"] <- as.Date(Data[i, "date"])
}
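
The point about classes can be checked on plain vectors, outside any data frame. The following is a minimal sketch; the names d, plain and typed are made up for illustration and do not come from the question:

```r
# A Date is a double (days since 1970-01-01) carrying a class attribute.
d <- as.Date("2000-01-01")
typeof(d)    # "double"
unclass(d)   # 10957

# Assigning into a plain numeric vector: the target has no class,
# so only the day count is stored.
plain <- numeric(3)
plain[1] <- d
class(plain)      # "numeric"

# Assigning into a vector that already has class Date keeps that class.
typed <- as.Date(rep(NA, 3))
typed[1] <- d
class(typed)      # "Date"
format(typed[1])  # "2000-01-01"
```

This is also why the numeric column in the question shows 0, 1 and 2: those are the day counts for 1970-01-01 to 1970-01-03.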

If you give date3 the class Date first, for example:

Data$date3 <- vv.Date

and then run the loop again:

for(i in 1:3){
  Data[i, "date3"] <- as.Date(Data[i, "date"])
}

you will get the expected result:

        date      date3
1 2000-01-01 2000-01-01
2 2012-01-02 2012-01-02
3 2013-01-03 2013-01-03
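
Putting this together, here is one possible sketch of how to keep the Date class. It assumes the data from this answer; the columns date4, picked and other, and the condition use_first, are hypothetical and only stand in for the "pick from one of two columns" case mentioned in the question:

```r
Data <- data.frame(date = as.Date(c("2000-01-01", "2012-01-02", "2013-01-03")))

# Option 1: create the column with class Date first, then fill it row by row.
Data$date3 <- as.Date(rep(NA, nrow(Data)))
for (i in seq_len(nrow(Data))) {
  Data[i, "date3"] <- Data[i, "date"]
}

# Option 2: copy the whole column at once; no loop, and the class is kept.
Data$date4 <- Data$date

# Picking from one of two Date columns by a condition (all names hypothetical).
use_first <- c(TRUE, FALSE, TRUE)
other <- as.Date(c("1999-12-31", "2011-12-31", "2012-12-31"))
Data$picked <- other                            # start from one column
Data$picked[use_first] <- Data$date[use_first]  # overwrite where the condition holds

sapply(Data, class)  # every column reports "Date"
```

The last step uses indexed assignment rather than ifelse(), because ifelse() takes the attributes of its result from the test argument and so would not return a Date.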
agstudy answered Sep 19 '22 05:09