I’m trying to convert two columns in my data frame to a proper date/time class, and so far I haven’t had much success. I’ve tried various classes (timeDate, Date, timeSeries, POSIXct, POSIXlt) but without success. Perhaps I’m just overlooking the obvious, and because I’ve tried so many approaches I no longer know what’s what. I hope some of you can shed some light on where I’m going wrong.
Goal:
I want to calculate the difference between two dates using the earliest and latest date. I got this working with head() and tail(), but because those values aren’t necessarily the earliest and latest dates in my data, I need another way. (I can’t get sorting to work, because the data is sorted only on the day part of the date.)
Second goal: I want to convert the dates from daily format (e.g. 8-12-2010) to weekly, monthly, and yearly levels (e.g. '49-2010', 'december-10', and just '2010'). This can be done with format settings (like %d-%m-%y). Can this be done by converting the data frame to a time class, then formatting that time class as needed (8-12-2010 -> format("%B-%y") -> 'december-10'), and then turning the result into a factor with a level for each month?
For both goals I need to convert the data frame in some way to a time class, and this is where I ran into difficulties.
My dataframe looks like this:
> tradesList[c(1,10,11,20),14:15] -> tmpTimes4
> tmpTimes4
EntryTime ExitTime
1 01-03-07 10-04-07
10 29-10-07 02-11-07
11 13-04-07 14-05-07
20 18-12-07 20-02-08
Here’s a summary of what I’ve tried:
> class(tmpTimes4)
[1] "data.frame"
> as.Date(head(tmpTimes4$EntryTimes, n=1), format="%d-%m-%y")
Error in as.Date.default(head(tmpTimes4$EntryTimes, n = 1), format = "%d-%m-%y") :
do not know how to convert 'head(tmpTimes4$EntryTimes, n = 1)' to class "Date"
> as.timeDate(tmpTimes4, format="%d-%m-%y")
Error in as.timeDate(tmpTimes4, format = "%d-%m-%y") :
unused argument(s) (format = "%d-%m-%y")
> timeSeries(tmpTimes4, format="%d-%m-%y")
Error in midnightStandard2(charvec, format) :
'charvec' has non-NA entries of different number of characters
> tmpEntryTimes4 <- timeSeries(tmpTimes4$EntryTime, format="%d-%m-%y")
> tmpExitTimes4 <- timeSeries(tmpTimes4$ExitTime, format="%d-%m-%y")
> tmpTimes5 <- cbind(tmpEntryTimes4,tmpExitTimes4)
> colnames(tmpTimes5) <- c("Entry","Exit")
> tmpTimes5
Entry Exit
[1,] 01-03-07 10-04-07
[2,] 29-10-07 02-11-07
[3,] 13-04-07 14-05-07
[4,] 18-12-07 20-02-08
> class(tmpTimes5)
[1] "timeSeries"
attr(,"package")
[1] "timeSeries"
> as.timeDate(tmpTimes5, format="%d-%m-%y")
Error in as.timeDate(tmpTimes5, format = "%d-%m-%y") :
unused argument(s) (format = "%d-%m-%y")
> as.Date(tmpTimes5, format="%d-%m-%y")
Error in as.Date.default(tmpTimes5, format = "%d-%m-%y") :
do not know how to convert 'tmpTimes5' to class "Date"
> format.POSIXlt(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in format.POSIXlt(tmpTimes5, format = "%d-%m-%y", usetz = FALSE) :
wrong class
> as.POSIXlt(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in as.POSIXlt.default(tmpTimes5, format = "%d-%m-%y", usetz = FALSE) :
do not know how to convert 'tmpTimes5' to class "POSIXlt"
> as.POSIXct(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in as.POSIXlt.default(x, tz, ...) :
do not know how to convert 'x' to class "POSIXlt"
The timeDate package has a function for ‘range’. However, converting to the Date class works for an individual value, but for some reason not for a data frame:
> as.Date(tmpTimes4[1,1], format="%d-%m-%y")
[1] "2007-03-01"
> as.Date(tmpTimes4, format="%d-%m-%y")
Error in as.Date.default(tmpTimes4, format = "%d-%m-%y") :
do not know how to convert 'tmpTimes4' to class "Date"
At this point I almost believe it’s impossible to do, so any thoughts would be highly appreciated!
Regards,
To sort a data frame by a date column, first convert that column to the Date class with the matching format, then pass it to order(); the rows come back in ascending date order.
To get the current system date, use the Sys.Date() function.
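The sorting step can be sketched as follows. This is a minimal illustration, assuming the dates arrive as dd-mm-yy strings like those in the question; the data frame `trades` and the helper vector `entry` are hypothetical names made up for the example.

```r
# Hypothetical data mirroring the dd-mm-yy strings shown in the question
trades <- data.frame(
  EntryTime = c("01-03-07", "29-10-07", "13-04-07", "18-12-07"),
  ExitTime  = c("10-04-07", "02-11-07", "14-05-07", "20-02-08"),
  stringsAsFactors = FALSE
)

# Sorting the raw strings compares characters left to right,
# so the day field decides the order rather than the actual date
sort(trades$EntryTime)

# Parse the strings into Date objects first, then order on those values
entry <- as.Date(trades$EntryTime, format = "%d-%m-%y")
trades_sorted <- trades[order(entry), ]
trades_sorted
```

Ordering on a parsed copy leaves the original columns untouched; converting the columns themselves is usually the more convenient choice when further date arithmetic is planned.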
Start with some dummy data:
start <- as.Date("2010/01/01")
end <- as.Date("2010/12/31")
set.seed(1)
datewant <- seq(start, end, by = "days")[sample(15)]
tmpTimes <- data.frame(EntryTime = datewant,
ExitTime = datewant + sample(100, 15))
## reorder on EntryTime so in random order
tmpTimes <- tmpTimes[sample(NROW(tmpTimes)), ]
head(tmpTimes)
so we have something like this:
> head(tmpTimes)
EntryTime ExitTime
8 2010-01-14 2010-03-16
9 2010-01-05 2010-01-17
7 2010-01-10 2010-01-30
3 2010-01-08 2010-04-16
10 2010-01-01 2010-01-26
13 2010-01-12 2010-02-15
Using the above, look at Goal 1: computing the difference between the earliest and latest date. You can treat dates as if they were numbers (that is how they are stored internally anyway), so functions like min() and max() will work. You can use the difftime() function:
> with(tmpTimes, difftime(max(EntryTime), min(EntryTime)))
Time difference of 14 days
or use standard subtraction
> with(tmpTimes, max(EntryTime) - min(EntryTime))
Time difference of 14 days
to get the difference in days. head() and tail() will only work if you sort the dates first, as they take the first and last values in a vector, not the lowest and highest values.
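The point about head() and tail() can be checked with a small sketch. The vector `dates` below is made-up illustration data; range() and diff() are base R functions that also work on Date vectors.

```r
# Made-up, unsorted Date vector for illustration
dates <- as.Date(c("2010-01-14", "2010-01-05", "2010-01-10",
                   "2010-01-01", "2010-01-12"))

# range() returns the earliest and latest date in one call
span <- range(dates)

# diff() on that pair gives the same difftime as max(dates) - min(dates)
span_days <- diff(span)
span_days

# head() and tail() only agree with min() and max() after sorting
sorted <- sort(dates)
head(sorted, 1) == min(dates)
tail(sorted, 1) == max(dates)

# On the unsorted vector they simply return the first and last elements
head(dates, 1)
tail(dates, 1)
```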
Goal 2: You seem to be trying to convert a whole data frame to a Date. You can't do this. What you can do is reformat the data in the components (columns) of the data frame. Here I add columns to tmpTimes by reformatting the EntryTime column into several different summaries of the date.
tmpTimes2 <- within(tmpTimes, weekOfYear <- format(EntryTime, format = "%W-%Y"))
tmpTimes2 <- within(tmpTimes2, monthYear <- format(EntryTime, format = "%B-%Y"))
tmpTimes2 <- within(tmpTimes2, Year <- format(EntryTime, format = "%Y"))
Giving:
> head(tmpTimes2)
EntryTime ExitTime weekOfYear monthYear Year
8 2010-01-14 2010-03-16 02-2010 January-2010 2010
9 2010-01-05 2010-01-17 01-2010 January-2010 2010
7 2010-01-10 2010-01-30 01-2010 January-2010 2010
3 2010-01-08 2010-04-16 01-2010 January-2010 2010
10 2010-01-01 2010-01-26 00-2010 January-2010 2010
13 2010-01-12 2010-02-15 02-2010 January-2010 2010
If you are American or want to use the US convention for the start of the week (%W starts the week on a Monday; the US convention is to start on a Sunday), change the %W to %U. ?strftime has more details of what %W and %U represent.
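A quick way to see the difference between the two week conventions is to format a Sunday near the start of a year. The sketch below uses 3 January 2010, which was the first Sunday of that year; the variable name `d` is just for illustration.

```r
# 3 January 2010 was a Sunday, and the first Monday of 2010 was 4 January
d <- as.Date("2010-01-03")

# %u gives the weekday as a number with Monday = 1, so Sunday is "7"
weekday_num <- format(d, "%u")

# %W counts weeks starting on Monday: the first Monday has not occurred yet
week_monday <- format(d, "%W")   # "00"

# %U counts weeks starting on Sunday: this is the first Sunday of the year
week_sunday <- format(d, "%U")   # "01"

c(weekday = weekday_num, W = week_monday, U = week_sunday)
```

Days before the first Monday (for %W) or the first Sunday (for %U) fall in week 00, which is why 2010-01-01 shows up as 00-2010 in the output above.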
A final point on data format: in the above I have worked with dates in standard R format. You have your data stored in a data frame in a non-standard format, presumably as characters or factors. So you have something like:
tmpTimes3 <- within(tmpTimes,
EntryTime <- format(EntryTime, format = "%d-%m-%y"))
tmpTimes3 <- within(tmpTimes3,
ExitTime <- format(ExitTime, format = "%d-%m-%y"))
> head(tmpTimes3)
EntryTime ExitTime
8 14-01-10 16-03-10
9 05-01-10 17-01-10
7 10-01-10 30-01-10
3 08-01-10 16-04-10
10 01-01-10 26-01-10
13 12-01-10 15-02-10
You need to convert those characters or factors to something R understands as a date. My preference would be the "Date" class. Before you try the above answers with your data, convert your data to the correct format:
tmpTimes3 <-
within(tmpTimes3, {
EntryTime <- as.Date(as.character(EntryTime), format = "%d-%m-%y")
ExitTime <- as.Date(as.character(ExitTime), format = "%d-%m-%y")
})
so that your data looks like this:
> head(tmpTimes3)
EntryTime ExitTime
8 2010-01-14 2010-03-16
9 2010-01-05 2010-01-17
7 2010-01-10 2010-01-30
3 2010-01-08 2010-04-16
10 2010-01-01 2010-01-26
13 2010-01-12 2010-02-15
> str(tmpTimes3)
'data.frame': 15 obs. of 2 variables:
$ EntryTime:Class 'Date' num [1:15] 14623 14614 14619 14617 14610 ...
$ ExitTime :Class 'Date' num [1:15] 14684 14626 14639 14715 14635 ...
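Putting the pieces together on data shaped like the question's, one possible end-to-end sketch follows. The data frame `trades` is a hypothetical stand-in for the four rows shown in the question, and building the month factor from `month.name` (rather than from `%B`, whose output depends on the locale) is a choice made here for illustration, not something taken from the answer above.

```r
# Hypothetical stand-in for the four rows shown in the question
trades <- data.frame(
  EntryTime = c("01-03-07", "29-10-07", "13-04-07", "18-12-07"),
  ExitTime  = c("10-04-07", "02-11-07", "14-05-07", "20-02-08"),
  stringsAsFactors = FALSE
)

# as.Date() works on vectors, not data frames, so convert column by column
date_cols <- c("EntryTime", "ExitTime")
trades[date_cols] <- lapply(
  trades[date_cols],
  function(x) as.Date(as.character(x), format = "%d-%m-%y")
)

# Goal 1: days between the earliest and latest entry date
span_days <- max(trades$EntryTime) - min(trades$EntryTime)
span_days

# Goal 2: a factor with one level per calendar month, using the
# built-in English month.name constant to stay locale-independent
entry_month <- factor(
  month.name[as.integer(format(trades$EntryTime, "%m"))],
  levels = month.name
)
table(entry_month)
```

Fixing the levels to all twelve months keeps empty months in tables and plots, which tends to matter when trades are later aggregated by month.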
Short answer:
Convert the columns to the Date class if that is not already done, then use min() and max() on the dates.
date_list = structure(c(15401, 15405, 15405), class = "Date")
date_list
#[1] "2012-03-02" "2012-03-06" "2012-03-06"
min(date_list)
#[1] "2012-03-02"
max(date_list)
#[1] "2012-03-06"