
How to find the highest (latest) and lowest (earliest) date [R]

Tags: datetime, time, r

I’m trying to convert two columns in my data frame to a proper date/time class, so far without success. I’ve tried various classes (timeDate, Date, timeSeries, POSIXct, POSIXlt), but none of them worked. Perhaps I’m overlooking the obvious; I’ve tried so many approaches that I no longer know what’s what. I hope some of you can shed some light on where I’m going wrong.

Goal: I want to calculate the difference between the earliest and latest dates in my data. I got this working with head() and tail(), but because those values aren’t necessarily the earliest and latest dates, I need another way. (I can’t get sorting to work either, because the data is sorted only on the day part of the date.)

Second goal: I want to convert the dates from daily format (e.g. 8-12-2010) to weekly, monthly, and yearly levels (e.g. '49-2010', 'december-10', and just '2010'). This can be done with format strings (like %d-%m-%y). Can I do this by converting the data frame to a time class, reformatting that time class (8-12-2010 -> format("%B-%y") -> 'december-10'), and then turning the result into a factor with a level for each month?

For both goals I need to convert the data frame in some way to a time class, and this is where I ran into difficulties.

My dataframe looks like this:

> tradesList[c(1,10,11,20),14:15] -> tmpTimes4
> tmpTimes4
   EntryTime ExitTime
1   01-03-07 10-04-07
10  29-10-07 02-11-07
11  13-04-07 14-05-07
20  18-12-07 20-02-08

Here’s a summary of what I’ve tried:

> class(tmpTimes4)
[1] "data.frame"
> as.Date(head(tmpTimes4$EntryTimes, n=1), format="%d-%m-%y")
Error in as.Date.default(head(tmpTimes4$EntryTimes, n = 1), format = "%d-%m-%y") : 
  do not know how to convert 'head(tmpTimes4$EntryTimes, n = 1)' to class "Date"
> as.timeDate(tmpTimes4, format="%d-%m-%y")
Error in as.timeDate(tmpTimes4, format = "%d-%m-%y") : 
  unused argument(s) (format = "%d-%m-%y")
> timeSeries(tmpTimes4, format="%d-%m-%y")
Error in midnightStandard2(charvec, format) : 
  'charvec' has non-NA entries of different number of characters
> tmpEntryTimes4 <- timeSeries(tmpTimes4$EntryTime, format="%d-%m-%y")
> tmpExitTimes4 <- timeSeries(tmpTimes4$ExitTime, format="%d-%m-%y")
> tmpTimes5 <- cbind(tmpEntryTimes4,tmpExitTimes4)
> colnames(tmpTimes5) <- c("Entry","Exit")
> tmpTimes5
     Entry    Exit    
[1,] 01-03-07 10-04-07
[2,] 29-10-07 02-11-07
[3,] 13-04-07 14-05-07
[4,] 18-12-07 20-02-08
> class(tmpTimes5)
[1] "timeSeries"
attr(,"package")
[1] "timeSeries"
> as.timeDate(tmpTimes5, format="%d-%m-%y")
Error in as.timeDate(tmpTimes5, format = "%d-%m-%y") : 
  unused argument(s) (format = "%d-%m-%y")
> as.Date(tmpTimes5, format="%d-%m-%y")
Error in as.Date.default(tmpTimes5, format = "%d-%m-%y") : 
  do not know how to convert 'tmpTimes5' to class "Date"
> format.POSIXlt(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in format.POSIXlt(tmpTimes5, format = "%d-%m-%y", usetz = FALSE) : 
  wrong class
> as.POSIXlt(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in as.POSIXlt.default(tmpTimes5, format = "%d-%m-%y", usetz = FALSE) : 
  do not know how to convert 'tmpTimes5' to class "POSIXlt"
> as.POSIXct(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in as.POSIXlt.default(x, tz, ...) : 
  do not know how to convert 'x' to class "POSIXlt"

The timeDate package has a function for ‘range’. However, converting to the Date class works for an individual value but, for some reason, not for a data frame:

> as.Date(tmpTimes4[1,1], format="%d-%m-%y")
[1] "2007-03-01"
> as.Date(tmpTimes4, format="%d-%m-%y")
Error in as.Date.default(tmpTimes4, format = "%d-%m-%y") : 
  do not know how to convert 'tmpTimes4' to class "Date"

At this point I almost believe it’s impossible to do, so any thoughts would be highly appreciated!

Regards,

Jos (asked Dec 08 '10 09:12)



2 Answers

Start with some dummy data:

start <- as.Date("2010/01/01")
end <- as.Date("2010/12/31")
set.seed(1)
datewant <- seq(start, end, by = "days")[sample(15)]
tmpTimes <- data.frame(EntryTime = datewant, 
                       ExitTime = datewant + sample(100, 15))
## shuffle the rows so that EntryTime is in random order
tmpTimes <- tmpTimes[sample(NROW(tmpTimes)), ]
head(tmpTimes)

so we have something like this:

> head(tmpTimes)
    EntryTime   ExitTime
8  2010-01-14 2010-03-16
9  2010-01-05 2010-01-17
7  2010-01-10 2010-01-30
3  2010-01-08 2010-04-16
10 2010-01-01 2010-01-26
13 2010-01-12 2010-02-15

Using the above, take Goal 1: compute the difference between the earliest and latest dates. You can treat dates as if they were numbers (that is how they are stored internally anyway), so functions like min() and max() will work. You can use the difftime() function:

> with(tmpTimes, difftime(max(EntryTime), min(EntryTime)))
Time difference of 14 days

or use standard subtraction

> with(tmpTimes, max(EntryTime) - min(EntryTime))
Time difference of 14 days

to get the difference in days. head() and tail() will only work if you sort the dates as these take the first and the last value in a vector, not the highest and lowest actual value.
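
To make the head()/tail() point concrete, here is a small sketch. The vector `d` is made up for illustration and is not the asker's data. It shows that the first and last elements only match the earliest and latest dates once the vector is sorted, and that range() returns both extremes in one call.

```r
# Hypothetical, unsorted dates for illustration only
d <- as.Date(c("2010-01-14", "2010-01-01", "2010-01-10"))

# head()/tail() return the first and last elements as stored
first_last <- c(head(d, 1), tail(d, 1))

# range() returns the earliest and latest dates regardless of order
earliest_latest <- range(d)

# diff() on the range gives the span as a difftime
span <- diff(earliest_latest)

# After sorting, head()/tail() agree with min()/max()
s <- sort(d)
sorted_first_last <- c(head(s, 1), tail(s, 1))
```

If you only need the span, diff(range(d)) avoids sorting the whole vector.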

Goal 2: You seem to be trying to convert a data frame to a Date. You can't do this. What you can do is reformat the data in the components of the data frame. Here I add columns to tmpTimes by reformatting the EntryTime column into several different summaries of the date.

tmpTimes2 <- within(tmpTimes, weekOfYear <- format(EntryTime, format = "%W-%Y"))
tmpTimes2 <- within(tmpTimes2, monthYear <- format(EntryTime, format = "%B-%Y"))
tmpTimes2 <- within(tmpTimes2, Year <- format(EntryTime, format = "%Y"))

Giving:

> head(tmpTimes2)
    EntryTime   ExitTime weekOfYear    monthYear Year
8  2010-01-14 2010-03-16    02-2010 January-2010 2010
9  2010-01-05 2010-01-17    01-2010 January-2010 2010
7  2010-01-10 2010-01-30    01-2010 January-2010 2010
3  2010-01-08 2010-04-16    01-2010 January-2010 2010
10 2010-01-01 2010-01-26    00-2010 January-2010 2010
13 2010-01-12 2010-02-15    02-2010 January-2010 2010

If you are American or want to use the US convention for the start of the week, change %W to %U (%W starts the week on a Monday; the US convention is to start on a Sunday). ?strftime has more details of what %W and %U represent.
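
The question also asks for a factor with one level per month. The following is a minimal sketch of one way to do that, not necessarily the only one; the `entry` vector is made up for illustration. It uses "%Y-%m" labels as an assumption of convenience, because they sort chronologically as text and do not depend on the locale's month names.

```r
# Hypothetical dates for illustration only
entry <- as.Date(c("2010-12-08", "2010-01-14", "2010-03-02", "2010-12-20"))

# "%Y-%m" labels sort chronologically even as plain text,
# unlike month names from "%B", which sort alphabetically
month_key <- format(entry, "%Y-%m")

# Build the factor; levels are the sorted unique labels
month_f <- factor(month_key, levels = sort(unique(month_key)))

# Count entries per month
counts <- table(month_f)
```

If you prefer labels such as "%B-%y", one option is to take the levels from the sorted dates, e.g. levels = unique(format(sort(entry), "%B-%y")), so that they stay in calendar order.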


A final point on data format: In the above I have worked with dates in standard R format. You have your data stored in a data frame in a non-standard format, presumably as characters or factors. So you have something like:

tmpTimes3 <- within(tmpTimes, 
                    EntryTime <- format(EntryTime, format = "%d-%m-%y"))
tmpTimes3 <- within(tmpTimes3, 
                    ExitTime <- format(ExitTime, format = "%d-%m-%y"))

> head(tmpTimes3)
   EntryTime ExitTime
8   14-01-10 16-03-10
9   05-01-10 17-01-10
7   10-01-10 30-01-10
3   08-01-10 16-04-10
10  01-01-10 26-01-10
13  12-01-10 15-02-10

You need to convert those characters or factors to something R understands as a date. My preference would be the "Date" class. Before you try the above answers with your data, convert your data to the correct format:

tmpTimes3 <- 
    within(tmpTimes3, {
           EntryTime <- as.Date(as.character(EntryTime), format = "%d-%m-%y")
           ExitTime <- as.Date(as.character(ExitTime), format = "%d-%m-%y")
           })

so that your data looks like this:

> head(tmpTimes3)
    EntryTime   ExitTime
8  2010-01-14 2010-03-16
9  2010-01-05 2010-01-17
7  2010-01-10 2010-01-30
3  2010-01-08 2010-04-16
10 2010-01-01 2010-01-26
13 2010-01-12 2010-02-15
> str(tmpTimes3)
'data.frame':   15 obs. of  2 variables:
 $ EntryTime:Class 'Date'  num [1:15] 14623 14614 14619 14617 14610 ...
 $ ExitTime :Class 'Date'  num [1:15] 14684 14626 14639 14715 14635 ...
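
Applied to the four rows shown in the question, the same conversion might look like the sketch below. The data frame is rebuilt by hand here, and storing the columns as character is an assumption (in the asker's session they may be factors, which is why as.character() is kept).

```r
# The four rows shown in the question, rebuilt by hand as character columns
tmpTimes4 <- data.frame(
  EntryTime = c("01-03-07", "29-10-07", "13-04-07", "18-12-07"),
  ExitTime  = c("10-04-07", "02-11-07", "14-05-07", "20-02-08"),
  stringsAsFactors = FALSE
)

# as.Date() works on a vector, not on a whole data frame,
# so convert each column in turn
tmpTimes4[] <- lapply(tmpTimes4, function(x) {
  as.Date(as.character(x), format = "%d-%m-%y")
})

# Earliest and latest entry dates, and the span between them
earliest <- min(tmpTimes4$EntryTime)
latest   <- max(tmpTimes4$EntryTime)
span     <- latest - earliest
```

Note that %y reads a two-digit year, so "07" is interpreted as 2007.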
Gavin Simpson (answered Oct 28 '22 22:10)


Short answer:

  • Convert to date if not already done.
  • Then use min and max on the list of dates.

    date_list = structure(c(15401, 15405, 15405), class = "Date")
    date_list
    #[1] "2012-03-02" "2012-03-06" "2012-03-06"
    
    min(date_list)
    #[1] "2012-03-02"
    max(date_list)
    #[1] "2012-03-06"
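
One caveat worth sketching: if the dates contain missing values, min() and max() return NA unless told to skip them. The vector below is made up for illustration.

```r
# Hypothetical dates, one of them missing
date_list <- as.Date(c("2012-03-06", NA, "2012-03-02"))

# With an NA present, min() and max() return NA unless na.rm = TRUE
earliest <- min(date_list, na.rm = TRUE)
latest   <- max(date_list, na.rm = TRUE)

# which.min() / which.max() give the positions, skipping NA
earliest_pos <- which.min(date_list)
latest_pos   <- which.max(date_list)
```

The positions are handy when you want the whole row of a data frame that holds the earliest or latest date.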
    
Timothée HENRY (answered Oct 28 '22 21:10)