
Compare two dates in R

Tags: date, equality, r

I have a tab-separated text file that I imported to R. I used the following command for the import:

data = read.table(soubor, header = TRUE, sep = "\t", dec = ".", colClasses =c("numeric","numeric","character","Date","numeric","numeric"))

When I run str(data) to check the data types of my columns, I get:

'data.frame':   211931 obs. of  6 variables:
$ DataValue   : num  0 0 0 0 0 0 0 0 0 NA ...
$ SiteID      : num  1 1 1 1 1 1 1 1 1 1 ...
$ VariableCode: chr  "Sucho" "Sucho" "Sucho" "Sucho" ...
$ DateTimeUTC : Date, format: "2012-07-01" "2012-07-02" "2012-07-03" "2012-07-04" ...
$ Latitude    : num  50.8 50.8 50.8 50.8 50.8 ...
$ Longitude   : num  15.6 15.6 15.6 15.6 15.6 ...

A reproducible sample of the first 20 rows of my data is here:

my_sample = dput(data[1:20,])

structure(list(DataValue = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA, NA, 
NA, NA, NA, NA, NA, NA, 0, 0, 0), SiteID = c(1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), VariableCode = c("Sucho", 
"Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho", 
"Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho", 
"Sucho", "Sucho", "Sucho", "Sucho", "Sucho"), DateTimeUTC = structure(c(15522, 
15523, 15524, 15525, 15526, 15527, 15528, 15529, 15530, 15531, 
15532, 15533, 15534, 15535, 15536, 15537, 15538, 15539, 15540, 
15541), class = "Date"), Latitude = c(50.77, 50.77, 50.77, 50.77, 
50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 
50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77), Longitude = c(15.55, 
15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 
15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 
15.55)), .Names = c("DataValue", "SiteID", "VariableCode", "DateTimeUTC", 
"Latitude", "Longitude"), row.names = c(NA, 20L), class = "data.frame")

Now I want to filter my table by date. Note that I'm running my code inside a for loop: first I subset my data by 1 July 2012 and do some processing, then I subset by 2 July and do some processing, and so on. For example, I want to get all rows with the date equal to 6 July 2012. I tried this code:

startDate = as.Date("2012-07-01");
endDate = as.Date("2012-07-20");
all_dates = seq(startDate, endDate, 1);

#the following code I'm trying to run inside a loop...
for (j in 1:length(all_dates)) {
    filterdate = all_dates[j];
    my_subset = my_sample[my_sample$DateTimeUTC == filterdate,]
    #now I want to do some processing on my_subset...
}

But the above code returns an empty data frame from the 7th iteration of the loop onward.

So, for example:

subset_one = my_sample[my_sample$DateTimeUTC == all_dates[6],]

returns: 3 obs of 6 variables.

But, for some unknown reason, the example:

subset_two = my_sample[my_sample$DateTimeUTC == all_dates[7],]

returns: 0 obs of 6 variables.

(note: I edited the above code to make my problem 100% reproducible)

Any ideas what I'm doing wrong?

asked Feb 05 '14 08:02 by jirikadlec2


1 Answer

The following solved my problem: instead of the Date class, I read the date column as POSIXct. Here is the example code for reading the tab-separated text file; with it, the subsetting worked in every iteration of my for loop:

data = read.table("data.txt", header = TRUE, sep = "\t", dec = ".",
    colClasses = c("numeric", "numeric", "character", "POSIXct", "numeric", "numeric"))
startDate = as.POSIXct("2012-07-01")
endDate = as.POSIXct("2012-07-20")
# step by calendar day; this equals 86400 seconds per step when no DST change falls in the range
all_dates = seq(startDate, endDate, by = "day")

# the following code runs inside a loop
for (j in seq_along(all_dates)) {
    filterdate = all_dates[j]
    my_subset = data[data$DateTimeUTC == filterdate, ]
    # now do some processing on my_subset...
}
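As a possible alternative to switching classes, here is a minimal sketch that avoids the per-date equality test altogether. It is not taken from the post: the data frame `obs` and its values are made up for illustration, and the fractional-day check only covers one possible cause of a failing `==` on Date values, which the post does not confirm.

```r
# Hypothetical sample data: three sites observed on each of five days.
obs <- data.frame(
  DataValue   = c(0, 0, NA, 1, 2, 0, 3, NA, 0, 1, 0, 0, 2, 1, 0),
  SiteID      = rep(1:3, times = 5),
  DateTimeUTC = rep(seq(as.Date("2012-07-01"), by = "day", length.out = 5),
                    each = 3)
)

# Diagnostic: a Date is stored as a number of days since 1970-01-01.
# Two values that print identically can still compare unequal if one of
# them carries a fractional part, so check for that first.
has_fraction <- any(unclass(obs$DateTimeUTC) %% 1 != 0)

# split() groups the rows by calendar day in a single pass. Grouping on the
# formatted string means any fractional part cannot affect the grouping.
by_day <- split(obs, format(obs$DateTimeUTC, "%Y-%m-%d"))

for (day in names(by_day)) {
  my_subset <- by_day[[day]]
  # ... process my_subset here ...
  cat(day, ":", nrow(my_subset), "rows\n")
}
```

Grouping once with split() also avoids rescanning the whole table on every iteration, which may matter for a table with around 200,000 rows.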
answered Oct 24 '22 08:10 by jirikadlec2