I have a tab-separated text file that I imported into R. I used the following command for the import:
data = read.table(soubor, header = TRUE, sep = "\t", dec = ".", colClasses =c("numeric","numeric","character","Date","numeric","numeric"))
When I run str(data) to check the data types of my columns, I get:
'data.frame': 211931 obs. of 6 variables:
$ DataValue : num 0 0 0 0 0 0 0 0 0 NA ...
$ SiteID : num 1 1 1 1 1 1 1 1 1 1 ...
$ VariableCode: chr "Sucho" "Sucho" "Sucho" "Sucho" ...
$ DateTimeUTC : Date, format: "2012-07-01" "2012-07-02" "2012-07-03" "2012-07-04" ...
$ Latitude : num 50.8 50.8 50.8 50.8 50.8 ...
$ Longitude : num 15.6 15.6 15.6 15.6 15.6 ...
A reproducible sample of the first 20 rows of my data is here:
my_sample = dput(data[1:20,])
structure(list(DataValue = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA, NA,
NA, NA, NA, NA, NA, NA, 0, 0, 0), SiteID = c(1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), VariableCode = c("Sucho",
"Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho",
"Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho", "Sucho",
"Sucho", "Sucho", "Sucho", "Sucho", "Sucho"), DateTimeUTC = structure(c(15522,
15523, 15524, 15525, 15526, 15527, 15528, 15529, 15530, 15531,
15532, 15533, 15534, 15535, 15536, 15537, 15538, 15539, 15540,
15541), class = "Date"), Latitude = c(50.77, 50.77, 50.77, 50.77,
50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77,
50.77, 50.77, 50.77, 50.77, 50.77, 50.77, 50.77), Longitude = c(15.55,
15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55,
15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55, 15.55,
15.55)), .Names = c("DataValue", "SiteID", "VariableCode", "DateTimeUTC",
"Latitude", "Longitude"), row.names = c(NA, 20L), class = "data.frame")
Now I want to filter my table by date. Note that I'm running my code inside a for loop. First, I subset my data by 1st July 2012 and do some processing. Then I subset my data by 2nd July and do some processing, and so on. For example, I want to get all rows with the date equal to 6th July 2012. I tried this code:
startDate = as.Date("2012-07-01");
endDate = as.Date("2012-07-20");
all_dates = seq(startDate, endDate, 1);
# the following code runs inside a loop
for (j in 1:length(all_dates)) {
  filterdate = all_dates[j];
  my_subset = my_sample[my_sample$DateTimeUTC == filterdate, ]
  # now I want to do some processing on my_subset...
}
But the above code returns an empty dataset starting from step 7 of the loop.
So, for example:
subset_one = my_sample[my_sample$DateTimeUTC == all_dates[6],]
returns: 3 obs. of 6 variables.
But, for some unknown reason, the example:
subset_two = my_sample[my_sample$DateTimeUTC == all_dates[7],]
returns: 0 obs. of 6 variables.
(note: I edited the above code to make my problem 100% reproducible)
Any ideas what I'm doing wrong?
POSIXct stores a date and time as the number of seconds since 1 January 1970 (UTC). Negative numbers represent times before 1970. Each date-time is therefore a single numeric value in units of seconds, which is compact to store in a data frame column and easy to compare.
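As a small illustration of the seconds-since-1970 representation described above, the sketch below converts a few POSIXct values to their underlying numbers. The variable names are made up for the example, and the time zone is fixed to UTC so the results do not depend on the local settings.

```r
# POSIXct values are seconds since 1970-01-01 00:00:00 UTC.
# tz = "UTC" is set explicitly; without it, R uses the local time zone.
epoch  <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
july1  <- as.POSIXct("2012-07-01 00:00:00", tz = "UTC")
before <- as.POSIXct("1969-12-31 00:00:00", tz = "UTC")

as.numeric(epoch)   # 0
as.numeric(july1)   # 1341100800 (15522 days * 86400 seconds)
as.numeric(before)  # -86400, one day before the epoch
```

Note that as.POSIXct("2012-07-01") without a tz argument is interpreted in the local time zone, so the underlying number can differ between machines.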
Date objects in R are stored as the number of days since 1 January 1970, so dates can be compared and manipulated much like a numeric vector. The stored value is normally a whole number, but R does not enforce this. Logical comparisons are simple: earlier dates are "less than" later dates.
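A minimal sketch of these comparisons, using dates from the range in the question (the variable names are chosen only for this example):

```r
d1 <- as.Date("2012-07-01")
d2 <- as.Date("2012-07-06")

unclass(d1)          # 15522, days since 1970-01-01
d1 < d2              # TRUE: the earlier date is "less than" the later one
as.numeric(d2 - d1)  # 5, difference in days

# Range filter on a vector of consecutive days
dates <- seq(d1, by = "day", length.out = 20)
in_range <- dates[dates >= d2 & dates <= as.Date("2012-07-08")]
format(in_range)     # "2012-07-06" "2012-07-07" "2012-07-08"
```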
The following solution solved my problem: instead of using the Date data type, I used the POSIXct data type.
Here is the example code for reading the tab-separated text file, after which the subsetting worked in all steps of my for loop:
data = read.table("data.txt", header = TRUE, sep = "\t", dec = ".",
                  colClasses = c("numeric", "numeric", "character", "POSIXct", "numeric", "numeric"))
startDate = as.POSIXct("2012-07-01")
endDate = as.POSIXct("2012-07-20")
all_dates = seq(startDate, endDate, 86400)  # 86400 is the number of seconds in a day
# the following code runs inside a loop
for (j in 1:length(all_dates)) {
  filterdate = all_dates[j]
  my_subset = data[data$DateTimeUTC == filterdate, ]
  # now I want to do some processing on my_subset...
}
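The post does not say why the Date comparison failed. One possible cause, offered only as an assumption, is that some Date values carried a fractional day: R ignores the fraction when printing but not when testing equality with ==. The sketch below uses made-up values to show the effect, and one way to compare whole days while staying with the Date class.

```r
# Hypothetical values: the second element carries half a day
dates  <- as.Date(c(15527, 15527.5, 15528), origin = "1970-01-01")
target <- as.Date("2012-07-06")

format(dates)     # "2012-07-06" "2012-07-06" "2012-07-07"
dates == target   # TRUE FALSE FALSE: the fractional value does not match

# Rebuilding the dates from their printed form drops the fraction
whole_days <- as.Date(format(dates))
whole_days == target  # TRUE TRUE FALSE
```

If unclass(data$DateTimeUTC) shows non-integer values in the real data, this kind of normalization may be worth trying before switching classes.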