I need help with this issue:
I have a dataset of water level values recorded every 30 minutes, but I only need the hourly values. I tried the aggregate() function, but it requires a FUN argument, which forces me to summarize the data with mean or median, and I don't want to apply any statistical function.
This is an example of my data frame:
06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996
We convert the 'RefDateTimeRef' column to POSIXct, extract the minutes and seconds with format(), and compare the result with "00:00" to get a logical vector, which we use to subset the rows.
df1[format(as.POSIXct(df1[,1], format = "%m/%d/%Y %H:%M"), "%M:%S")=="00:00",]
# RefDateTimeRef Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148
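The timestamps in the question also carry seconds, so the format string needs a %S field for that data. A minimal sketch of the same idea on a few rows copied from the question follows; the column names DateTime and Value and the tz = "UTC" setting are assumptions made for illustration, not part of the original data.

```r
# A few rows copied from the question; column names are hypothetical.
df <- data.frame(
  DateTime = c("06/16/2015 02:00:00", "06/16/2015 02:30:00",
               "06/16/2015 03:00:00", "06/16/2015 03:30:00",
               "06/16/2015 04:00:00"),
  Value = c(0.036068, 0.008916, -0.008622, -0.014057, -0.011172),
  stringsAsFactors = FALSE
)

# Parse with an explicit format that includes seconds.
# tz = "UTC" is an assumption; it sidesteps daylight-saving gaps.
ts <- as.POSIXct(df$DateTime, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only the rows whose minutes and seconds are both zero.
hourly <- df[format(ts, "%M:%S") == "00:00", ]
print(hourly)
```

No aggregation is involved here: the half-hour rows are simply dropped and the on-the-hour values are kept unchanged.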
Or with lubridate
library(lubridate)
df1[ minute(mdy_hm(df1[,1]))==0,]
# RefDateTimeRef Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148
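For timestamps that include seconds, as in the question, a possible variant is to parse with mdy_hms() and test both the minute and the second. This is only a sketch; the vector x is a hypothetical stand-in for the date-time column.

```r
library(lubridate)

# Hypothetical stand-in for the date-time column, in the question's layout.
x <- c("06/16/2015 02:00:00", "06/16/2015 02:30:00", "06/16/2015 03:00:00")

# mdy_hms() parses month/day/year hour:minute:second (UTC by default).
ts <- mdy_hms(x)

# TRUE only for timestamps that fall exactly on the hour.
on_the_hour <- minute(ts) == 0 & second(ts) == 0
print(x[on_the_hour])
```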
Or with sub() to remove the substring up to and including the hour part, and then use == to get the logical vector and subset the rows.
df1[ sub(".*\\s+\\S{2}:", "", df1[,1])=="00",]
NOTE: I would advise against using sub() or substr(), as string matching can give incorrect answers when the timestamp layout differs from what the pattern expects.
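One way the string approach can go wrong, sketched on the question's own timestamp layout: because those values include seconds, the same pattern leaves "MM:SS" behind instead of "MM", and the comparison with "00" matches nothing. The vector x below is a hypothetical stand-in for the date-time column.

```r
# Timestamps in the question's layout, which includes seconds.
x <- c("06/16/2015 02:00:00", "06/16/2015 02:30:00", "06/16/2015 03:00:00")

# The pattern strips everything up to and including the hour,
# so what remains here is minutes and seconds, not just minutes.
remainder <- sub(".*\\s+\\S{2}:", "", x)
print(remainder)

# Comparing against "00" therefore selects no rows at all.
by_string <- remainder == "00"
print(by_string)

# Parsing with a format that matches the data selects the on-the-hour rows.
# tz = "UTC" is an assumption made for this sketch.
ts <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
by_time <- format(ts, "%M:%S") == "00:00"
print(by_time)
```

The string version fails silently, returning an empty result rather than an error, which is why parsing the date-time is the safer choice.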
df <- read.table(text = '06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996')
colnames(df) <- c('Date','Time','Value')
index <- substring(df$Time, 4) == "00:00"
final_df <- df[index,]