
How to pick hourly values from dataset?

Tags: r, dataset, subset

I need help with this issue:

I have a dataset of water level values recorded every 30 minutes, but I need only the hourly values. I tried the aggregate() function, but because its FUN argument is required, it forces me to summarize with something like the mean or median, and I don't want to apply any statistical function.

This is an example of my data frame:

06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996

FernRay asked May 01 '16 15:05



2 Answers

Convert the 'RefDateTimeRef' column to POSIXct, extract the minutes and seconds with format(), and compare the result with "00:00". This returns a logical vector, which we use to subset the rows.

df1[format(as.POSIXct(df1[,1], format = "%m/%d/%Y %H:%M"), "%M:%S")=="00:00",]
#     RefDateTimeRef  Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148
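
Applied to the data in the question, which also carries seconds, the same idea might look like the sketch below. The column names DateTime and Value and the tz = "UTC" setting are assumptions made for illustration; only the first four sample rows are used.

```r
# First four rows copied from the question; column names are assumed.
df1 <- data.frame(
  DateTime = c("06/16/2015 02:00:00", "06/16/2015 02:30:00",
               "06/16/2015 03:00:00", "06/16/2015 03:30:00"),
  Value = c(0.036068, 0.008916, -0.008622, -0.014057),
  stringsAsFactors = FALSE
)

# Parse the strings including seconds; a fixed time zone (assumed UTC)
# avoids surprises around daylight-saving transitions.
stamp <- as.POSIXct(df1$DateTime, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep the rows whose minute and second are both zero.
hourly <- df1[format(stamp, "%M:%S") == "00:00", ]
```

No summary function is involved: the rows are selected, not aggregated.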

Or with lubridate

library(lubridate)
df1[ minute(mdy_hm(df1[,1]))==0,]
#     RefDateTimeRef  Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148

Or use sub to strip everything up to and including the hour part, then compare the remainder with "00" to get the logical vector and subset the rows.

df1[ sub(".*\\s+\\S{2}:", "", df1[,1])=="00",]

NOTE: I would advise against using sub or substr, as it can sometimes lead to incorrect answers.
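
One way the string approach can go wrong, shown as an illustration only (the note does not say which failure the author had in mind): if the timestamps carry seconds, as in the question, the sub pattern leaves "00:00" rather than "00", so the comparison silently matches nothing. Parsing the value as a date-time does not have that problem.

```r
# Two timestamps in the question's format, which includes seconds.
x <- c("06/16/2015 02:00:00", "06/16/2015 02:30:00")

# String approach: the remainder after the hour is "00:00" or "30:00",
# so comparing with "00" is FALSE for every row.
by_regex <- sub(".*\\s+\\S{2}:", "", x) == "00"

# Date-time approach: parse first, then test minutes and seconds.
# tz = "UTC" is an assumption made for this sketch.
stamp <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
by_time <- format(stamp, "%M:%S") == "00:00"
```

Here by_regex selects no rows at all, while by_time selects the 02:00:00 row.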

akrun answered Sep 22 '22 10:09


df <- read.table(text = '06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996')

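# Name the three whitespace-separated columns
colnames(df) <- c('Date','Time','Value')

# TRUE where the time reads hh:00:00, i.e. minutes and seconds are both zero
index <- substring(df$Time, 4) == "00:00"

# Keep only the on-the-hour rows
final_df <- df[index,]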
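
If the series is strictly regular (one row every 30 minutes, no gaps, first row on the hour), taking every second row picks the same values as the substring test. That regularity is an assumption about the data, not something the question guarantees, so the sketch below compares both selections on a few sample rows.

```r
# Five rows from the question; stringsAsFactors = FALSE keeps Time as text.
df <- data.frame(
  Time = c("02:00:00", "02:30:00", "03:00:00", "03:30:00", "04:00:00"),
  Value = c(0.036068, 0.008916, -0.008622, -0.014057, -0.011172),
  stringsAsFactors = FALSE
)

# Position-based selection: rows 1, 3, 5, ... (assumes no missing records).
every_other <- df[seq(1, nrow(df), by = 2), ]

# Content-based selection, as in the answer above.
by_string <- df[substring(df$Time, 4) == "00:00", ]

# Both selections should pick the same values when the series has no gaps.
same_values <- identical(every_other$Value, by_string$Value)
```

The content-based test is the safer choice when records may be missing, because a single gap shifts every later position.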

Kunal Puri answered Sep 21 '22 10:09