Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract hourly max/min/median values with timestamp in R

Tags:

time

dataframe

r

I have a dataframe with temperature measurements every 10 minutes. The measurements were taken at different locations (named as 'LCZ'), with for each location the values in a different column.

This is part of my dataframe: (it also contains missing values NA)

 Time `LCZ 3-2` `LCZ 3-10` `LCZ 6-1` `LCZ 6-9` `LCZ 9-4`


               <dttm>     <dbl>      <dbl>     <dbl>     <dbl>     <dbl>
 1 2017-08-26 17:00:00      27.5       27.5      27.5      27.0      27.0
 2 2017-08-26 17:10:00      27.5       27.0      27.5      27.0      27.0
 3 2017-08-26 17:20:00      27.5       27.0      27.0      27.0      27.0
 4 2017-08-26 17:30:00      27.0       26.5      27.0      26.5      26.5
 5 2017-08-26 17:40:00      26.5       26.5      26.5      26.5      26.5
 6 2017-08-26 17:50:00      26.5       26.0      26.5      26.0      26.5
 7 2017-08-26 18:00:00      26.5       26.0      26.5      26.5      26.5
 8 2017-08-26 18:10:00      27.0       26.0      26.5      26.5      26.0
 9 2017-08-26 18:20:00      26.5       26.5      26.5      26.5      26.0
10 2017-08-26 18:30:00      26.5       26.5      26.5      26.5      26.0

I want for each location or column to calculate the hourly min/max/median temperature and in addition for the hourly min/max also the timestamp from the original data at which the min/max respectively occured.

Is this possible with R?

I tried already various functions.

group_by allowed me to calculate min/max for each column but without timestamps. period.apply also allowed me to calculate min/max/median but only for one column. Also aggregate() didn't lead to any succes.

I'm learning in R, but didn't came close to a solution on this problem.

This website has helped me with various problems, but I'm really stuck on this one. Can someone help? Thanks in advance

like image 305
M95 Avatar asked Apr 07 '18 12:04

M95


2 Answers

We can use floor_date from the lubridate package to create a new column Time2 to show the hourly information. If this is not the way you want to define the hourly grouping, you can also try round_date or ceiling_date. After that, we can use gather from the tidyr package to convert the data frame from wide format to long format.

library(dplyr)
library(tidyr)
library(lubridate)

dat2 <- dat %>%
  mutate(Time = ymd_hms(Time),
         Time2 = floor_date(Time, unit = "hour")) %>%
  gather(LCZ, Value, starts_with("LCZ")) %>%
  group_by(Time2, LCZ)

After that, we can summarize the data by LCZ and Time2.

dat3 <- dat2 %>%
  summarise(Min = min(Value, na.rm = TRUE),
            Max = max(Value, na.rm = TRUE),
            Median = median(Value, na.rm = TRUE)) %>%
  ungroup()
dat3
# # A tibble: 10 x 5
#    Time2               LCZ        Min   Max Median
#    <dttm>              <chr>    <dbl> <dbl>  <dbl>
#  1 2017-08-26 17:00:00 LCZ.3.10  26.0  27.5   26.8
#  2 2017-08-26 17:00:00 LCZ.3.2   26.5  27.5   27.2
#  3 2017-08-26 17:00:00 LCZ.6.1   26.5  27.5   27.0
#  4 2017-08-26 17:00:00 LCZ.6.9   26.0  27.0   26.8
#  5 2017-08-26 17:00:00 LCZ.9.4   26.5  27.0   26.8
#  6 2017-08-26 18:00:00 LCZ.3.10  26.0  26.5   26.2
#  7 2017-08-26 18:00:00 LCZ.3.2   26.5  27.0   26.5
#  8 2017-08-26 18:00:00 LCZ.6.1   26.5  26.5   26.5
#  9 2017-08-26 18:00:00 LCZ.6.9   26.5  26.5   26.5
# 10 2017-08-26 18:00:00 LCZ.9.4   26.0  26.5   26.0

If you want, we can create binary value to indicate if the value is minimum, maximum, or median as follows. This format is useful when you further want to filter the data frame.

dat4 <- dat2 %>%
  mutate(Min = (Value == min(Value, na.rm = TRUE)) + 0L,
         Max = (Value == max(Value, na.rm = TRUE)) + 0L,
         Median = (Value == median(Value, na.rm = TRUE)) + 0L) %>%
  ungroup()
dat4
# # A tibble: 50 x 7
#    Time                Time2               LCZ     Value   Min   Max Median
#    <dttm>              <dttm>              <chr>   <dbl> <int> <int>  <int>
#  1 2017-08-26 17:00:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0
#  2 2017-08-26 17:10:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0
#  3 2017-08-26 17:20:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0
#  4 2017-08-26 17:30:00 2017-08-26 17:00:00 LCZ.3.2  27.0     0     0      0
#  5 2017-08-26 17:40:00 2017-08-26 17:00:00 LCZ.3.2  26.5     1     0      0
#  6 2017-08-26 17:50:00 2017-08-26 17:00:00 LCZ.3.2  26.5     1     0      0
#  7 2017-08-26 18:00:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1
#  8 2017-08-26 18:10:00 2017-08-26 18:00:00 LCZ.3.2  27.0     0     1      0
#  9 2017-08-26 18:20:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1
# 10 2017-08-26 18:30:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1
# # ... with 40 more rows

DATA

dat <- read.table(text = "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
                  '2017-08-26 17:00:00'      27.5       27.5      27.5      27.0      27.0
                  '2017-08-26 17:10:00'      27.5       27.0      27.5      27.0      27.0
                  '2017-08-26 17:20:00'      27.5       27.0      27.0      27.0      27.0
                  '2017-08-26 17:30:00'      27.0       26.5      27.0      26.5      26.5
                  '2017-08-26 17:40:00'      26.5       26.5      26.5      26.5      26.5
                  '2017-08-26 17:50:00'      26.5       26.0      26.5      26.0      26.5
                  '2017-08-26 18:00:00'      26.5       26.0      26.5      26.5      26.5
                  '2017-08-26 18:10:00'      27.0       26.0      26.5      26.5      26.0
                  '2017-08-26 18:20:00'      26.5       26.5      26.5      26.5      26.0
                  '2017-08-26 18:30:00'      26.5       26.5      26.5      26.5      26.0",
                  header = TRUE, stringsAsFactors = FALSE)
like image 96
www Avatar answered Nov 09 '22 20:11

www


Here is a way of doing it with dplyr verbs:

library(lubridate)

df %>%
  gather(Location, Temp, -Time) %>%
  group_by(Date = date(Time), HoD = hour(Time), Location) %>%
  mutate_at(.vars = "Temp", .funs = list(Min = min, Max = max, Median = median)) %>%
  filter(Temp == Min | Temp == Max) %>%
  arrange(Location, Time) %>%
  distinct(Temp, .keep_all = T) %>%
  mutate(MinMax = ifelse(Temp == Min, "MinTime", "MaxTime")) %>%
  dplyr::select(-Temp) %>%
  spread("MinMax", "Time")

Output:

Notice the NA which mean that minimum and maximum temperatures were identical on that day, in that hour and at that location.

# A tibble: 10 x 8
# Groups:   Date, HoD, Location [10]
   Location Date         HoD   Min   Max Median MaxTime             MinTime            
   <chr>    <date>     <int> <dbl> <dbl>  <dbl> <chr>               <chr>              
 1 LCZ.3.10 2017-08-26    17  26.0  27.5   26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
 2 LCZ.3.10 2017-08-26    18  26.0  26.5   26.2 2017-08-26 18:20:00 2017-08-26 18:00:00
 3 LCZ.3.2  2017-08-26    17  26.5  27.5   27.2 2017-08-26 17:00:00 2017-08-26 17:40:00
 4 LCZ.3.2  2017-08-26    18  26.5  27.0   26.5 2017-08-26 18:10:00 2017-08-26 18:00:00
 5 LCZ.6.1  2017-08-26    17  26.5  27.5   27.0 2017-08-26 17:00:00 2017-08-26 17:40:00
 6 LCZ.6.1  2017-08-26    18  26.5  26.5   26.5 NA                  2017-08-26 18:00:00
 7 LCZ.6.9  2017-08-26    17  26.0  27.0   26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
 8 LCZ.6.9  2017-08-26    18  26.5  26.5   26.5 NA                  2017-08-26 18:00:00
 9 LCZ.9.4  2017-08-26    17  26.5  27.0   26.8 2017-08-26 17:00:00 2017-08-26 17:30:00
10 LCZ.9.4  2017-08-26    18  26.0  26.5   26.0 2017-08-26 18:00:00 2017-08-26 18:10:00
like image 41
Martin Schmelzer Avatar answered Nov 09 '22 20:11

Martin Schmelzer