I have a dataframe with temperature measurements every 10 minutes. The measurements were taken at different locations (named as 'LCZ'), with for each location the values in a different column.
This is part of my dataframe: (it also contains missing values NA)
Time `LCZ 3-2` `LCZ 3-10` `LCZ 6-1` `LCZ 6-9` `LCZ 9-4`
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-08-26 17:00:00 27.5 27.5 27.5 27.0 27.0
2 2017-08-26 17:10:00 27.5 27.0 27.5 27.0 27.0
3 2017-08-26 17:20:00 27.5 27.0 27.0 27.0 27.0
4 2017-08-26 17:30:00 27.0 26.5 27.0 26.5 26.5
5 2017-08-26 17:40:00 26.5 26.5 26.5 26.5 26.5
6 2017-08-26 17:50:00 26.5 26.0 26.5 26.0 26.5
7 2017-08-26 18:00:00 26.5 26.0 26.5 26.5 26.5
8 2017-08-26 18:10:00 27.0 26.0 26.5 26.5 26.0
9 2017-08-26 18:20:00 26.5 26.5 26.5 26.5 26.0
10 2017-08-26 18:30:00 26.5 26.5 26.5 26.5 26.0
I want for each location or column to calculate the hourly min/max/median temperature and in addition for the hourly min/max also the timestamp from the original data at which the min/max respectively occured.
Is this possible with R?
I tried already various functions.
group_by
allowed me to calculate min/max for each column but without timestamps. period.apply
also allowed me to calculate min/max/median but only for one column.
Also aggregate()
didn't lead to any succes.
I'm learning in R, but didn't came close to a solution on this problem.
This website has helped me with various problems, but I'm really stuck on this one. Can someone help? Thanks in advance
We can use floor_date
from the lubridate package to create a new column Time2
to show the hourly information. If this is not the way you want to define the hourly grouping, you can also try round_date
or ceiling_date
. After that, we can use gather
from the tidyr package to convert the data frame from wide format to long format.
library(dplyr)
library(tidyr)
library(lubridate)
dat2 <- dat %>%
mutate(Time = ymd_hms(Time),
Time2 = floor_date(Time, unit = "hour")) %>%
gather(LCZ, Value, starts_with("LCZ")) %>%
group_by(Time2, LCZ)
After that, we can summarize the data by LCZ
and Time2
.
dat3 <- dat2 %>%
summarise(Min = min(Value, na.rm = TRUE),
Max = max(Value, na.rm = TRUE),
Median = median(Value, na.rm = TRUE)) %>%
ungroup()
dat3
# # A tibble: 10 x 5
# Time2 LCZ Min Max Median
# <dttm> <chr> <dbl> <dbl> <dbl>
# 1 2017-08-26 17:00:00 LCZ.3.10 26.0 27.5 26.8
# 2 2017-08-26 17:00:00 LCZ.3.2 26.5 27.5 27.2
# 3 2017-08-26 17:00:00 LCZ.6.1 26.5 27.5 27.0
# 4 2017-08-26 17:00:00 LCZ.6.9 26.0 27.0 26.8
# 5 2017-08-26 17:00:00 LCZ.9.4 26.5 27.0 26.8
# 6 2017-08-26 18:00:00 LCZ.3.10 26.0 26.5 26.2
# 7 2017-08-26 18:00:00 LCZ.3.2 26.5 27.0 26.5
# 8 2017-08-26 18:00:00 LCZ.6.1 26.5 26.5 26.5
# 9 2017-08-26 18:00:00 LCZ.6.9 26.5 26.5 26.5
# 10 2017-08-26 18:00:00 LCZ.9.4 26.0 26.5 26.0
If you want, we can create binary value to indicate if the value is minimum, maximum, or median as follows. This format is useful when you further want to filter the data frame.
dat4 <- dat2 %>%
mutate(Min = (Value == min(Value, na.rm = TRUE)) + 0L,
Max = (Value == max(Value, na.rm = TRUE)) + 0L,
Median = (Value == median(Value, na.rm = TRUE)) + 0L) %>%
ungroup()
dat4
# # A tibble: 50 x 7
# Time Time2 LCZ Value Min Max Median
# <dttm> <dttm> <chr> <dbl> <int> <int> <int>
# 1 2017-08-26 17:00:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 2 2017-08-26 17:10:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 3 2017-08-26 17:20:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 4 2017-08-26 17:30:00 2017-08-26 17:00:00 LCZ.3.2 27.0 0 0 0
# 5 2017-08-26 17:40:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 6 2017-08-26 17:50:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 7 2017-08-26 18:00:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 8 2017-08-26 18:10:00 2017-08-26 18:00:00 LCZ.3.2 27.0 0 1 0
# 9 2017-08-26 18:20:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 10 2017-08-26 18:30:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# # ... with 40 more rows
DATA
dat <- read.table(text = "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
'2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
'2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
'2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
'2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
'2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
'2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
'2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
'2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
'2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
'2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0",
header = TRUE, stringsAsFactors = FALSE)
Here is a way of doing it with dplyr
verbs:
library(lubridate)
df %>%
gather(Location, Temp, -Time) %>%
group_by(Date = date(Time), HoD = hour(Time), Location) %>%
mutate_at(.vars = "Temp", .funs = list(Min = min, Max = max, Median = median)) %>%
filter(Temp == Min | Temp == Max) %>%
arrange(Location, Time) %>%
distinct(Temp, .keep_all = T) %>%
mutate(MinMax = ifelse(Temp == Min, "MinTime", "MaxTime")) %>%
dplyr::select(-Temp) %>%
spread("MinMax", "Time")
Output:
Notice the NA
which mean that minimum and maximum temperatures were identical on that day, in that hour and at that location.
# A tibble: 10 x 8
# Groups: Date, HoD, Location [10]
Location Date HoD Min Max Median MaxTime MinTime
<chr> <date> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 LCZ.3.10 2017-08-26 17 26.0 27.5 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
2 LCZ.3.10 2017-08-26 18 26.0 26.5 26.2 2017-08-26 18:20:00 2017-08-26 18:00:00
3 LCZ.3.2 2017-08-26 17 26.5 27.5 27.2 2017-08-26 17:00:00 2017-08-26 17:40:00
4 LCZ.3.2 2017-08-26 18 26.5 27.0 26.5 2017-08-26 18:10:00 2017-08-26 18:00:00
5 LCZ.6.1 2017-08-26 17 26.5 27.5 27.0 2017-08-26 17:00:00 2017-08-26 17:40:00
6 LCZ.6.1 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
7 LCZ.6.9 2017-08-26 17 26.0 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
8 LCZ.6.9 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
9 LCZ.9.4 2017-08-26 17 26.5 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:30:00
10 LCZ.9.4 2017-08-26 18 26.0 26.5 26.0 2017-08-26 18:00:00 2017-08-26 18:10:00
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With