I have a dataframe with temperature measurements every 10 minutes. The measurements were taken at different locations (named as 'LCZ'), with for each location the values in a different column.
This is part of my dataframe: (it also contains missing values NA)
Time `LCZ 3-2` `LCZ 3-10` `LCZ 6-1` `LCZ 6-9` `LCZ 9-4`
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-08-26 17:00:00 27.5 27.5 27.5 27.0 27.0
2 2017-08-26 17:10:00 27.5 27.0 27.5 27.0 27.0
3 2017-08-26 17:20:00 27.5 27.0 27.0 27.0 27.0
4 2017-08-26 17:30:00 27.0 26.5 27.0 26.5 26.5
5 2017-08-26 17:40:00 26.5 26.5 26.5 26.5 26.5
6 2017-08-26 17:50:00 26.5 26.0 26.5 26.0 26.5
7 2017-08-26 18:00:00 26.5 26.0 26.5 26.5 26.5
8 2017-08-26 18:10:00 27.0 26.0 26.5 26.5 26.0
9 2017-08-26 18:20:00 26.5 26.5 26.5 26.5 26.0
10 2017-08-26 18:30:00 26.5 26.5 26.5 26.5 26.0
I want for each location or column to calculate the hourly min/max/median temperature and in addition for the hourly min/max also the timestamp from the original data at which the min/max respectively occured.
Is this possible with R?
I tried already various functions.
group_by allowed me to calculate min/max for each column but without timestamps. period.apply also allowed me to calculate min/max/median but only for one column.
Also aggregate() didn't lead to any succes.
I'm learning in R, but didn't came close to a solution on this problem.
This website has helped me with various problems, but I'm really stuck on this one. Can someone help? Thanks in advance
We can use floor_date from the lubridate package to create a new column Time2 to show the hourly information. If this is not the way you want to define the hourly grouping, you can also try round_date or ceiling_date. After that, we can use gather from the tidyr package to convert the data frame from wide format to long format.
library(dplyr)
library(tidyr)
library(lubridate)
dat2 <- dat %>%
mutate(Time = ymd_hms(Time),
Time2 = floor_date(Time, unit = "hour")) %>%
gather(LCZ, Value, starts_with("LCZ")) %>%
group_by(Time2, LCZ)
After that, we can summarize the data by LCZ and Time2.
dat3 <- dat2 %>%
summarise(Min = min(Value, na.rm = TRUE),
Max = max(Value, na.rm = TRUE),
Median = median(Value, na.rm = TRUE)) %>%
ungroup()
dat3
# # A tibble: 10 x 5
# Time2 LCZ Min Max Median
# <dttm> <chr> <dbl> <dbl> <dbl>
# 1 2017-08-26 17:00:00 LCZ.3.10 26.0 27.5 26.8
# 2 2017-08-26 17:00:00 LCZ.3.2 26.5 27.5 27.2
# 3 2017-08-26 17:00:00 LCZ.6.1 26.5 27.5 27.0
# 4 2017-08-26 17:00:00 LCZ.6.9 26.0 27.0 26.8
# 5 2017-08-26 17:00:00 LCZ.9.4 26.5 27.0 26.8
# 6 2017-08-26 18:00:00 LCZ.3.10 26.0 26.5 26.2
# 7 2017-08-26 18:00:00 LCZ.3.2 26.5 27.0 26.5
# 8 2017-08-26 18:00:00 LCZ.6.1 26.5 26.5 26.5
# 9 2017-08-26 18:00:00 LCZ.6.9 26.5 26.5 26.5
# 10 2017-08-26 18:00:00 LCZ.9.4 26.0 26.5 26.0
If you want, we can create binary value to indicate if the value is minimum, maximum, or median as follows. This format is useful when you further want to filter the data frame.
dat4 <- dat2 %>%
mutate(Min = (Value == min(Value, na.rm = TRUE)) + 0L,
Max = (Value == max(Value, na.rm = TRUE)) + 0L,
Median = (Value == median(Value, na.rm = TRUE)) + 0L) %>%
ungroup()
dat4
# # A tibble: 50 x 7
# Time Time2 LCZ Value Min Max Median
# <dttm> <dttm> <chr> <dbl> <int> <int> <int>
# 1 2017-08-26 17:00:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 2 2017-08-26 17:10:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 3 2017-08-26 17:20:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 4 2017-08-26 17:30:00 2017-08-26 17:00:00 LCZ.3.2 27.0 0 0 0
# 5 2017-08-26 17:40:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 6 2017-08-26 17:50:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 7 2017-08-26 18:00:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 8 2017-08-26 18:10:00 2017-08-26 18:00:00 LCZ.3.2 27.0 0 1 0
# 9 2017-08-26 18:20:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 10 2017-08-26 18:30:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# # ... with 40 more rows
DATA
dat <- read.table(text = "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
'2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
'2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
'2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
'2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
'2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
'2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
'2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
'2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
'2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
'2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0",
header = TRUE, stringsAsFactors = FALSE)
Here is a way of doing it with dplyr verbs:
library(lubridate)
df %>%
gather(Location, Temp, -Time) %>%
group_by(Date = date(Time), HoD = hour(Time), Location) %>%
mutate_at(.vars = "Temp", .funs = list(Min = min, Max = max, Median = median)) %>%
filter(Temp == Min | Temp == Max) %>%
arrange(Location, Time) %>%
distinct(Temp, .keep_all = T) %>%
mutate(MinMax = ifelse(Temp == Min, "MinTime", "MaxTime")) %>%
dplyr::select(-Temp) %>%
spread("MinMax", "Time")
Output:
Notice the NA which mean that minimum and maximum temperatures were identical on that day, in that hour and at that location.
# A tibble: 10 x 8
# Groups: Date, HoD, Location [10]
Location Date HoD Min Max Median MaxTime MinTime
<chr> <date> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 LCZ.3.10 2017-08-26 17 26.0 27.5 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
2 LCZ.3.10 2017-08-26 18 26.0 26.5 26.2 2017-08-26 18:20:00 2017-08-26 18:00:00
3 LCZ.3.2 2017-08-26 17 26.5 27.5 27.2 2017-08-26 17:00:00 2017-08-26 17:40:00
4 LCZ.3.2 2017-08-26 18 26.5 27.0 26.5 2017-08-26 18:10:00 2017-08-26 18:00:00
5 LCZ.6.1 2017-08-26 17 26.5 27.5 27.0 2017-08-26 17:00:00 2017-08-26 17:40:00
6 LCZ.6.1 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
7 LCZ.6.9 2017-08-26 17 26.0 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
8 LCZ.6.9 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
9 LCZ.9.4 2017-08-26 17 26.5 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:30:00
10 LCZ.9.4 2017-08-26 18 26.0 26.5 26.0 2017-08-26 18:00:00 2017-08-26 18:10:00
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With