I have a data frame with three columns, and I am attempting a simple summary to find the maximum temperature for each city while also keeping the date of each maximum temperature.
Here is the data frame (we'll call it maxT):
new.ID Date Max_TemperatureF
1 TUS 1960-04-05 87
2 TUS 1984-04-24 86
3 TUS 1972-04-01 75
4 TUS 2006-04-14 91
5 TUS 2000-05-03 96
6 PHX 1960-04-05 93
7 PHX 1984-04-24 93
8 PHX 1972-04-01 84
9 PHX 2006-04-14 91
10 PHX 2000-05-03 99
11 LAS 1960-04-05 91
12 LAS 1984-04-24 86
13 LAS 1972-04-01 81
14 LAS 2006-04-14 81
15 LAS 2000-05-03 98
16 LAX 1960-04-05 72
17 LAX 1984-04-24 69
18 LAX 1972-04-01 73
19 LAX 2006-04-14 63
20 LAX 2000-05-03 69
21 SAC 1960-04-05 82
22 SAC 1984-04-24 75
23 SAC 1972-04-01 64
24 SAC 2006-04-14 71
25 SAC 2000-05-03 81
26 PSP 1960-04-05 98
27 PSP 1984-04-24 91
28 PSP 1972-04-01 91
29 PSP 2006-04-14 81
30 PSP 2000-05-03 99
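To make the question reproducible, the table above can be rebuilt in R roughly as follows. This is a sketch: it stores Date as R's Date class (an assumption, since the original column type isn't shown) and takes the last PSP reading as 99, which is what the summary output below implies.

```r
# Rebuild the example table as a data frame.
# Assumption: the last PSP reading is 99 (the summary output in the question implies it).
maxT <- data.frame(
  new.ID = rep(c("TUS", "PHX", "LAS", "LAX", "SAC", "PSP"), each = 5),
  Date = as.Date(rep(c("1960-04-05", "1984-04-24", "1972-04-01",
                       "2006-04-14", "2000-05-03"), times = 6)),
  Max_TemperatureF = c(87, 86, 75, 91, 96,   # TUS
                       93, 93, 84, 91, 99,   # PHX
                       91, 86, 81, 81, 98,   # LAS
                       72, 69, 73, 63, 69,   # LAX
                       82, 75, 64, 71, 81,   # SAC
                       98, 91, 91, 81, 99),  # PSP
  stringsAsFactors = FALSE
)

str(maxT)
```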
Each city has five temperatures listed, and I would like to find the maximum for each city and also list its date. I am using dplyr and have tried quite a few variations of this code, but Date is always dropped in the final result. Is there a way to add an option like drop = FALSE, or something similar?
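library(dplyr)

# Maximum temperature per city (as_tibble() replaces the deprecated tbl_df())
maxT <- as_tibble(maxT) %>%
  select(new.ID, Date, Max_TemperatureF) %>%
  group_by(new.ID) %>%
  summarise(max_temp = max(Max_TemperatureF))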
This is the output I keep getting:
new.ID max_temp
1 LAS 98
2 LAX 73
3 PHX 99
4 PSP 99
5 SAC 82
6 TUS 96
Thanks.
summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input.
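A quick way to see this with the question's own data (rebuilt inline, with the last PSP value assumed to be 99): the grouped summary keeps only the grouping column and the new summary column, which is why Date disappears, while an ungrouped summarise() collapses all observations into one row.

```r
library(dplyr)

# The question's data; the last PSP value is assumed to be 99.
maxT <- data.frame(
  new.ID = rep(c("TUS", "PHX", "LAS", "LAX", "SAC", "PSP"), each = 5),
  Date = as.Date(rep(c("1960-04-05", "1984-04-24", "1972-04-01",
                       "2006-04-14", "2000-05-03"), times = 6)),
  Max_TemperatureF = c(87, 86, 75, 91, 96, 93, 93, 84, 91, 99,
                       91, 86, 81, 81, 98, 72, 69, 73, 63, 69,
                       82, 75, 64, 71, 81, 98, 91, 91, 81, 99),
  stringsAsFactors = FALSE
)

# Grouped: one row per city; only new.ID and the summary column survive.
by_city <- maxT %>%
  group_by(new.ID) %>%
  summarise(max_temp = max(Max_TemperatureF))
names(by_city)   # "new.ID" "max_temp" (no Date)
nrow(by_city)    # 6

# Ungrouped: a single row summarising all 30 observations.
overall <- maxT %>% summarise(max_temp = max(Max_TemperatureF))
nrow(overall)    # 1
```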
All of the dplyr functions take a data frame (or tibble) as the first argument. Rather than forcing the user to either save intermediate objects or nest functions, dplyr provides the %>% operator from magrittr. x %>% f(y) turns into f(x, y) so the result from one step is then “piped” into the next step.
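As a small illustration, assuming dplyr is loaded (it re-exports %>% from magrittr), the piped pipeline from the question and its nested, inside-out equivalent should produce the same tibble. The data is the question's table rebuilt inline, with the last PSP value assumed to be 99.

```r
library(dplyr)  # re-exports %>% from magrittr

maxT <- data.frame(
  new.ID = rep(c("TUS", "PHX", "LAS", "LAX", "SAC", "PSP"), each = 5),
  Date = as.Date(rep(c("1960-04-05", "1984-04-24", "1972-04-01",
                       "2006-04-14", "2000-05-03"), times = 6)),
  Max_TemperatureF = c(87, 86, 75, 91, 96, 93, 93, 84, 91, 99,
                       91, 86, 81, 81, 98, 72, 69, 73, 63, 69,
                       82, 75, 64, 71, 81, 98, 91, 91, 81, 99),
  stringsAsFactors = FALSE
)

# Piped form: each result becomes the first argument of the next call.
piped <- maxT %>%
  group_by(new.ID) %>%
  summarise(max_temp = max(Max_TemperatureF))

# Nested form: the same calls written as f(x, y), innermost first.
nested <- summarise(group_by(maxT, new.ID), max_temp = max(Max_TemperatureF))

identical(piped, nested)   # TRUE
```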
As its name implies, summarise() reduces a data frame to a summary: one value per group, or a single value when there are no groups. These summaries are often calculated after first grouping the observations by a factor or other categorical variable.
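Because summarise() keeps only the grouping columns and the summaries you ask for, one way to keep the date without leaving summarise() is to compute it as another summary: index Date by the position of the maximum. A sketch on the question's data (rebuilt inline, last PSP value assumed to be 99):

```r
library(dplyr)

maxT <- data.frame(
  new.ID = rep(c("TUS", "PHX", "LAS", "LAX", "SAC", "PSP"), each = 5),
  Date = as.Date(rep(c("1960-04-05", "1984-04-24", "1972-04-01",
                       "2006-04-14", "2000-05-03"), times = 6)),
  Max_TemperatureF = c(87, 86, 75, 91, 96, 93, 93, 84, 91, 99,
                       91, 86, 81, 81, 98, 72, 69, 73, 63, 69,
                       82, 75, 64, 71, 81, 98, 91, 91, 81, 99),
  stringsAsFactors = FALSE
)

# Summarise the date too: take Date at the position of each city's maximum.
# which.max() returns the first position if a city has tied maxima.
max_with_date <- maxT %>%
  group_by(new.ID) %>%
  summarise(
    max_temp = max(Max_TemperatureF),
    Date     = Date[which.max(Max_TemperatureF)]
  )

max_with_date
```

This yields exactly one date per city; if every tied date is needed, a grouped filter() is the better fit.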
Select certain columns in a data frame with the dplyr function select(). Extract certain rows in a data frame according to logical (boolean) conditions with the dplyr function filter().
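A minimal sketch of both verbs on the question's data (rebuilt inline, last PSP value assumed to be 99); the 98-degree threshold is just an arbitrary example condition:

```r
library(dplyr)

maxT <- data.frame(
  new.ID = rep(c("TUS", "PHX", "LAS", "LAX", "SAC", "PSP"), each = 5),
  Date = as.Date(rep(c("1960-04-05", "1984-04-24", "1972-04-01",
                       "2006-04-14", "2000-05-03"), times = 6)),
  Max_TemperatureF = c(87, 86, 75, 91, 96, 93, 93, 84, 91, 99,
                       91, 86, 81, 81, 98, 72, 69, 73, 63, 69,
                       82, 75, 64, 71, 81, 98, 91, 91, 81, 99),
  stringsAsFactors = FALSE
)

# select(): keep only the named columns.
temps_only <- select(maxT, new.ID, Max_TemperatureF)
names(temps_only)   # "new.ID" "Max_TemperatureF"

# filter(): keep only the rows where the condition is TRUE.
hot_days <- filter(maxT, Max_TemperatureF >= 98)
hot_days
```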
We could try either filter or slice. If there are ties for the maximum 'Max_TemperatureF' and you want to keep all of the tied rows, use filter on the grouped data:
# maxT here is the original 30-row data frame
as_tibble(maxT) %>%
  group_by(new.ID) %>%
  filter(Max_TemperatureF == max(Max_TemperatureF))
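To see the tie behaviour, here is a sketch on a small made-up data set (the cities AAA and BBB and their values are hypothetical, not from the question) in which one city reaches its maximum on two different days. The grouped filter() keeps both tied rows:

```r
library(dplyr)

# Hypothetical toy data: AAA hits its maximum (90) on two different days.
ties <- data.frame(
  new.ID = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  Date = as.Date(c("2001-06-01", "2001-06-02", "2001-06-03",
                   "2001-06-01", "2001-06-02")),
  Max_TemperatureF = c(90, 85, 90, 70, 75),
  stringsAsFactors = FALSE
)

# Within each group, keep every row equal to that group's maximum.
all_max <- ties %>%
  group_by(new.ID) %>%
  filter(Max_TemperatureF == max(Max_TemperatureF)) %>%
  ungroup()

all_max   # AAA appears twice (2001-06-01 and 2001-06-03), BBB once
```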
Or, to keep exactly one row per city, get the row index of the maximum with which.max and subset with slice (which.max returns the first position when there are ties):
# maxT here is the original 30-row data frame
as_tibble(maxT) %>%
  group_by(new.ID) %>%
  slice(which.max(Max_TemperatureF))
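In dplyr 1.0.0 and later, slice_max() wraps this pattern. The sketch below (question's data rebuilt inline, last PSP value assumed to be 99) uses with_ties = FALSE to return exactly one row per city, like slice(which.max()); the default, with_ties = TRUE, would keep tied rows like the filter() version.

```r
library(dplyr)  # slice_max() needs dplyr >= 1.0.0

maxT <- data.frame(
  new.ID = rep(c("TUS", "PHX", "LAS", "LAX", "SAC", "PSP"), each = 5),
  Date = as.Date(rep(c("1960-04-05", "1984-04-24", "1972-04-01",
                       "2006-04-14", "2000-05-03"), times = 6)),
  Max_TemperatureF = c(87, 86, 75, 91, 96, 93, 93, 84, 91, 99,
                       91, 86, 81, 81, 98, 72, 69, 73, 63, 69,
                       82, 75, 64, 71, 81, 98, 91, 91, 81, 99),
  stringsAsFactors = FALSE
)

# One row per city: the row holding that city's maximum temperature.
top1 <- maxT %>%
  group_by(new.ID) %>%
  slice_max(Max_TemperatureF, n = 1, with_ties = FALSE) %>%
  ungroup()

# The slice(which.max()) approach, for comparison.
via_which <- maxT %>%
  group_by(new.ID) %>%
  slice(which.max(Max_TemperatureF)) %>%
  ungroup()

top1
```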