Can Summarise in dplyr not drop other columns in my data frame?

Tags:

r

I have a data frame with three columns in it and I am attempting a simple summary to find the maximum temperature for each city in the data frame, but also keep the date listed for each max temperature.

Here is the data frame, which we'll call maxT:

  new.ID       Date   Max_TemperatureF
1     TUS 1960-04-05               87
2     TUS 1984-04-24               86
3     TUS 1972-04-01               75
4     TUS 2006-04-14               91
5     TUS 2000-05-03               96
6     PHX 1960-04-05               93
7     PHX 1984-04-24               93
8     PHX 1972-04-01               84
9     PHX 2006-04-14               91
10    PHX 2000-05-03               99
11    LAS 1960-04-05               91
12    LAS 1984-04-24               86
13    LAS 1972-04-01               81
14    LAS 2006-04-14               81
15    LAS 2000-05-03               98
16    LAX 1960-04-05               72
17    LAX 1984-04-24               69
18    LAX 1972-04-01               73
19    LAX 2006-04-14               63
20    LAX 2000-05-03               69
21    SAC 1960-04-05               82
22    SAC 1984-04-24               75
23    SAC 1972-04-01               64
24    SAC 2006-04-14               71
25    SAC 2000-05-03               81
26    PSP 1960-04-05               98
27    PSP 1984-04-24               91
28    PSP 1972-04-01               91
29    PSP 2006-04-14               81
30    PSP 2000-05-03               99

Each city has 5 temperatures listed, and I would like to find the maximum for each city along with the date on which it occurred. I am using dplyr and have tried quite a few variations of this code, but Date is always dropped from the result. Is there a way to add a condition like drop=FALSE or something similar?

maxT <- tbl_df(maxT) %>%
  select(new.ID, Date, Max_TemperatureF) %>%
  group_by(new.ID) %>%
  summarise(max_temp = max(Max_TemperatureF))

This is the output I keep getting:

  new.ID max_temp
1    LAS       98
2    LAX       73
3    PHX       99
4    PSP       99
5    SAC       82
6    TUS       96

Thanks.

user3720887 asked Apr 15 '15 18:04


1 Answer

We could use either filter or slice; both keep every column, including Date. If there are ties for the maximum 'Max_TemperatureF' within a city and we want all of the tied rows, use filter:

 # keep every row equal to its group's maximum (ties included)
 maxT %>%
       group_by(new.ID) %>%
       filter(Max_TemperatureF == max(Max_TemperatureF))
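
To make the tie behaviour concrete, here is a minimal sketch on a made-up data frame. The name `toy` and its values are hypothetical and not taken from the question; one city reaches its maximum on two dates, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: city "AAA" reaches its maximum of 90 on two dates.
toy <- data.frame(
  new.ID = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  Date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                   "2000-01-01", "2000-01-02")),
  Max_TemperatureF = c(90, 85, 90, 70, 75)
)

# filter() keeps all rows that match the per-group maximum,
# so both tied rows for "AAA" survive, along with their dates.
tied <- toy %>%
  group_by(new.ID) %>%
  filter(Max_TemperatureF == max(Max_TemperatureF)) %>%
  ungroup()

print(tied)
```

With this input, `tied` should hold three rows: two for "AAA" and one for "BBB".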

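Or, if one row per city is enough, we can get the index of the maximum with which.max and subset with slice. Because which.max returns only the first position of the maximum, ties are resolved in favour of the earliest row in each group: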

 # keep exactly one row per group: the first row holding the maximum
 maxT %>%
       group_by(new.ID) %>%
       slice(which.max(Max_TemperatureF))
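
A small sketch of the one-row-per-group behaviour, again on hypothetical data (`toy` is made up for illustration). The `summarise` variant at the end is an alternative added here, not part of the original answer: it stays with summarise, as the question asked, and keeps Date by indexing it with the same position.

```r
library(dplyr)

# Hypothetical toy data: city "AAA" reaches its maximum of 90 on two dates.
toy <- data.frame(
  new.ID = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  Date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                   "2000-01-01", "2000-01-02")),
  Max_TemperatureF = c(90, 85, 90, 70, 75)
)

# slice() + which.max(): one row per city, first occurrence of the maximum.
first_only <- toy %>%
  group_by(new.ID) %>%
  slice(which.max(Max_TemperatureF)) %>%
  ungroup()

# summarise() can also carry Date along if we pick it explicitly
# at the position of the maximum.
summarised <- toy %>%
  group_by(new.ID) %>%
  summarise(
    max_temp = max(Max_TemperatureF),
    Date = Date[which.max(Max_TemperatureF)]
  )

print(first_only)
print(summarised)
```

Both results should have one row per city, with "AAA" paired with its earlier date. In dplyr 1.0.0 and later, `slice_max(Max_TemperatureF, n = 1, with_ties = FALSE)` is another way to get the same single row per group.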

akrun answered Oct 11 '22 21:10