Grouping and summarizing by keeping other columns in R

Tags: r, grouping

I have a data frame that I am grouping with the group_by function and summarizing with the summarize function (both from dplyr) in R.

library(dplyr)
MM_group <- group_by(SYC, Method, Maturity)

My dataset looks like this (the printed columns wrap onto a second block):

 Year           Group  County Seed.Brand Seed.Variety Seed.Maturity
1 2014 Group 0 No-till Yankton     Asgrow       AG0832           0.8
2 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
3 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
4 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
5 2014 Group 0 No-till   Brown    Pioneer        90Y90           0.9
6 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9

Yield  Method Maturity digits
1 73.23 No-till        0      0
2 65.14 No-till        0      0
3 63.63 No-till        0      0
4 61.57 No-till        0      0
5 60.20 No-till        0      0

I am grouping by Method and Maturity. For each Method and Maturity combination, I am trying to get the County and Year of the row with the maximum Yield.

I have done the following:

summarize(MM_group,Max_Yield=max(Yield))

       Method Maturity Max_Yield
           <chr>    <chr>     <dbl>
1      Irrigated        0    69.600
2      Irrigated        1    86.013
3      Irrigated        2    88.750
4      Irrigated        3    79.650
5        No-till        0    79.470
6        No-till        1    79.856
7        No-till        2    85.860
8        No-till        3    68.530
9  Non-irrigated        0    83.210
10 Non-irrigated        1    81.916
11 Non-irrigated        2   103.740
12 Non-irrigated        3    94.410

But this doesn't give me the county name and year. I know I can use cbind or joins to get that data, but I am wondering if there is an easier way of doing this.

Expected Output:

          Method Maturity Max_Yield  Year                  Group
           <chr>    <chr>     <dbl> <int>                 <fctr>
1      Irrigated        0    69.600  2012 Group 0 or 1 Irrigated
2      Irrigated        1    86.013  2012 Group 0 or 1 Irrigated
3      Irrigated        2    88.750  2013 Group 2 or 3 Irrigated
4      Irrigated        3    79.650  2013 Group 2 or 3 Irrigated
5        No-till        0    79.470  2013        Group 0 No-till
6        No-till        1    79.856  2012        Group 1 No-till
7        No-till        2    85.860  2013        Group 2 No-till
8        No-till        3    68.530  2014        Group 3 No-till
9  Non-irrigated        0    83.210  2013  Group 0 Non-irrigated
10 Non-irrigated        1    81.916  2012  Group 1 Non-irrigated
11 Non-irrigated        2   103.740  2014  Group 2 Non-irrigated
12 Non-irrigated        3    94.410  2014  Group 3 Non-irrigated 
asked Jul 09 '17 05:07 by Kasi


2 Answers

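Try computing the position of the maximum Yield within each group, then using it to index the other columns: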

summarize(MM_group, 
          rank = which.max(Yield),
          Year_rank = Year[rank],
          County_rank = County[rank])
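
For a self-contained illustration of the same idea, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical stand-ins for SYC, and the index column is named `idx` here only to avoid reusing the name of the base function `rank`. Note that `which.max()` returns the position of the first maximum only, so ties within a group are resolved in favour of the earliest row.

```r
library(dplyr)

# Hypothetical toy data standing in for SYC (values invented for illustration)
toy <- data.frame(
  Method   = c("No-till", "No-till", "Irrigated", "Irrigated"),
  Maturity = c("0", "0", "1", "1"),
  Year     = c(2013L, 2014L, 2012L, 2013L),
  County   = c("Brown", "Yankton", "Clay", "Brown"),
  Yield    = c(79.47, 73.23, 86.01, 80.10),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(Method, Maturity) %>%
  summarize(
    idx       = which.max(Yield),  # position of the first maximum within the group
    Max_Yield = Yield[idx],        # the maximum itself
    Year      = Year[idx],         # Year on the row holding the maximum
    County    = County[idx]        # County on the row holding the maximum
  ) %>%
  ungroup() %>%
  select(-idx)                     # drop the helper index

print(res)
```

Each later expression inside `summarize()` can refer to `idx` because dplyr evaluates the summary expressions in order.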
answered Sep 25 '22 00:09 by F. Privé


We can use slice() with which.max() to keep the entire row holding the maximum Yield in each group:

SYC %>%
   group_by(Method, Maturity) %>%
   slice(which.max(Yield)) %>% 
   rename(Max_Yield = Yield) %>%
   select(Method, Maturity, Max_Yield, Year, Group)
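
As a rough sketch of how this behaves when a group has tied maxima, the example below runs the same pipeline on made-up data and compares it with a `filter(Yield == max(Yield))` variant. The `toy` data frame and its values are hypothetical, and the `filter()` variant is an alternative shown for comparison, not part of the pipeline above.

```r
library(dplyr)

# Hypothetical toy data (invented values); the Irrigated group has a tie at 80.0
toy <- data.frame(
  Method   = c("No-till", "No-till", "Irrigated", "Irrigated"),
  Maturity = c("0", "0", "1", "1"),
  Year     = c(2013L, 2014L, 2012L, 2013L),
  Group    = c("Group 0 No-till", "Group 0 No-till",
               "Group 0 or 1 Irrigated", "Group 0 or 1 Irrigated"),
  Yield    = c(79.47, 73.23, 80.0, 80.0),
  stringsAsFactors = FALSE
)

# One row per group: which.max() picks the first row holding the maximum
one_per_group <- toy %>%
  group_by(Method, Maturity) %>%
  slice(which.max(Yield)) %>%
  ungroup() %>%
  rename(Max_Yield = Yield) %>%
  select(Method, Maturity, Max_Yield, Year, Group)

# All rows that reach the group maximum, so ties are kept
with_ties <- toy %>%
  group_by(Method, Maturity) %>%
  filter(Yield == max(Yield)) %>%
  ungroup()

print(one_per_group)
print(with_ties)
```

Which variant fits depends on whether exactly one row per Method and Maturity combination is wanted, or every row that reaches the maximum.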
answered Sep 26 '22 00:09 by akrun