I have a data frame which I am grouping with the group_by function and summarizing with the summarize function in R.
MM_group <- group_by(SYC, Method, Maturity)
My dataset looks like this:
  Year           Group  County Seed.Brand Seed.Variety Seed.Maturity
1 2014 Group 0 No-till Yankton     Asgrow       AG0832           0.8
2 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
3 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
4 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
5 2014 Group 0 No-till   Brown    Pioneer        90Y90           0.9
6 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
  Yield  Method Maturity digits
1 73.23 No-till        0      0
2 65.14 No-till        0      0
3 63.63 No-till        0      0
4 61.57 No-till        0      0
5 60.20 No-till        0      0
I am grouping by Method and Maturity. I am trying to get the County and Year of the row with the maximum yield for each Method and Maturity combination.
I have done the following:
summarize(MM_group, Max_Yield = max(Yield))
          Method Maturity Max_Yield
           <chr>    <chr>     <dbl>
 1     Irrigated        0    69.600
 2     Irrigated        1    86.013
 3     Irrigated        2    88.750
 4     Irrigated        3    79.650
 5       No-till        0    79.470
 6       No-till        1    79.856
 7       No-till        2    85.860
 8       No-till        3    68.530
 9 Non-irrigated        0    83.210
10 Non-irrigated        1    81.916
11 Non-irrigated        2   103.740
12 Non-irrigated        3    94.410
But this doesn't give me the county name and year. I know I can use cbind or joins to get that data, but I am wondering if there is an easier way of doing this.
Expected Output:
          Method Maturity Max_Yield  Year                  Group
           <chr>    <chr>     <dbl> <int>                 <fctr>
 1     Irrigated        0    69.600  2012 Group 0 or 1 Irrigated
 2     Irrigated        1    86.013  2012 Group 0 or 1 Irrigated
 3     Irrigated        2    88.750  2013 Group 2 or 3 Irrigated
 4     Irrigated        3    79.650  2013 Group 2 or 3 Irrigated
 5       No-till        0    79.470  2013        Group 0 No-till
 6       No-till        1    79.856  2012        Group 1 No-till
 7       No-till        2    85.860  2013        Group 2 No-till
 8       No-till        3    68.530  2014        Group 3 No-till
 9 Non-irrigated        0    83.210  2013  Group 0 Non-irrigated
10 Non-irrigated        1    81.916  2012  Group 1 Non-irrigated
11 Non-irrigated        2   103.740  2014  Group 2 Non-irrigated
12 Non-irrigated        3    94.410  2014  Group 3 Non-irrigated
R Summary of Data Frame: To get a summary of a data frame, call the summary() function and pass the data frame as its argument. Additional arguments passed to summary() affect the output. The result contains one summary per column.
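As a rough illustration of the summary() call described above, here is a minimal base R sketch. The data frame `yields` and its values are made up for the example and are not the SYC data from the question.

```r
# Toy data frame; the name `yields` and all values are invented for illustration.
yields <- data.frame(
  Method = c("No-till", "No-till", "Irrigated", "Irrigated"),
  Yield  = c(73.23, 65.14, 69.60, 86.01),
  stringsAsFactors = TRUE
)

# One summary per column: level counts for the factor,
# quantiles and mean for the numeric column.
yield_summary <- summary(yields)
print(yield_summary)

# Extra arguments are passed on to the method; digits controls number formatting.
print(summary(yields, digits = 3))
```

Note that summary() describes each column on its own, so it does not answer the per-group question asked here.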
Group By Multiple Columns in R using dplyr: Use the group_by() function to group the rows of a data frame by multiple columns (two or more). To use this function, first install dplyr with install.packages('dplyr') and load it with library(dplyr). All functions in dplyr package take data.
The summarise() function (also spelled summarize()) performs aggregations. To aggregate per group, first call group_by() to get a grouped data frame. Both functions are from the dplyr package. summarise() returns the aggregation results for the specified columns, one row per group.
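To make the group_by() plus summarise() pattern concrete, here is a small sketch on invented data. The data frame `trials` and the output column `n_rows` are illustrative names only, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data; values are invented for illustration only.
trials <- data.frame(
  Method   = c("No-till", "No-till", "No-till", "Irrigated", "Irrigated"),
  Maturity = c("0", "0", "1", "0", "0"),
  Yield    = c(73.23, 65.14, 79.86, 69.60, 60.20)
)

per_group <- trials %>%
  group_by(Method, Maturity) %>%   # one group per Method/Maturity pair
  summarise(
    n_rows    = n(),               # number of rows in the group
    Max_Yield = max(Yield),        # largest Yield in the group
    .groups   = "drop"             # return an ungrouped tibble
  )

print(per_group)
```

Each output row corresponds to one Method/Maturity combination; columns that are neither grouping variables nor summaries (such as Year or County in the question) are dropped, which is the problem being asked about.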
Try
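# which.max() gives the position of the largest Yield within each group;
# that position is then used to pick the matching Year and County.
summarize(MM_group,
          rank = which.max(Yield),
          Year_rank = Year[rank],
          County_rank = County[rank])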
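The indexing idea above can be checked on a small data set. This is only a sketch: `syc_toy` is a hypothetical stand-in for SYC with invented values, and `.groups` assumes dplyr 1.0.0 or later. It also returns Max_Yield alongside the looked-up columns.

```r
library(dplyr)

# Hypothetical stand-in for SYC with only the columns used here.
syc_toy <- data.frame(
  Year     = c(2012L, 2013L, 2014L, 2012L, 2013L),
  County   = c("Brown", "Yankton", "Brown", "Clay", "Brown"),
  Method   = c("No-till", "No-till", "No-till", "Irrigated", "Irrigated"),
  Maturity = c("0", "0", "0", "1", "1"),
  Yield    = c(61.57, 79.47, 73.23, 86.01, 80.10)
)

best <- syc_toy %>%
  group_by(Method, Maturity) %>%
  summarise(
    Max_Yield = max(Yield),
    Year      = Year[which.max(Yield)],    # Year of the top-yield row
    County    = County[which.max(Yield)],  # County of that same row
    .groups   = "drop"
  )

print(best)
```

Keep in mind that which.max() returns the position of the first maximum, so if two rows in a group tie for the top yield, only the earlier one is reported.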
We can use
library(dplyr)

SYC %>%
  group_by(Method, Maturity) %>%
  slice(which.max(Yield)) %>%    # keep the row with the highest Yield in each group
  rename(Max_Yield = Yield) %>%
  select(Method, Maturity, Max_Yield, Year, Group)
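If ties for the top yield matter, a possible variant is slice_max(), which is available in dplyr 1.0.0 and later. The sketch below uses an invented `syc_toy` data frame (not the real SYC data) with a deliberate tie in one group.

```r
library(dplyr)

# Invented data with a tie for the top Yield in the No-till group.
syc_toy <- data.frame(
  Year     = c(2013L, 2014L, 2012L, 2012L),
  Group    = c("Group 0 No-till", "Group 0 No-till", "Group 0 No-till",
               "Group 0 or 1 Irrigated"),
  Method   = c("No-till", "No-till", "No-till", "Irrigated"),
  Maturity = c("0", "0", "0", "1"),
  Yield    = c(79.47, 79.47, 61.57, 86.01)
)

# with_ties = TRUE keeps every row that shares the group maximum.
top_rows <- syc_toy %>%
  group_by(Method, Maturity) %>%
  slice_max(Yield, n = 1, with_ties = TRUE) %>%
  ungroup() %>%
  rename(Max_Yield = Yield) %>%
  select(Method, Maturity, Max_Yield, Year, Group)

# with_ties = FALSE returns exactly one row per group.
one_each <- syc_toy %>%
  group_by(Method, Maturity) %>%
  slice_max(Yield, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_rows)
print(one_each)
```

With the tie kept, the No-till group contributes two rows (2013 and 2014), while slice(which.max(Yield)) would report only the first of them.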