How to use dplyr's summarize() and which() to look up min/max values

Tags:

r

dplyr

I have the following data:

Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,12,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D")

data <- data.frame(Name, Age, Group)

I'd like to use dplyr to:

(1) group the data by "Group"
(2) show the min and max Age within each Group
(3) show the Name of the person with the min and max ages

The following code does this:

library(dplyr)

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))],
            maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])

This works well:

  Group minAge minAgeName maxAge maxAgeName
1     A     22        Sam     22        Sam
2     B     12      Sarah     58      James
3     C     17     Andrew     82      Sally
4     D     12     Mairin     67        Ray

However, I have a problem if there are multiple min or max values:

Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,31,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D")

data <- data.frame(Name, Age, Group)

> data %>% group_by(Group) %>%
+   summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))],
+             maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])
Error: expecting a single value
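To see where the extra value comes from, here is a small base-R sketch that reproduces the lookup for group B of the tied data, where Sarah and Jim are both 31. The vectors `age_b` and `name_b` are hypothetical stand-ins for that group's `Age` and `Name` columns.

```r
# Group "B" from the second (tied) data set; these variable names are illustrative
age_b  <- c(31, 31, 35, 58)
name_b <- c("Sarah", "Jim", "Fred", "James")

# which() returns every position equal to the minimum, not just the first one
idx <- which(age_b == min(age_b))
print(idx)          # 1 2

# Indexing with both positions yields two names, i.e. a length-2 result,
# which is more than the single value per group that summarize() expected here
print(name_b[idx])  # "Sarah" "Jim"
```

The exact symptom depends on the dplyr version: the error above comes from older releases, whereas dplyr 1.0.0 and later may instead return more than one row for such a group (dplyr 1.1.0 deprecates that behavior in summarize() in favor of reframe()).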

I'm looking for two solutions:

(1) one where it doesn't matter which min or max name is shown, as long as one is shown (e.g., the first one found)
(2) one where, if there are ties, all minimum and maximum values are shown

Please let me know if this isn't clear and thanks in advance!

814

asked May 12 '15 16:05

dreww2

2 Answers

You can use which.min and which.max, which return the position of the first minimum or maximum, to get the first value.

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which.min(Age)],
            maxAge = max(Age), maxAgeName = Name[which.max(Age)])
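A quick base-R check of the tie behavior, again using hypothetical vectors `age_b` and `name_b` for group B of the tied data: which.min() and which.max() return a single position (the first match), so the name lookup always yields exactly one value per group.

```r
# Group "B" from the tied data; illustrative names
age_b  <- c(31, 31, 35, 58)
name_b <- c("Sarah", "Jim", "Fred", "James")

first_min <- which.min(age_b)  # 1: only the first of the tied minima (Sarah, Jim)
first_max <- which.max(age_b)  # 4

print(name_b[first_min])       # "Sarah"
print(name_b[first_max])       # "James"
```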

To get all tied values, use, e.g., paste with an appropriate collapse argument, so that each group still returns a single string.

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "),
            maxAge = max(Age), maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))
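The key step is that collapse turns a character vector of any length into one string. A minimal base-R sketch, with hypothetical vectors `age_b` and `name_b` standing in for group B of the tied data:

```r
# Group "B" from the tied data; illustrative names
age_b  <- c(31, 31, 35, 58)
name_b <- c("Sarah", "Jim", "Fred", "James")

tied_min <- name_b[which(age_b == min(age_b))]  # "Sarah" "Jim" (length 2)

# collapse = ", " glues the elements into a single string, so each group
# again contributes exactly one value to summarize()
min_label <- paste(tied_min, collapse = ", ")
print(min_label)                                # "Sarah, Jim"
```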

128

answered Oct 02 '22 05:10

shadow

I would actually recommend keeping your data in a "long" format. Here's how I would approach this:

library(dplyr)

Keeping all values when there are ties:

data %>%
  group_by(Group) %>%
  arrange(Age) %>%  ## optional
  filter(Age %in% range(Age))
# Source: local data frame [8 x 3]
# Groups: Group
#
#     Name Age Group
# 1    Sam  22     A
# 2  Sarah  31     B
# 3    Jim  31     B
# 4  James  58     B
# 5 Andrew  17     C
# 6  Sally  82     C
# 7 Mairin  12     D
# 8    Ray  67     D
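The filter works because range() returns c(min, max) and %in% flags every element equal to either value, so all tied rows are kept. A small base-R illustration, where `age_b` is a hypothetical vector holding group B's ages from the tied data:

```r
# Group "B" ages from the tied data; illustrative name
age_b <- c(31, 31, 35, 58)

print(range(age_b))             # 31 58, i.e. c(min(age_b), max(age_b))

keep <- age_b %in% range(age_b)
print(keep)                     # TRUE TRUE FALSE TRUE: both 31s survive, 35 is dropped
```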

Keeping only one value when there are ties:

data %>%
  group_by(Group) %>%
  arrange(Age) %>%
  slice(if (length(Age) == 1) 1 else c(1, n()))  ## maybe overkill?
# Source: local data frame [7 x 3]
# Groups: Group
#
#     Name Age Group
# 1    Sam  22     A
# 2  Sarah  31     B
# 3  James  58     B
# 4 Andrew  17     C
# 5  Sally  82     C
# 6 Mairin  12     D
# 7    Ray  67     D
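If you are on dplyr 1.0.0 or later, slice_min() and slice_max() with with_ties = FALSE can replace the arrange()/slice() combination. The sketch below is an alternative under that version assumption, not the approach from this answer; the intermediate names `grouped` and `one_each` are illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for slice_min()/slice_max()

Name  <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew",
           "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age   <- c(22, 31, 31, 35, 58, 82, 17, 34, 12, 24, 44, 67, 43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D")
data  <- data.frame(Name, Age, Group)

grouped <- group_by(data, Group)

one_each <- bind_rows(
  slice_min(grouped, Age, n = 1, with_ties = FALSE),  # one youngest row per group
  slice_max(grouped, Age, n = 1, with_ties = FALSE)   # one oldest row per group
) %>%
  ungroup() %>%
  distinct() %>%       # one-row group A is returned by both calls; keep it once
  arrange(Group, Age)

print(one_each)
```

distinct() is needed because both calls return Sam for the one-row group A. Setting with_ties = TRUE instead keeps every tied row, similar to the filter() approach.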

If you really want a "wide" dataset, the basic concept would be to gather and spread the data, using "tidyr":

library(dplyr)
library(tidyr)

data %>%
  group_by(Group) %>%
  arrange(Age) %>%
  slice(c(1, n())) %>%
  mutate(minmax = c("min", "max")) %>%
  gather(var, val, Name:Age) %>%
  unite(key, minmax, var) %>%
  spread(key, val)
# Source: local data frame [4 x 5]
#
#   Group max_Age max_Name min_Age min_Name
# 1     A      22      Sam      22      Sam
# 2     B      58    James      31    Sarah
# 3     C      82    Sally      17   Andrew
# 4     D      67      Ray      12   Mairin
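gather() and spread() have since been superseded in tidyr by pivot_longer() and pivot_wider(). Assuming tidyr 1.0.0 or later (and dplyr 1.0.0 or later for slice_min()/slice_max()), a sketch of the same wide result can skip the gather/unite step and widen Name and Age directly. This is a swapped-in technique, not the answer's code, and `grouped`, `long`, and `wide` are illustrative names.

```r
library(dplyr)  # assumes dplyr >= 1.0.0
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_wider()

Name  <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew",
           "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age   <- c(22, 31, 31, 35, 58, 82, 17, 34, 12, 24, 44, 67, 43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D")
data  <- data.frame(Name, Age, Group)

grouped <- group_by(data, Group)

# One row per group and statistic, tagged "min" or "max"; the one-row
# group A simply supplies the same person for both tags
long <- bind_rows(
  slice_min(grouped, Age, n = 1, with_ties = FALSE) %>% mutate(minmax = "min"),
  slice_max(grouped, Age, n = 1, with_ties = FALSE) %>% mutate(minmax = "max")
) %>%
  ungroup()

# Widen both columns at once, producing Name_min, Name_max, Age_min and Age_max
wide <- long %>%
  pivot_wider(id_cols = Group, names_from = minmax, values_from = c(Name, Age)) %>%
  arrange(Group)

print(wide)
```

Because values_from takes both columns, Age stays numeric instead of being coerced to character, as happens when gather() stacks Name and Age into a single column.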

It is unclear, though, what wide form you would want when there are ties.

answered Oct 02 '22 07:10

A5C1D2H2I1M1N2O1R2T1
