R - dplyr Summarize and Retain Other Columns

Tags: r, dplyr, summarize

I am grouping data and then summarizing it, but I would also like to retain another column. I do not need to evaluate that column's content, because it always has a single value for each value of the group_by column. I could add it to the group_by statement, but that does not seem "right". I want to retain State.Full.Name after grouping by State. Thanks

    library(dplyr)

    TDAAtest <- data.frame(State = sample(state.abb, 1000, replace = TRUE))
    TDAAtest$State.Full.Name <- state.name[match(TDAAtest$State, state.abb)]

    TDAA.states <- TDAAtest %>%
      filter(!is.na(State)) %>%
      group_by(State) %>%
      summarize(n = n()) %>%
      ungroup() %>%
      arrange(State)


asked Aug 23 '16 03:08

atclaus

2 Answers

Perhaps we need

    TDAAtest %>%
      filter(!is.na(State)) %>%
      group_by(State) %>%
      summarise(State.Full.Name = first(State.Full.Name), n = n())

Or use mutate to add the count column and then distinct to keep one row per State

    TDAAtest %>%
      filter(!is.na(State)) %>%
      group_by(State) %>%
      mutate(n = n()) %>%
      distinct(State, .keep_all = TRUE)
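As a quick check that the two approaches agree, here is a minimal sketch that rebuilds the question's sample data and compares both results. The seed and the names by_summarise and by_distinct are my own additions for illustration, and the sketch assumes dplyr is installed.

```r
library(dplyr)

set.seed(42)  # assumption: seed added only so the sample is reproducible
TDAAtest <- data.frame(State = sample(state.abb, 1000, replace = TRUE))
TDAAtest$State.Full.Name <- state.name[match(TDAAtest$State, state.abb)]

# Approach 1: carry the name through summarise() with first()
by_summarise <- TDAAtest %>%
  filter(!is.na(State)) %>%
  group_by(State) %>%
  summarise(State.Full.Name = first(State.Full.Name), n = n()) %>%
  arrange(State)

# Approach 2: add the count with mutate(), then keep one row per state
by_distinct <- TDAAtest %>%
  filter(!is.na(State)) %>%
  group_by(State) %>%
  mutate(n = n()) %>%
  distinct(State, .keep_all = TRUE) %>%
  ungroup() %>%
  arrange(State)

# Compare the two results as plain data frames
all.equal(as.data.frame(by_summarise), as.data.frame(by_distinct))
```

The first form is usually the more direct one, since summarise() already collapses each group to one row; the second keeps every original column without naming it.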


answered Sep 21 '22 13:09

akrun

I believe there are more accurate answers than the accepted answer, especially when the other columns do not have a single value within each group (e.g. when you want the max, the min, or the top n rows based on one particular column).

The accepted answer works for this question, but suppose, for instance, that you would like to find the county with the maximum population in each state. (For this you need County and Population columns.)

We have the following options:

1. dplyr version

From this link, you have three extra operations (mutate, ungroup and filter) to achieve that:

    TDAAtest %>%
      filter(!is.na(State)) %>%
      group_by(State) %>%
      mutate(maxPopulation = max(Population)) %>%
      ungroup() %>%
      filter(maxPopulation == Population)
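Since TDAAtest in the question has no County or Population column, here is a small self-contained sketch of the same pattern. The counties data frame and its values are made up for illustration. It also shows slice_max(), which is available in dplyr 1.0.0 and later, as a shorter way to get the same rows; this is an alternative I am suggesting, not part of the linked approach.

```r
library(dplyr)

# Hypothetical data: County names and Population values are invented
counties <- data.frame(
  State      = c("AL", "AL", "AK", "AK", "AK"),
  County     = c("A1", "A2", "K1", "K2", "K3"),
  Population = c(100, 250, 40, 90, 60)
)

# Pattern from the answer: compute the group maximum, then filter on it
via_filter <- counties %>%
  group_by(State) %>%
  mutate(maxPopulation = max(Population)) %>%
  ungroup() %>%
  filter(maxPopulation == Population) %>%
  select(-maxPopulation)

# Shorter form: keep the row(s) with the largest Population in each group
via_slice <- counties %>%
  group_by(State) %>%
  slice_max(Population, n = 1) %>%
  ungroup()
```

Both forms keep every row that ties for the maximum, so a state can contribute more than one row. The row order of the two results may differ, because slice_max() returns rows group by group.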

2. Function version

This one gives you as much flexibility as you want and you can apply any kind of operation to each group:

    maxFUN <- function(x) {
      # sort the rows by Population in descending order
      x <- x[with(x, order(-Population)), ]
      # keep the first row, i.e. the one with the largest Population
      x[1, ]
    }

    TDAAtest %>%
      filter(!is.na(State)) %>%
      group_by(State) %>%
      do(maxFUN(.))

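This one is highly recommended for more complex operations. For instance, you can return the top n (topN) counties per state by returning x[1:topN, ] from maxFUN. Note the comma: x[1:topN] without it selects columns, not rows. Using head(x, topN) instead also handles states with fewer than topN counties.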
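A minimal sketch of the top-n variant follows, again on made-up data. The names topFUN and counties and all of the values are hypothetical; head() is used so that a state with fewer than topN counties does not produce NA rows.

```r
library(dplyr)

# Hypothetical data: County names and Population values are invented
counties <- data.frame(
  State      = c("AL", "AL", "AK", "AK", "AK", "AZ"),
  County     = c("A1", "A2", "K1", "K2", "K3", "Z1"),
  Population = c(100, 250, 40, 90, 60, 500)
)

topN <- 2  # number of counties to keep per state

topFUN <- function(x, n = topN) {
  # sort the rows by Population in descending order
  x <- x[order(-x$Population), ]
  # head() returns at most n rows, even when the group is smaller
  head(x, n)
}

top_counties <- counties %>%
  group_by(State) %>%
  do(topFUN(.)) %>%
  ungroup()
```

In current dplyr releases do() is marked as superseded, so for this particular task slice_max(Population, n = topN) on the grouped data may be the simpler choice; the function form remains useful when the per-group logic is more involved.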

answered Sep 21 '22 13:09

Habib Karbasian
