Finding percentage in a sub-group using group_by and summarise

Tags: r, group-by, dplyr

I am new to dplyr and have been trying, without any luck, to do the following transformation. I've searched the internet and found examples that do the same with ddply, but I'd like to use dplyr.

I have the following data:

   month   type  count
1  Feb-14  bbb   341
2  Feb-14  ccc   527
3  Feb-14  aaa  2674
4  Mar-14  bbb   811
5  Mar-14  ccc  1045
6  Mar-14  aaa  4417
7  Apr-14  bbb  1178
8  Apr-14  ccc  1192
9  Apr-14  aaa  4793
10 May-14  bbb   916
..    ...  ...   ...

I want to use dplyr to calculate the percentage of each type (aaa, bbb, ccc) within each month, i.e.:

   month   type  count  per
1  Feb-14  bbb   341    9.6%
2  Feb-14  ccc   527    14.87%
3  Feb-14  aaa  2674    ..
..    ...  ...   ...

I've tried

data %>%
  group_by(month, type) %>%
  summarise(count / sum(count))

This gives 1 for every value. How do I make sum(count) sum across all the types in the month?

asked Apr 09 '15 21:04

KC.

2 Answers

Try

library(dplyr)
data %>%
    group_by(month) %>%
    mutate(countT = sum(count)) %>%          # total count per month
    group_by(type, .add = TRUE) %>%          # `.add` replaced `add` in dplyr 1.0.0
    mutate(per = paste0(round(100 * count / countT, 2), '%'))

Or, more simply, without creating an additional column:

data %>%
    group_by(month) %>%
    mutate(per = 100 * count / sum(count)) %>%   # numeric percentage, no "%" suffix
    ungroup()

We could also use left_join after summarising sum(count) by month.
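
The left_join route is not spelled out in this answer, so here is a minimal sketch of one way it might look. The sample data is copied from the question (first ten rows only), and the names `totals`, `month_total` and `result` are chosen for this illustration.

```r
library(dplyr)

# Sample data copied from the question (first ten rows only)
data <- data.frame(
  month = c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14",
            "Apr-14", "Apr-14", "Apr-14", "May-14"),
  type  = c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb"),
  count = c(341, 527, 2674, 811, 1045, 4417, 1178, 1192, 4793, 916)
)

# One row per month holding that month's total
# (month_total is a hypothetical column name used only in this sketch)
totals <- data %>%
  group_by(month) %>%
  summarise(month_total = sum(count))

# Attach the monthly total to every original row, compute the share,
# then drop the helper column again
result <- data %>%
  left_join(totals, by = "month") %>%
  mutate(per = paste0(round(100 * count / month_total, 2), "%")) %>%
  select(-month_total)

result
```

Compared with the grouped mutate, this keeps the monthly totals around as a separate table, which may be handy if they are needed elsewhere.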

Or an option using data.table:

library(data.table)
# Join each row to its month total; i.count is the count column of the summarised table
setkey(setDT(data), month)[data[, list(count = sum(count)), month],
               per := paste0(round(100 * count / i.count, 2), '%')][]
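
As a possible shorter variant of the keyed join above, `:=` can be combined with `by` so that `sum(count)` is evaluated once per month. This is a sketch on a copy of the question's sample rows; the name `dt` is chosen for this illustration.

```r
library(data.table)

# Sample data copied from the question (first ten rows only)
dt <- data.table(
  month = c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14",
            "Apr-14", "Apr-14", "Apr-14", "May-14"),
  type  = c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb"),
  count = c(341, 527, 2674, 811, 1045, 4417, 1178, 1192, 4793, 916)
)

# Add the column by reference; within each month, sum(count) is the month total
dt[, per := paste0(round(100 * count / sum(count), 2), "%"), by = month]

dt[]
```

Unlike `setkey`, this does not reorder the rows by month.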

answered Oct 17 '22 09:10

akrun

And with a bit less code:

df <- data.frame(month=c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14", "Apr-14", "Apr-14", "Apr-14", "May-14"),
             type=c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb"),
             count=c(341, 527, 2674, 811, 1045, 4417, 1178, 1192, 4793, 916))


library(dplyr)

df %>%
  group_by(month) %>%
  mutate(per = paste0(round(count / sum(count) * 100, 2), "%")) %>%
  ungroup()

Since you want to keep every row of your data frame, you shouldn't use summarise, which is meant to reduce each group to a summary; mutate will suffice.
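
To illustrate why the attempt in the question returned 1 everywhere, here is a small sketch that runs both groupings side by side. It uses the same sample rows as above; the names `attempt` and `fixed` are chosen for this illustration, and `per` is left as a plain fraction to keep the comparison simple.

```r
library(dplyr)

df <- data.frame(
  month = c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14",
            "Apr-14", "Apr-14", "Apr-14", "May-14"),
  type  = c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb"),
  count = c(341, 527, 2674, 811, 1045, 4417, 1178, 1192, 4793, 916)
)

# Grouping by month AND type leaves one row per group in this data,
# so sum(count) equals count and every ratio is 1
attempt <- df %>%
  group_by(month, type) %>%
  summarise(per = count / sum(count)) %>%
  ungroup()

# Grouping by month only makes sum(count) the month total,
# and mutate keeps all the original rows and columns
fixed <- df %>%
  group_by(month) %>%
  mutate(per = count / sum(count)) %>%
  ungroup()

attempt
fixed
```

In `fixed`, the `per` values within each month add up to 1, which is a quick way to check the grouping.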

answered Oct 17 '22 07:10

dimitris_ps
