R table function: how to sum instead of counting? [duplicate]

Tags: r, aggregate

Suppose I have data in an R table which looks like this:

Id  Name Price sales Profit Month Category Mode
1   A     2     5     8       1     X       K
1   A     2     6     9       2     X       K
1   A     2     5     8       3     X       K
1   B     2     4     6       1     Y       L
1   B     2     3     4       2     Y       L
1   B     2     5     7       3     Y       L
2   C     2     5    11       1     X       M
2   C     2     5    11       2     X       L
2   C     2     5    11       3     X       K
2   D     2     8    10       1     Y       M
2   D     2     8    10       2     Y       K
2   D     2     5     7       3     Y       K
3   E     2     5     9       1     Y       M
3   E     2     5     9       2     Y       L
3   E     2     5     9       3     Y       M
3   F     2     4     7       1     Z       M
3   F     2     5     8       2     Z       L
3   F     2     5     8       3     Z       M

If I call the table function on this data like this:

table(df$Category, df$Mode)

This shows how many observations each Category has under each Mode; that is, it counts the rows for every Category/Mode combination.
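
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the data frame from the table above with read.table and then runs the same table call. The name df follows the question; the name counts is introduced here only for illustration.

```r
# Rebuild the example data from the question as a data frame called df.
df <- read.table(header = TRUE, text = "
Id Name Price sales Profit Month Category Mode
1  A    2     5     8      1     X        K
1  A    2     6     9      2     X        K
1  A    2     5     8      3     X        K
1  B    2     4     6      1     Y        L
1  B    2     3     4      2     Y        L
1  B    2     5     7      3     Y        L
2  C    2     5     11     1     X        M
2  C    2     5     11     2     X        L
2  C    2     5     11     3     X        K
2  D    2     8     10     1     Y        M
2  D    2     8     10     2     Y        K
2  D    2     5     7      3     Y        K
3  E    2     5     9      1     Y        M
3  E    2     5     9      2     Y        L
3  E    2     5     9      3     Y        M
3  F    2     4     7      1     Z        M
3  F    2     5     8      2     Z        L
3  F    2     5     8      3     Z        M
")

# table() only counts rows per Category/Mode cell; it ignores Profit.
counts <- table(df$Category, df$Mode)
print(counts)
# Cell counts for this data:
#   X: K = 4, L = 1, M = 1
#   Y: K = 2, L = 4, M = 3
#   Z: K = 0, L = 1, M = 2
```

The counts add up to the 18 rows of the data, which confirms that nothing about Profit enters the result.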

But what if I want the table to show how much Profit (sum or mean) each Mode earned within each Category, rather than the number of rows?

Is there any way to do this with the table function or another function in R?

449

asked Sep 01 '15 07:09

Jay khan

2 Answers

We can use xtabs from base R. When the formula has a variable on the left-hand side, xtabs sums that variable within each cell:

xtabs(Profit~Category+Mode, df)
#           Mode
#Category  K  L  M
#       X 36 11 11
#       Y 17 26 28
#       Z  0  8 15
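
Since xtabs has no argument for a different summary function, one way to get the mean with it is to divide the table of sums by the table of counts. This is only a sketch: the three columns below are copied from the question's data, and the names sums, counts and means are made up for illustration.

```r
# Only the three columns needed here, copied from the question's table.
df <- data.frame(
  Category = c("X", "X", "X", "Y", "Y", "Y", "X", "X", "X",
               "Y", "Y", "Y", "Y", "Y", "Y", "Z", "Z", "Z"),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M"),
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11,
               10, 10, 7, 9, 9, 9, 7, 8, 8)
)

# Sum of Profit per cell (left-hand side given) ...
sums <- xtabs(Profit ~ Category + Mode, data = df)
# ... and number of rows per cell (no left-hand side).
counts <- xtabs(~ Category + Mode, data = df)

# Element-wise division gives the mean Profit per cell.
# Empty cells are 0 / 0, which R reports as NaN.
means <- sums / counts
print(means)
```

The empty Z/K cell comes out as NaN rather than 0, which is arguably the more honest value for a mean over no rows.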

Another base R option is tapply, which is more flexible because FUN can be any summary function. Note that it returns NA rather than 0 for combinations with no rows (Z/K here):

with(df, tapply(Profit, list(Category, Mode), FUN=sum))
#  K  L  M
#X 36 11 11
#Y 17 26 28
#Z NA  8 15
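
As a sketch of that flexibility, the same call with FUN = mean gives the average Profit per cell, and the NA in the sum table can be replaced by 0 if the xtabs-style output is preferred. The data columns are copied from the question; the names totals and avg are only illustrative.

```r
# Only the three columns needed here, copied from the question's table.
df <- data.frame(
  Category = c("X", "X", "X", "Y", "Y", "Y", "X", "X", "X",
               "Y", "Y", "Y", "Y", "Y", "Y", "Z", "Z", "Z"),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M"),
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11,
               10, 10, 7, 9, 9, 9, 7, 8, 8)
)

# Mean Profit per Category/Mode cell; the empty Z/K cell stays NA.
avg <- with(df, tapply(Profit, list(Category, Mode), FUN = mean))
print(avg)

# Sum per cell, with the empty cell turned into 0 afterwards.
totals <- with(df, tapply(Profit, list(Category, Mode), FUN = sum))
totals[is.na(totals)] <- 0
print(totals)
```

Replacing NA with 0 makes sense for a sum but usually not for a mean, so avg is left as it is.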

Or we can use dcast from reshape2 to convert from 'long' to 'wide' format. It is also flexible, since fun.aggregate can be set to sum, mean, median, etc.

library(reshape2)
dcast(df, Category ~ Mode, value.var = 'Profit', fun.aggregate = sum)
# Category  K  L  M
#1        X 36 11 11
#2        Y 17 26 28
#3        Z  0  8 15

If you need it in the 'long' format, here is one option with data.table. We convert the 'data.frame' to a 'data.table' with setDT(df), which modifies df by reference, and then get the sum of 'Profit' grouped by 'Category' and 'Mode'.

library(data.table)
setDT(df)[, .(Profit = sum(Profit)), by = .(Category, Mode)]

141

answered Sep 22 '22 06:09

akrun

Another possibility is the aggregate() function:

profit_dat <- aggregate(Profit ~ Category + Mode, data = df, FUN = sum)
profit_dat
#  Category Mode Profit
#1        X    K     36
#2        Y    K     17
#3        X    L     11
#4        Y    L     26
#5        Z    L      8
#6        X    M     11
#7        Y    M     28
#8        Z    M     15
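
The same pattern works for the mean by swapping the function. Below is a rough sketch that computes both summaries and joins them into one long table. The data columns are copied from the question; the names sum_dat, mean_dat and both, as well as the suffixes, are only illustrative choices.

```r
# Only the three columns needed here, copied from the question's table.
df <- data.frame(
  Category = c("X", "X", "X", "Y", "Y", "Y", "X", "X", "X",
               "Y", "Y", "Y", "Y", "Y", "Y", "Z", "Z", "Z"),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M"),
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11,
               10, 10, 7, 9, 9, 9, 7, 8, 8)
)

# One row per Category/Mode combination that actually occurs in the data.
sum_dat  <- aggregate(Profit ~ Category + Mode, data = df, FUN = sum)
mean_dat <- aggregate(Profit ~ Category + Mode, data = df, FUN = mean)

# Join the two summaries on the grouping columns; the suffixes
# distinguish the two Profit columns (Profit_sum, Profit_mean).
both <- merge(sum_dat, mean_dat, by = c("Category", "Mode"),
              suffixes = c("_sum", "_mean"))
print(both)
```

Unlike xtabs or tapply, aggregate simply leaves out combinations with no rows, so Z/K does not appear in the result.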

answered Sep 22 '22 06:09

RHertel
