data.frame Group By column [duplicate]

Tags:

r

aggregate

I have a data frame DF.

Say DF is:

A B
1 2
1 3
2 3
3 5
3 6

Now I want to combine the rows by column A and get the sum of column B for each group.

For example:

A B
1 5
2 3
3 11

I am currently doing this with an SQL query through the sqldf function, but for some reason it is very slow. Is there a more convenient way to do this? I could also do it manually with a for loop, but that is slow as well. My SQL query is "Select A,Count(B) from DF group by A".

In general, whenever I use for loops instead of vectorized operations, performance is extremely slow, even for simple procedures.

749

asked Sep 14 '13 08:09

nikosdi

2 Answers

This is a common question. In base R, the function you're looking for is aggregate. Assuming your data.frame is called "mydf", you can use the following.

> aggregate(B ~ A, mydf, sum)
  A  B
1 1  5
2 2  3
3 3 11
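
For anyone who wants to run this end to end, here is a minimal sketch in base R. It assumes the sample values used in the dplyr answer further down (the name mydf is simply the one used in this answer), and it also shows tapply and rowsum, two other base functions that give the same group sums in a different shape.

```r
# Assumed sample data: the same values as in the dplyr answer.
mydf <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))

# Formula interface: sum B within each value of A; returns a data.frame.
res <- aggregate(B ~ A, data = mydf, FUN = sum)
print(res)

# tapply returns a named one-dimensional array instead of a data.frame.
sums_vec <- tapply(mydf$B, mydf$A, sum)
print(sums_vec)

# rowsum returns a one-column matrix with the groups as row names.
sums_mat <- rowsum(mydf$B, mydf$A)
print(sums_mat)
```

Which shape is most convenient depends on what happens next: aggregate keeps the grouping column as data, while tapply and rowsum keep it only as names.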

I would also recommend looking into the "data.table" package.

> library(data.table)
> DT <- data.table(mydf)
> DT[, sum(B), by = A]
   A V1
1: 1  5
2: 2  3
3: 3 11
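
The result column above gets the default name V1. A small variation, sketched on the assumption that the data.table package is installed and that mydf holds the sample values from the dplyr answer, names the summed column directly:

```r
library(data.table)

# Assumed sample data: the same values as in the dplyr answer.
mydf <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))
DT <- as.data.table(mydf)

# .() is data.table's alias for list(); naming the element names the column.
res <- DT[, .(B = sum(B)), by = A]
print(res)
```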

100

answered Oct 21 '22 22:10

A5C1D2H2I1M1N2O1R2T1

Using dplyr:

library(dplyr)
df <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))
df %>% group_by(A) %>% summarise(B = sum(B))

## Source: local data frame [3 x 2]
## 
##   A  B
## 1 1  5
## 2 2  3
## 3 3 11

With sqldf:

library(sqldf)
sqldf('SELECT A, SUM(B) AS B FROM df GROUP BY A')
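
One detail worth checking: the query quoted in the question uses Count(B), which returns the number of rows in each group, whereas SUM(B) adds the values up. A rough base R sketch of the difference, using the same sample data and no sqldf (the names counts and sums are only for illustration):

```r
# Same sample data as in the dplyr example.
df <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))

# Rough equivalent of COUNT(B) ... GROUP BY A: number of rows per group.
counts <- aggregate(B ~ A, data = df, FUN = length)

# Rough equivalent of SUM(B) ... GROUP BY A: total of B per group.
sums <- aggregate(B ~ A, data = df, FUN = sum)

print(counts)
print(sums)
```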

answered Oct 22 '22 00:10

mpalanco
