Aggregating values in a data frame based on key

Tags:

r

I've got a piece of aggregation code that works well enough but runs a bit slow against a data frame with 10e6 rows. I'm not that experienced in R, so apologies for my cringeworthy code!

I just want to do a basic roll up and sum of values for a common key...

e.g. go from...

  key val
1   a   5
2   b   7
3   a   6

to...

  key val
1   a  11
2   b   7

The best I can manage is...

keys = unique(inp$key)
vals = sapply(keys, function(x) { sum(inp[inp$key==x,]$val) })
out = data.frame(key=keys, val=vals)

I have a gut feeling that inp[inp$key==x,] is not the best way to do this. Is there an obvious speed-up I'm missing? I can do it in Hadoop (since the 10e6 dataset is actually already a rollup from a 2e9 row dataset) but I'm trying to improve my R.

Cheers, Mat


asked Jul 25 '11 05:07

mat kelcey

1 Answer

Using sapply with split is another option. I'll reuse the data and benchmarks from @Chase's excellent answer.

fn.tapply <- function(daters) with(daters, tapply(val, key, FUN = sum))
fn.split <- function(daters) with(daters, sapply(split(val, key), sum))

str(dat)
# 'data.frame': 1000000 obs. of  2 variables:
#  $ key: Factor w/ 5 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ val: num  0.186 0.875 0.42 0.294 0.878 ...
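The code that builds dat is in @Chase's answer and isn't reproduced here. As a rough stand-in, a data frame with the same shape as the str() output above could be generated along these lines. The seed, the level names, and the uniform values are assumptions for illustration, not the original data:

```r
# Hypothetical stand-in for `dat`: 1e6 rows, a 5-level factor key, numeric val.
# The actual construction used in the benchmark is not shown in this answer.
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
dat <- data.frame(
  key = factor(rep(letters[1:5], each = 2e5)),  # 5 levels, 200,000 rows each
  val = runif(1e6)                              # uniform values in [0, 1]
)
str(dat)
```

Timings on data built this way will differ from the numbers reported in this answer.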

library(rbenchmark)  # provides benchmark()
benchmark(fn.tapply(dat), fn.split(dat)
          , columns = c("test", "elapsed", "relative")
          , order = "relative"
          , replications = 100
          )
#             test elapsed relative
# 2  fn.split(dat)   4.106  1.00000
# 1 fn.tapply(dat)  69.982 17.04384
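Both functions above return a named vector rather than a data frame. A minimal sketch of getting back to the key/val layout from the question, using the three-row example from the question as input (the names sums and out are just illustrative):

```r
# Toy input from the question: a `key` column and a `val` column
inp <- data.frame(key = c("a", "b", "a"), val = c(5, 7, 6))

# Split `val` by `key`, then sum each group; the result is a named numeric vector
sums <- sapply(split(inp$val, inp$key), sum)

# Rebuild a two-column data frame: names become the keys, values become the sums
out <- data.frame(key = names(sums), val = unname(sums))
print(out)
```

split() orders the groups by factor level, so the rows come out sorted by key rather than in order of first appearance.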

answered Oct 15 '22 08:10

Joshua Ulrich
