data.table operations by column name

Tags: r, data.table
Suppose I have a data.table:

library(data.table)
a <- data.table(id=c(1,1,2,2,3),a=21:25,b=11:15,key="id")

I can add new columns like this:

a[, sa := sum(a), by="id"]
a[, sb := sum(b), by="id"]
> a
   id  a  b sa sb
1:  1 21 11 43 23
2:  1 22 12 43 23
3:  2 23 13 47 27
4:  2 24 14 47 27
5:  3 25 15 25 15

However, suppose that I have the column names as strings instead:

for (n in c("a","b")) {
  s <- paste0("s",n)
  a[, s := sum(n), by="id", with=FALSE] # ERROR: invalid 'type' (character) of argument
}

What do I do?


asked Jan 09 '14 15:01

sds

1 Answer

You can also do this:

a <- data.table(id=c(1,1,2,2,3),a=21:25,b=11:15,key="id")

a[, c("sa", "sb") := lapply(.SD, sum), by = id]

Or slightly more generally:

cols.to.sum = c("a", "b")
a[, paste0("s", cols.to.sum) := lapply(.SD, sum), by = id, .SDcols = cols.to.sum]
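If you would rather keep the loop from the question, here is a minimal sketch of one way to do it. The original loop fails because sum(n) tries to sum the character string "a" rather than the column named a. Wrapping the new name in parentheses on the left of := tells data.table to use the value of s as the column name, and get(n) looks up the column whose name is stored in n. This is only an illustration of the idea; the lapply(.SD, ...) form above is usually the more idiomatic choice.

```r
library(data.table)

a <- data.table(id = c(1, 1, 2, 2, 3), a = 21:25, b = 11:15, key = "id")

for (n in c("a", "b")) {
  s <- paste0("s", n)
  # (s) makes := use the value of s ("sa", then "sb") as the new column name;
  # get(n) fetches the column whose name is stored in n, evaluated per group
  a[, (s) := sum(get(n)), by = "id"]
}

print(a)
```

This should yield the same sa and sb columns as the two explicit := calls in the question.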


answered Sep 26 '22 03:09

eddi
