Calculate between columns in data.table or dplyr?

Tags: r, data.table, dplyr

I want to use data.table to achieve a very simple task for a large dataset.

Calculate mean of val1 and val2 for each ID.

For details, please refer to the attached fake data.

library(data.table)
DT <- data.table(ID = paste0("ID",rep(1:5,each=2)),
      level= rep(c("CTRL","CTRL","ID1","ID2","ID3"),2),
      val1 = 1:10, 
      val2 = rnorm(10))

Here I want to calculate, for each ID, the mean of val1 and val2.

Also note that each ID contains different levels. For each unique ID, I want just one mean that pools val1 and val2 across all of its levels.

ID  | Mean
ID1 | ...
ID2 | ...
ID3 | ...

I tried the following code, but it doesn't work.

topagents <- DT[, mean = mean(list(val1,val2)), 
                    by = ID]

I know how to do this with reshape2: melt first, then dcast.

But the original dataset is relatively large (20M rows and 12 fields), so that approach takes quite a long time.

So I would prefer to use data.table or dplyr.

asked Jan 15 '14 04:01

Bigchao

2 Answers

Encapsulate the calls to mean in the list, rather than taking the mean of a list, which you can't do:

DT[, j=list(val1=mean(val1), val2=mean(val2)), by=ID]
    ID val1       val2
1: ID1  1.5  0.1389794
2: ID2  3.5  0.3392179
3: ID3  5.5 -0.6336174
4: ID4  7.5  0.9941148
5: ID5  9.5  0.1324782
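
With more value columns (the question mentions 12 fields), listing each mean by hand gets tedious. A minimal sketch using data.table's .SD and .SDcols, run on made-up deterministic data rather than the question's random values; the names cols and per_col are only illustrative:

```r
library(data.table)

# Toy data with fixed values (an assumption, so the result is reproducible)
DT <- data.table(ID   = rep(c("ID1", "ID2"), each = 2),
                 val1 = c(1, 2, 3, 4),
                 val2 = c(10, 20, 30, 40))

# Columns to summarise; extend this vector for wider tables
cols <- c("val1", "val2")

# .SD holds the columns named in .SDcols for the current ID group;
# lapply() applies mean() to each of them
per_col <- DT[, lapply(.SD, mean), by = ID, .SDcols = cols]
print(per_col)
```

This should give the same shape of result as the list(val1=mean(val1), val2=mean(val2)) form, with one row per ID and one mean per column.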

To get a single value, the mean of the val1 and val2 values, combine these and pass to mean:

DT[, j=list(mean=mean(c(val1,val2))), by=ID]
    ID      mean
1: ID1 0.8194897
2: ID2 1.9196090
3: ID3 2.4331913
4: ID4 4.2470574
5: ID5 4.8162391

Using a list for the single element of j here is an easy way to name the resulting column.
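
Since the question also mentions dplyr, here is a sketch of what the same pooled mean could look like there. The data frame df, its values, and the name pooled are made up for illustration:

```r
library(dplyr)

# Toy data with fixed values (an assumption, so the result is reproducible)
df <- data.frame(ID   = rep(c("ID1", "ID2"), each = 2),
                 val1 = c(1, 2, 3, 4),
                 val2 = c(10, 20, 30, 40))

# c(val1, val2) concatenates both columns within each ID group,
# so mean() sees a single numeric vector per group
pooled <- df %>%
  group_by(ID) %>%
  summarise(mean = mean(c(val1, val2)))

print(pooled)
```

Because val1 and val2 always contribute the same number of rows per ID, this pooled mean equals the average of the two per-column means; that would no longer hold if one column had missing values that were dropped.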

answered Oct 11 '22 12:10

Matthew Lundberg

topagents <- DT[, mean(c(val1,val2)), by = ID]

mean() only accepts a vector; it cannot take the mean of a list.
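
A small base R illustration of that point, outside of data.table. The list x and the names res_list and res_vec are made up for the example:

```r
# A list holding two numeric vectors, similar to list(val1, val2)
x <- list(1:2, c(0.5, 1.5))

# mean() on a list warns that the argument is not numeric and returns NA;
# the warning is suppressed here only to keep the output quiet
res_list <- suppressWarnings(mean(x))
print(res_list)

# Flattening first gives mean() a plain numeric vector to work with;
# c(val1, val2) does the same job when starting from two vectors
res_vec <- mean(unlist(x))
print(res_vec)
```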

Your question said "Calculate mean of val1 and val2 for each ID." But based on Matthew's answer, maybe you wanted "Calculate means (plural) of val1 and val2 for each ID"?

answered Oct 11 '22 14:10

JeremyS
