data.table: Using with=FALSE with a transforming/summary function?

I want to summarise several variables in a data.table, with output in wide format, possibly as a list with one element per variable. Since several other approaches did not work, I tried an outer lapply over the variable names, given as character strings. I wanted to pass these into j using with=FALSE.

carsx=as.data.table(cars)
lapply( list(speed="speed",dist= "dist"), #error object 'ansvals' not found
    function(x)  carsx[,list(mean(x), min(x), max(x) ), with=FALSE ] ) 

Since this does not work, I tried the simpler approach without lapply.

carsx[,list(mean("speed"), min("speed"), max("speed") ), with=FALSE ] #error object 'ansvals' not found

This does not work either. Is there any way to do something like this? Is this behaviour of with intended? (I am aware that ?data.table mentions with only for selecting columns, but in my case it would be useful to be able to transform them as well.)

When with=FALSE, j is a vector of names or positions to select, similar to a data.frame. with=FALSE is often useful in data.table to select columns dynamically.
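To illustrate the documented behaviour quoted above, here is a minimal sketch of selecting columns by name with with=FALSE. The variable name cols is an assumption introduced only for this illustration.

```r
library(data.table)

carsx <- as.data.table(cars)

# Column names held in a character vector (hypothetical variable name).
cols <- c("speed", "dist")

# with = FALSE: j is treated as a vector of column names to select,
# so it is not the place to evaluate expressions such as mean().
selected <- carsx[, cols, with = FALSE]

names(selected)
nrow(selected)
```

This only selects columns; any summary would have to be computed in a separate step or with a different mechanism.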

EDIT: My aim is to get a summary per group in wide format, for different variables. I tried to extend the following, which works for only one variable, to a list of variables.

carsx[, list(mean(speed), min(speed), max(speed)), by=(dist>50)]

Unfortunately, SO doesn't let me post my other question. There I described that I want an output similar to:

lapply( list(speed="speed",dist= "dist"),
        function(x) do.call("as.data.frame", aggregate(cars[,x], list(class=cars$dist>50), FUN=summary) ) )

Expected Output would be something like:

$speed 
         V1       V2 V3
1: FALSE 12.96970  4 20
2:  TRUE 20.11765 14 25

$dist
         V1       V2 V3
1: FALSE 12.96970  4 20
2:  TRUE 20.11765 14 25
Julian asked Nov 10 '14 12:11


1 Answer

You can specify the columns with the .SDcols parameter:

carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), 
      .SDcols = c("speed", "dist")]
#    speed   dist
# 1:  15.4  42.98
# 2:   4.0   2.00
# 3:  25.0 120.00

carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), 
      .SDcols = "speed"]
#    speed
# 1:  15.4
# 2:   4.0
# 3:  25.0
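
The same .SDcols idea could possibly be extended to the grouped, per-variable output described in the question's EDIT. The following is only a sketch under assumptions: the helper summarise_by_group, the grouping column name dist_gt_50, and the result column names mean, min and max are all hypothetical names chosen for illustration.

```r
library(data.table)

carsx <- as.data.table(cars)

# Hypothetical helper: returns one wide summary table for the column
# named in v, grouped by whether dist exceeds 50.
summarise_by_group <- function(v) {
  carsx[, {
    x <- .SD[[1L]]  # .SD holds only the column named in v (via .SDcols)
    list(mean = mean(x), min = min(x), max = max(x))
  },
  by = list(dist_gt_50 = dist > 50),
  .SDcols = v]
}

# A named vector, so that lapply() returns a named list.
vars <- c(speed = "speed", dist = "dist")
res <- lapply(vars, summarise_by_group)

res$speed
res$dist
```

Each element of res is a data.table with one row per group, which is roughly the shape shown in the question's expected output.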
Sven Hohenstein answered Sep 19 '22 23:09