I want to subset some rows of a data table. Like this:
# load data.table and the example data
library(data.table)
data("mtcars")
# convert to data table
setDT(mtcars, keep.rownames = TRUE)
# Subset data
mtcars <- mtcars[like(rn,"Mer"),] # or
mtcars <- mtcars[mpg > 20,]
However, I'm working with a huge data set and I wanted to avoid using <-, which is not memory efficient because it makes a copy of the data.
Is this correct?
Is it possible to update the filtered data without <-?
What you are asking for is deleting rows by reference.
It is not yet possible, but there is a feature request for it: #635.
Until then you need to copy (in memory) the subset of your data.table. The copy is made when <- (or =) is combined with a subset (the i argument): the subset builds a new table and the assignment binds it to the name, so for now you cannot avoid it.
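To see the difference between updating by reference and subsetting, here is a minimal sketch using address() from data.table. The column name hp_per_cyl and the object names dt and sub are made up for illustration; the point is only that := keeps the same object while a row subset produces a new one.

```r
library(data.table)

# Build a fresh data.table from the built-in data; mtcars itself is not modified.
dt <- as.data.table(mtcars, keep.rownames = TRUE)
addr_before <- address(dt)

# Adding a column with := updates dt by reference: the address stays the same.
dt[, hp_per_cyl := hp / cyl]
addr_after_set <- address(dt)

# A row subset allocates a new data.table; <- only binds a name to it.
sub <- dt[mpg > 20]
addr_sub <- address(sub)

identical(addr_before, addr_after_set)  # same object after :=
identical(addr_before, addr_sub)        # different object after subsetting
```

If you assign the subset back to the same name, the old table is only freed once nothing references it any more and the garbage collector runs, so both tables exist in memory for a moment.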
If it helps, you can operate on language objects to predefine the operation and delay its evaluation, and also reuse the predefined objects multiple times:
mtcars_sub <- quote(mtcars[like(rn,"Mer")])
mtcars_sub2 <- quote(eval(mtcars_sub)[mpg > 20])
eval(mtcars_sub2)
#           rn  mpg cyl  disp hp drat   wt qsec vs am gear carb
# 1: Merc 240D 24.4   4 146.7 62 3.69 3.19 20.0  1  0    4    2
# 2:  Merc 230 22.8   4 140.8 95 3.92 3.15 22.9  1  0    4    2
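A variation on the same idea, as a rough sketch: instead of quoting the whole call, you can quote only the filter conditions and evaluate them in i. The names is_merc and is_efficient are hypothetical, chosen just for this example.

```r
library(data.table)

# Fresh data.table from the built-in data, row names kept in column rn.
dt <- as.data.table(mtcars, keep.rownames = TRUE)

# Predefine the filter conditions as unevaluated expressions.
is_merc      <- quote(like(rn, "Mer"))
is_efficient <- quote(mpg > 20)

# Evaluate them later inside i; each step still returns a new (smaller) table.
res <- dt[eval(is_merc)][eval(is_efficient)]
res$rn
```

The conditions can be reused on any data.table that has the rn and mpg columns. This does not avoid the copy, it only delays when the subset is computed.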
BTW, when subsetting a data.table you don't need the comma, as in dt[x==1,]; you can use dt[x==1].
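A quick check of that, assuming a small made-up table dt with a column x:

```r
library(data.table)

# Small illustrative table; the column names x and y are arbitrary.
dt <- data.table(x = c(1, 2, 1, 3), y = c("a", "b", "c", "d"))

with_comma    <- dt[x == 1, ]
without_comma <- dt[x == 1]

# Both forms select the same rows.
identical(with_comma, without_comma)
```

This differs from a plain data.frame, where df[df$x == 1] without the comma would try to select columns rather than rows.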