Subset data table without using <-

Q: How do I select specific data in R?

To select a specific column, type the name of the data frame, followed by a $, and then the name of the column you want, for example df$payment for a column named payment. R simplifies the result to a vector.
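A minimal sketch of the $ selection described above. The data frame df and its columns are made up for illustration; only the payment column name comes from the text.

```r
# Hypothetical data frame; "id" and the values are assumptions for this sketch
df <- data.frame(id = 1:3, payment = c(10.5, 20, 7.25))

# $ extracts a single column and simplifies it to a plain vector
payments <- df$payment

is.vector(payments)  # TRUE
length(payments)     # 3
```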

Q: How do you subset data based on column value in R?

Using base R's df[] notation or subset(), you can subset a data frame (data.frame) by column value or by column name.
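As a rough illustration of both approaches in base R, assuming a small made-up data frame (df, id, and payment are hypothetical names):

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:4, payment = c(10.5, 20, 7.25, 30))

# Subset rows by column value with df[] notation
by_value_bracket <- df[df$payment > 10, ]

# The same row filter with subset()
by_value_subset <- subset(df, payment > 10)

# Subset columns by name
by_name <- df[, c("id", "payment")]

nrow(by_value_bracket)  # 3
nrow(by_value_subset)   # 3
```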

Tags:

r

data.table

subset

I want to subset some rows of a data.table, like this:

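# load data.table and the example data
library(data.table)
data("mtcars")

# convert to data.table by reference, keeping row names in column "rn"
setDT(mtcars, keep.rownames = TRUE)

# subset data
mtcars <- mtcars[like(rn, "Mer"), ] # or
mtcars <- mtcars[mpg > 20, ]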

However, I'm working with a huge data set and I want to avoid using <-, which I understand is not memory efficient because it makes a copy of the data.

Is this correct? Is it possible to update the filtered data without <- ?

831

asked Oct 01 '15 08:10

rafa.pereira

1 Answer

What you are asking for is deleting rows by reference.

It is not yet possible, but there is a feature request for it: #635.

Until then, you need to copy (in memory) your data.table subset. The copy is made when <- (or =) is combined with a subset (the i argument), so for now you cannot avoid it.
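One way to see that a row subset is a separate object is to compare memory addresses with data.table's address(). This is only a sketch: it works on a copy named dt (an assumed name) so that mtcars itself is left untouched.

```r
library(data.table)

# Work on a data.table copy of mtcars; row names go into column "rn"
dt <- as.data.table(mtcars, keep.rownames = TRUE)

# Row subset via the i argument
dt_sub <- dt[mpg > 20]

# The subset lives at a different address than the original,
# so assigning it back with <- keeps a new, smaller copy
address(dt)
address(dt_sub)
address(dt) != address(dt_sub)
```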

If it helps, you can operate on language objects to predefine the operation and delay its evaluation, and reuse the predefined objects multiple times:

mtcars_sub <- quote(mtcars[like(rn,"Mer")])
mtcars_sub2 <- quote(eval(mtcars_sub)[mpg > 20])
eval(mtcars_sub2)
#           rn  mpg cyl  disp hp drat   wt qsec vs am gear carb
# 1: Merc 240D 24.4   4 146.7 62 3.69 3.19 20.0  1  0    4    2
# 2:  Merc 230 22.8   4 140.8 95 3.92 3.15 22.9  1  0    4    2

BTW, when subsetting a data.table you don't need the trailing comma as in dt[x==1,]; you can simply write dt[x==1].
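A small sketch of the two forms side by side, using a made-up table (dt, x, and y are hypothetical names):

```r
library(data.table)

# Hypothetical table for illustration
dt <- data.table(x = c(1, 2, 1), y = c("a", "b", "c"))

# With and without the trailing comma: both filter rows on x == 1
with_comma <- dt[x == 1, ]
without_comma <- dt[x == 1]

print(with_comma)
print(without_comma)
```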

128

answered Sep 20 '22 18:09

jangorecki
