Fast way to select rows within table in R?

Tags: sql, r, data.table, row, sqldf

I am looking for a fast way to extract a large number of rows from an even larger table. The top of my table is as follows:


> head(dbsnp)

      snp      gene distance
rs5   rs5     KRIT1        1
rs6   rs6   CYP51A1        1
rs7   rs7 LOC401387        1
rs8   rs8      CDK6        1
rs9   rs9      CDK6        1
rs10 rs10      CDK6        1

And the dimensions:


> dim(dbsnp)
[1] 11934948        3

I want to select the rows that have the rownames contained in a list:


> head(features)
[1] "rs1367830" "rs5915027" "rs2060113" "rs1594503" "rs1116848" "rs1835693"

> length(features)
[1] 915635

Not surprisingly, the straightforward approach, temptable = dbsnp[features,], takes quite a long time.
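A likely contributor to the slowness is that, as far as I know, [.data.frame resolves character row indices with pmatch(), which also attempts partial matches for IDs it cannot find exactly. The minimal base-R sketch below assumes exact matching on row names is what is wanted: it looks the IDs up with match() and silently drops those that are absent. The small data frame and the absent ID "rs999" are illustrative only.

```r
# Example data: the first six rows of dbsnp shown above
dbsnp <- data.frame(
  snp = c("rs5", "rs6", "rs7", "rs8", "rs9", "rs10"),
  gene = c("KRIT1", "CYP51A1", "LOC401387", "CDK6", "CDK6", "CDK6"),
  distance = c(1L, 1L, 1L, 1L, 1L, 1L),
  row.names = c("rs5", "rs6", "rs7", "rs8", "rs9", "rs10"),
  stringsAsFactors = FALSE
)
features <- c("rs5", "rs7", "rs10", "rs999")  # "rs999" is a hypothetical absent ID

# Exact (hashed) lookup of each feature among the row names;
# nomatch = 0L turns absent features into index 0, which `[` drops
idx <- match(features, rownames(dbsnp), nomatch = 0L)
temptable <- dbsnp[idx, ]
temptable
```

The rows come back in the order of features, with the original row names preserved.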

I've been looking into doing this with the sqldf package in R, which I thought might be faster. Unfortunately, I can't figure out how to select rows by their row names in SQL.

Thanks.

536

asked Aug 30 '12 19:08

Gordon Freeman

2 Answers

The data.table solution: key the table on snp, then index it with the vector of IDs, which performs a join on the key:


library(data.table)

## example data: the first six rows of dbsnp
dbsnp <- data.frame(snp = c("rs5", "rs6", "rs7", "rs8", "rs9", "rs10"),
                    gene = c("KRIT1", "CYP51A1", "LOC401387", "CDK6", "CDK6", "CDK6"),
                    distance = c(1L, 1L, 1L, 1L, 1L, 1L),
                    row.names = c("rs5", "rs6", "rs7", "rs8", "rs9", "rs10"),
                    stringsAsFactors = FALSE)

## key (sort) the table on snp, then join the IDs against that key
DT <- data.table(dbsnp, key = 'snp')
features <- c('rs5', 'rs7', 'rs9')
DT[features]

   snp      gene distance
1: rs5     KRIT1        1
2: rs7 LOC401387        1
3: rs9      CDK6        1
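Two caveats with DT[features]: an ID that is not in the table comes back as a row with NA in the other columns, and keying sorts the whole table. A sketch, assuming data.table 1.9.6 or later (for the on= argument), joins ad hoc on the snp column without setting a key and drops unmatched IDs with nomatch = 0L. The ID "rs999" is a hypothetical absent value.

```r
library(data.table)

DT <- data.table(
  snp = c("rs5", "rs6", "rs7", "rs8", "rs9", "rs10"),
  gene = c("KRIT1", "CYP51A1", "LOC401387", "CDK6", "CDK6", "CDK6"),
  distance = 1L  # recycled to all six rows
)
features <- c("rs5", "rs7", "rs9", "rs999")  # "rs999" is a hypothetical absent ID

# Ad hoc join: the single column of .(features) is matched against DT$snp.
# nomatch = 0L drops IDs with no match (the default would add an NA row).
res <- DT[.(features), on = "snp", nomatch = 0L]
res
```

Setting a key is still worthwhile if the same table is queried repeatedly, since the sort is then paid only once.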

186

answered Sep 27 '22 19:09

Justin

With sqldf, pass row.names = TRUE; the data frame's row names are then available in SQL as a column named row_names:


library(sqldf)

## input

test <- read.table(header = TRUE, text = "      snp      gene distance
rs5   rs5     KRIT1        1
rs6   rs6   CYP51A1        1
rs7   rs7 LOC401387        1
rs8   rs8      CDK6        1
rs9   rs9      CDK6        1
rs10 rs10      CDK6        1
")
features <- c("rs5", "rs7", "rs10")

## calculate

inVar <- toString(shQuote(features, type = "csh")) # "'rs5', 'rs7', 'rs10'"

fn$sqldf("SELECT * FROM test t
          WHERE t.row_names IN ($inVar)"
           , row.names = TRUE)

## result
#      snp      gene distance
#rs5   rs5     KRIT1        1
#rs7   rs7 LOC401387        1
#rs10 rs10      CDK6        1
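The fn$ prefix (from the gsubfn package, which sqldf loads) replaces $inVar in the string with the value of inVar before the query runs. To inspect the generated SQL, or to build it without fn$, the base-R sketch below does the same substitution explicitly with sprintf(); the resulting string could then be passed as sqldf(query, row.names = TRUE). With the roughly 900,000 IDs from the question, this IN list would become a very long SQL statement, which may favour the subquery form.

```r
features <- c("rs5", "rs7", "rs10")

# shQuote(type = "csh") wraps each ID in single quotes;
# toString() joins the quoted IDs with ", "
inVar <- toString(shQuote(features, type = "csh"))

# The same substitution fn$ performs on "$inVar", done explicitly
query <- sprintf("SELECT * FROM test t WHERE t.row_names IN (%s)", inVar)
cat(query, "\n")
```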

UPDATE: Alternatively, if fet is a data frame whose features column holds the row names to look up, use a subquery:


fet <- data.frame(features)
sqldf("SELECT t.* FROM test t
          WHERE t.row_names IN (SELECT features FROM fet)"
           , row.names = TRUE)

Also, if the data are sufficiently large, the query can be sped up with indexes. See the sqldf home page for this and other details.

answered Sep 27 '22 17:09

shhhhimhuntingrabbits
