Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast way to select rows within table in R?

I am looking for a fast way to extract a large number of rows from an even larger table. The top of my table is as follows:

> head(dbsnp)

      snp      gene distance
rs5   rs5     KRIT1        1
rs6   rs6   CYP51A1        1
rs7   rs7 LOC401387        1
rs8   rs8      CDK6        1
rs9   rs9      CDK6        1
rs10 rs10      CDK6        1

And the dimensions:

> dim(dbsnp)
[1] 11934948        3

I want to select the rows that have the rownames contained in a list:

> head(features)
[1] "rs1367830" "rs5915027" "rs2060113" "rs1594503" "rs1116848" "rs1835693"

> length(features)
[1] 915635

Not surprisingly, the straightforward way of doing this temptable = dbsnp[features,] takes quite a long time.

I've been looking into ways to do this through the sqldf package in R. I thought that that might be faster. Unfortunately, I can't figure out how to select rows with certain rownames in SQL.

Thanks.

like image 536
Gordon Freeman Avatar asked Aug 30 '12 19:08

Gordon Freeman


People also ask

How to select columns and rows from a data frame in R?

In this article, we will learn how to select columns and rows from a data frame in R. We start by selecting a specific column. Similar to lists, we can use the double bracket [ []] operator to select a column. This will return a vector data type. If we want to select a column and return a data frame, we can use the single bracket notation.

How to select multiple rows by conditions in R?

In order to get the multiple rows by name use the vector with the values, you wanted to return. Let’s see some examples of how to select rows by conditions in R, for example, conditions include equal, not equal. And also some examples to get rows based on multiple conditions. To get rows based on column value use %in% operator. 3.

How do I select rows from a list of values?

The following code shows how to select rows where the value in a certain column belongs to a list of values: #select rows where team is equal to 'A' or 'C' df [df$team %in% c ('A', 'C'), ]

How do you select multiple columns in a matrix in R?

If we want to select a column and return a data frame, we can use the single bracket notation. We can also pass a vector of positions to select multiple columns. Since a data frame is a super powered matrix, R also let's us use matrix selection notation.


2 Answers

The data.table solution:

library(data.table)
dbsnp <- structure(list(snp = c("rs5", "rs6", "rs7", "rs8", "rs9", "rs10"
), gene = c("KRIT1", "CYP51A1", "LOC401387", "CDK6", "CDK6", 
"CDK6"), distance = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("snp", 
"gene", "distance"), class = "data.frame", row.names = c("rs5", 
"rs6", "rs7", "rs8", "rs9", "rs10"))

DT <- data.table(dbsnp, key='snp')
features <- c('rs5', 'rs7', 'rs9')
DT[features]

   snp      gene distance
1: rs5     KRIT1        1
2: rs7 LOC401387        1
3: rs9      CDK6        1
like image 186
Justin Avatar answered Sep 27 '22 19:09

Justin


Using sqldf you will need rownames = TRUE then you can query on the rownames using row_names:

library(sqldf)

## input

test<-read.table(header=T,text="      snp      gene distance
rs5   rs5     KRIT1        1
rs6   rs6   CYP51A1        1
rs7   rs7 LOC401387        1
rs8   rs8      CDK6        1
rs9   rs9      CDK6        1
rs10 rs10      CDK6        1
")
features<-c("rs5","rs7","rs10")

## calculate

inVar <- toString(shQuote(features, type = "csh")) # 'rs5','rs7','rs10'

fn$sqldf("SELECT * FROM test t
          WHERE t.row_names IN ($inVar)"
           , row.names = TRUE)

## result
#      snp      gene distance
#rs5   rs5     KRIT1        1
#rs7   rs7 LOC401387        1
#rs10 rs10      CDK6        1

UPDATE: Alternately if fet is a data frame whose features column contains the required items to find:

fet <- data.frame(features)
sqldf("SELECT t.* FROM test t
          WHERE t.row_names IN (SELECT features FROM fet)"
           , row.names = TRUE)

Also if the data were sufficiently large we could speed it up using indexes. See the sqldf home page for this and other details.

like image 25
shhhhimhuntingrabbits Avatar answered Sep 27 '22 17:09

shhhhimhuntingrabbits