I am trying to extract part of my database using a character vector. To explain, I have used the mtcars data as below:
library(dplyr)
library(sqldf)
library(RSQLite)
df <- cbind(rownames(mtcars),mtcars)
colnames(df)[1] <- "CarName"
CarsToFind <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")
db <- dbConnect(SQLite(), dbname = 'mtcars_db.sqlite3')
dbWriteTable(conn = db, name = 'mtcars_table', value = df, row.names = TRUE, header = TRUE)
I could find the section of the data frame that I am interested in using:
mini_df <- df[df$CarName %in% CarsToFind,]
but my real data is quite large and I would rather not extract the whole thing into a data frame. I am looking for something similar to:
sqldf("SELECT * FROM mtcars_table WHERE CarName IN CarsToFind")
but this gives me the error: "no such table: CarsToFind". I don't want to create the table 'CarsToFind' in the SQL database, because I have many different queries that I want to run on a one-off basis. Is it possible to query the database using such a character vector?
The way you tell R that you want to select some particular elements (i.e., a 'subset') from a vector is by placing an 'index vector' in square brackets immediately following the name of the vector. For a simple example, try x[1:10] to view the first ten elements of x.
If we have a vector and a data frame, and one column of the data frame contains values that also appear in the vector, then we can subset the data frame based on that vector. This is done with single square brackets and the %in% operator.
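As a small illustration of bracket indexing and %in%, here is a sketch using the built-in mtcars data. The names x and wanted are made up for the example.

```r
# Index vector in square brackets: the first ten elements of a vector
x <- seq(10, 200, by = 10)
first_ten <- x[1:10]

# Subset a data frame by matching one column against a character vector
df <- cbind(CarName = rownames(mtcars), mtcars)
wanted <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")
mini_df <- df[df$CarName %in% wanted, ]
```

Note that this works on a data frame already in memory, so it does not avoid loading the full table.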
In Python's sqlite3 module, first establish a connection to the SQLite database by creating a Connection object. Next, create a Cursor object using the cursor method of the Connection object. Then, execute a SELECT statement. After that, call the fetchall() method of the cursor object to fetch the data.
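In R, a comparable connect, query, fetch sequence can be sketched with DBI and RSQLite. This is only an illustration: it uses an in-memory database and a table name (cars_demo) chosen for the example rather than the file from the question.

```r
library(DBI)
library(RSQLite)

# Open a connection (in-memory database, assumed here for the example)
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "cars_demo",
             cbind(CarName = rownames(mtcars), mtcars),
             row.names = FALSE)

# Send a SELECT statement, fetch all rows, then release the result set
res <- dbSendQuery(con, "SELECT CarName, mpg FROM cars_demo WHERE cyl = 4")
rows <- dbFetch(res)
dbClearResult(res)

dbDisconnect(con)
```

For one-shot queries, dbGetQuery() wraps the send, fetch, and clear steps in a single call.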
Subsetting in R is a useful indexing feature for accessing object elements. It can be used to select and filter variables and observations. You can use brackets to select rows and columns from your dataframe.
The query you actually want to execute on SQLite should look like this:
SELECT *
FROM mtcars_table
WHERE CarName IN ('Valiant', 'Merc 280', 'Lotus Europa', 'Volvo 142E')
So all you need to do is build this string in R and send it over the existing connection:
CarsToFind <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")
# Wrap each name in single quotes and join with commas
quoted <- paste0("'", CarsToFind, "'", collapse = ", ")
whereIn <- paste0("(", quoted, ")")
query <- paste0("SELECT * FROM mtcars_table WHERE CarName IN ", whereIn)
# Run the query against the database connection, not an R data frame
dbGetQuery(db, query)
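Pasting values straight into the SQL text breaks if a name contains a single quote. One alternative, sketched here under the assumption that the table is reachable through a DBI connection, is to generate one "?" placeholder per value and pass the vector through the params argument of dbGetQuery(). The in-memory database and the con name are for illustration only.

```r
library(DBI)
library(RSQLite)

# In-memory stand-in for the real database (assumption for this example)
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "mtcars_table",
             cbind(CarName = rownames(mtcars), mtcars),
             row.names = FALSE)

CarsToFind <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")

# One "?" per value; the values are bound separately from the SQL text
placeholders <- paste(rep("?", length(CarsToFind)), collapse = ", ")
query <- paste0("SELECT * FROM mtcars_table WHERE CarName IN (",
                placeholders, ")")
result <- dbGetQuery(con, query, params = as.list(CarsToFind))

dbDisconnect(con)
```

Only the matching rows are returned to R, and the same pattern can be reused for any one-off vector of names.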
How about, instead of creating a character vector, creating a data frame with only one column? This would work:
CarsToFind <- data.frame(lookup=c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E"))
sqldf("SELECT * FROM df WHERE CarName IN CarsToFind")
Also, this way you don't have to change or add anything on the SQL side; everything stays on the R side.