Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Factors in RSQLite

Tags:

sqlite

r

rsqlite

I have been unable to find documentation on how RSQLite handles factors. From a quick test (see below), it looks like they are converted to character.

Question 1: Is there any way to preserve them as factors? I can think of some kludgy ways (mostly involving a separate table or .Rdata file storing the factor levels), but it seems like there should be a standard and therefore more maintainable way of doing this.

Question 2: If not RSQLite, than some other database or database-like package? My use case here is simple: append in a bunch of large (2-5mm row X 550 column) data.frames as each is processed to build a giant database, then be able to select only the rows I want from that database to bring into a data.table and work on.

library(RSQLite)
# Create
db <- dbConnect( SQLite(), dbname="~/temp/test.sqlite" )
# Write test
set.seed(1)
testDat <- data.frame(x=runif(1000),y=runif(1000),g1=sample(letters[1:10],1000,replace=TRUE),g2=rep(letters[1:10],each=100),g3=factor( sample(letters[1:10],1000,replace=TRUE) ))
if(dbExistsTable(db,"test")) dbRemoveTable(db,"test")
dbWriteTable( conn = db, name = "test", value = testDat, row.names=FALSE )
# Read test
testRecovery <- dbGetQuery(db, "SELECT * FROM test")
testSelection <- dbGetQuery(db, "SELECT * FROM test WHERE g3=='h' OR g3=='e' ")
# Close
dbDisconnect(db)
like image 897
Ari B. Friedman Avatar asked Feb 15 '23 19:02

Ari B. Friedman


1 Answers

It looks pretty simple to me: factor is a concept only S and R know. Full stop.

So to get them into a DB and back, you need to write mappers. Either be simplistic and do everything as.character (and assume most DB backend will hash strings just as R does). Or be DB-centric and split the factor into just the (unsigned) (and possibly short) integers, and the labels.

like image 194
Dirk Eddelbuettel Avatar answered Feb 23 '23 13:02

Dirk Eddelbuettel