I have been unable to find documentation on how RSQLite handles factors. From a quick test (see below), it looks like they are converted to character.
Question 1: Is there any way to preserve them as factors? I can think of some kludgy ways (mostly involving a separate table or .Rdata
file storing the factor levels), but it seems like there should be a standard and therefore more maintainable way of doing this.
Question 2: If not RSQLite, than some other database or database-like package? My use case here is simple: append in a bunch of large (2-5mm row X 550 column) data.frames as each is processed to build a giant database, then be able to select only the rows I want from that database to bring into a data.table
and work on.
library(RSQLite)
# Create
db <- dbConnect( SQLite(), dbname="~/temp/test.sqlite" )
# Write test
set.seed(1)
testDat <- data.frame(x=runif(1000),y=runif(1000),g1=sample(letters[1:10],1000,replace=TRUE),g2=rep(letters[1:10],each=100),g3=factor( sample(letters[1:10],1000,replace=TRUE) ))
if(dbExistsTable(db,"test")) dbRemoveTable(db,"test")
dbWriteTable( conn = db, name = "test", value = testDat, row.names=FALSE )
# Read test
testRecovery <- dbGetQuery(db, "SELECT * FROM test")
testSelection <- dbGetQuery(db, "SELECT * FROM test WHERE g3=='h' OR g3=='e' ")
# Close
dbDisconnect(db)
It looks pretty simple to me: factor
is a concept only S and R know. Full stop.
So to get them into a DB and back, you need to write mappers. Either be simplistic and do everything as.character
(and assume most DB backend will hash strings just as R does). Or be DB-centric and split the factor into just the (unsigned) (and possibly short) integers, and the labels.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With