I have a 5 GB csv file (also available as a SAS data file, if that would be easier) which I need to put into a SQL database so I can work with it in R.
The variable names are all contained in the first line and are double quoted. There are 1000+ variables, some numeric, others character (though some of the character variables are strings of numerals; I'm not too worried about that, I can fix it in R).
My question is how can I import the csv file into a new table in my database with minimal pain?
I've found advice saying to create the table first (which means specifying all 1000+ variables) and then use ".import file table" to bring in the data, or to use some GUI import wizard, which is not an option for me.
Sorry if this is SQL 101, but thanks for the help.
Here's my workflow:
library("RSQLite")
setwd("~/your/dir")
db <- dbConnect(SQLite(), dbname="your_db.sqlite") ## will make, if not present
field.types <- list(
date="INTEGER",
symbol="TEXT",
permno="INTEGER",
shrcd="INTEGER",
prc="REAL",
ret="REAL")
dbWriteTable(conn=db, name="your_table", value="your_file.csv", row.names=FALSE, header=TRUE, field.types=field.types)
dbGetQuery(db, "CREATE INDEX IF NOT EXISTS idx_your_table_date_sym ON crsp (date, symbol)")
dbDisconnect(db)
The field.types list isn't necessary; RSQLite will guess the column types if you don't provide it. The index isn't required either, but it will speed up your queries later on (provided you index the columns your queries actually filter on).
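Once the data is loaded you can pull just the rows you need into R rather than the whole 5 GB. A minimal sketch, assuming the table and column names from the workflow above (your_table, date, symbol); the symbol and date values are only placeholders:

library("RSQLite")
db <- dbConnect(SQLite(), dbname = "your_db.sqlite")
## only the matching rows come back into R; the (date, symbol) index speeds this up
sub <- dbGetQuery(db, "SELECT * FROM your_table
                       WHERE symbol = 'IBM' AND date >= 20100101")
dbDisconnect(db)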
I've been learning a lot of this stuff here on SO, so if you check my questions asked/answered on SQLite, you may find some tangential stuff.
Look at the "read.csv.sql" function in the sqldf package.
It converts a csv file into an SQLite database and then reads it into R; you should be able to keep the intermediate database to use for your purposes.
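A minimal sketch of that approach, assuming the same file name as above. Inside the sql argument the csv is referred to as "file", and by default read.csv.sql works in a throwaway temporary database, so pass dbname if you want a persistent one (check afterwards that the table was kept, since sqldf may clean up tables it creates):

library("sqldf")
## reads the csv through SQLite and returns a data.frame;
## dbname points at a persistent database file instead of a tempfile
df <- read.csv.sql("your_file.csv", sql = "select * from file",
                   header = TRUE, dbname = "your_db.sqlite")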