Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UNIX Importing LARGE csv into SQLite

Tags:

sqlite

r

csv

sas

I have a 5gig csv file (also as a sas datafile, if it would be easier) which I need to put into a sql database so I can work with it in R.

The variables names are all contained in the first observation line and are double quoted. There are 1000+ variables some of numeric others character (though some of the character variables are strings of numerals, but I'm not too worried about it I can fix it in R).

My question is how can I import the csv file into a new table in my database with minimal pain?

I've found things saying to create your table first (which includes specifying all the variables, of which I have 1000+) and then using ".import file table" to bring in the data. Or, to use some gui import wizard, which is not an option for me.

Sorry if this is sql 101 but thanks for the help.

like image 321
MaDMaD Mad Avatar asked Dec 03 '10 21:12

MaDMaD Mad


2 Answers

Here's my workflow:

library("RSQLite")
setwd("~/your/dir")
db <- dbConnect(SQLite(), dbname="your_db.sqlite") ## will make, if not present
field.types <- list(
        date="INTEGER",
        symbol="TEXT",
        permno="INTEGER",
        shrcd="INTEGER",
        prc="REAL",
        ret="REAL")
dbWriteTable(conn=db, name="your_table", value="your_file.csv", row.names=FALSE, header=TRUE, field.types=field.types)
dbGetQuery(db, "CREATE INDEX IF NOT EXISTS idx_your_table_date_sym ON crsp (date, symbol)")
dbDisconnect(db)

The field.types isn't necessary. RSQLite will guess from the header if you don't provide this list. The index isn't required either, but will speed up your queries later on (if you index the correct column for your queries).

I've been learning a lot of this stuff here on SO, so if you check my questions asked/answered on SQLite, you may find some tagential stuff.

like image 96
Richard Herron Avatar answered Sep 25 '22 14:09

Richard Herron


Look at the "read.csv.sql" function in the sqldf package.

This converts a csv file into an SQLite database, then reads it into R, you should be able to keep the intermediate database to use for your purposes.

like image 45
Greg Snow Avatar answered Sep 23 '22 14:09

Greg Snow