Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading huge csv files into R with sqldf works but sqlite file takes twice the space it should and needs "vacuuming"

Tags:

sqlite

r

sqldf

Reading around, I found out that the best way to read a larger-than-memory csv file is to use read.csv.sql from package sqldf. This function will read the data directly into a sqlite database, and consequently execute a sql statement.

I noticed the following: it seems that the data read into sqlite is stored into a temporary table, so that in order to make it accessible for future use, it needs to be asked so in the sql statement.

As a example, the following code reads some sample data into sqlite:

# generate sample data
sample_data <- data.frame(col1 = sample(letters, 100000, TRUE), col2 = rnorm(100000))
# save as csv
write.csv(sample_data, "sample_data.csv", row.names = FALSE)
# create a sample sqlite database
library(sqldf)
sqldf("attach sample_db as new") 
# read the csv into the database and create a table with its content
read.csv.sql("sample_data.csv", sql = "create table data as select * from file", 
             dbname = "sample_db", header = T, row.names = F, sep = ",")

The data can then be accessed with sqldf("select * from data limit 5", dbname = "sample_db").

The problem is the following: the sqlite file takes up twice as much space as it should. My guess is that it contains the data twice: once for the temporary read, and once for the stored table. It is possible to clean up the database with sqldf("vacuum", dbname = "sample_db"). This will reclaim the empty space, but it takes a long time, especially when the file is big.

Is there a better solution to this that doesn't create this data duplication in the first time ?

like image 838
nassimhddd Avatar asked Sep 12 '12 15:09

nassimhddd


People also ask

Is SQLite better than CSV?

Unless you're doing something very trivial to the CSV, and only doing it once, SQLite will be faster for runtime, coding time, and maintenance time, and it will be more flexible. The major advantages of putting the CSV into SQLite are... Query with a known query language. Query with a flexible query language.

Can you import CSV into SQLite?

Importing files as CSV or other formats Use the ". import" command to import CSV (comma separated value) or similarly delimited data into an SQLite table. The ". import" command takes two arguments which are the source from which data is to be read and the name of the ...

Which of the following command is used to export data to CSV in SQLite?

once command is used to export data to a CSV file followed by the file path/name where you want to write the file.


1 Answers

Solution: using RSQLite without going through sqldf:

library(RSQLite)
con <- dbConnect("SQLite", dbname = "sample_db")
# read csv file into sql database
dbWriteTable(con, name="sample_data", value="sample_data.csv", 
             row.names=FALSE, header=TRUE, sep = ",")
like image 58
nassimhddd Avatar answered Oct 04 '22 21:10

nassimhddd