Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sqlite database design with millions of 'url' strings - slow bulk import from csv

Tags:

search

sqlite

I'm trying to create an sqlite database by importing a csv file with urls. The file has about 6 million strings. Here are the commands I've used

create table urltable (url text primary key);
.import csvfile urldatabase

After about 3 million urls the speed slows down a lot and my hard disk keeps spinning continuously. I've tried splitting the csv file into 1/4th chunks but I run into the same problem.

I read similar posts on stackoverflow and tried using BEGIN...COMMIT blocks and PRAGMA synchronous=OFF but none of them helped. The only way I was able to create the database was by removing the primary key constraint from url. But then, when I run a select command to find a particular url, it takes 2-3 seconds which won't work for my application. With the primary key set on url, the select is instantaneous. Please advise me what am I doing wrong.

[Edit] Summary of suggestions that helped :

  • Reduce the number of transactions
  • Increase page size & cache size
  • Add the index later
  • Remove redundancy from url

Still, with a primary index, the database size is more than double the original csv file that I was trying to import. Any way to reduce that?

like image 976
user Avatar asked Feb 23 '23 23:02

user


1 Answers

Increase your cache size to something large enough to contain all of the data in memory. The default values for page size and cache size are relatively small and if this is a desktop application then you can easily increase the cache size many times.

PRAGMA page_size = 4096;
PRAGMA cache_size = 72500;

Will give you a cache size of just under 300mb. Remember page size must be set before the database is created. The default page size is 1024 and default cache size is 2000.

Alternatively (or almost equivalently really) you can create the database entirely in an in-memory database and then use the backup API to move it to an on-disk database.

like image 128
Samuel Neff Avatar answered May 13 '23 03:05

Samuel Neff