Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does sqlite3 compress data?

Tags:

python

sqlite

I've got an 7.4Gb csv file. After converting it to a sqlite database with a python script the output DB is 4.7Gb, around 60% of the original size.

The csv has around 150,000,000 rows. It has header:

tkey,ipaddr,healthtime,numconnections,policystatus,activityflag 

And each row looks something like

261846,172.10.28.15,2012-02-03 16:15:00,22,1,1 

The script uses healthtime to split the data into tables 192 tables

When I first saw these numbers, I assumed I had made an error someplace. How big of a reduction in file size should I expect from the added efficiency of only writing the healthtime 192 times instead of 150,000,000 times?

EDIT: As soon as I posted this I realized the answer. I'm removing about 40% of the string, hence the 40% reduction in size.

Edit 2 Let's calculate the difference in size between the plain text:

"261846,172.10.28.15,2012-02-03 16:15:00,22,1,1"  

And the data base entry:

db(261846,'172.10.28.15',22,1,1) 

First of all, we drop from 46 to 26 characters in plain text representation.

The remaining characters are:

"261846,172.10.28.15,22,1,1" 

or 26 bytes. If each integer needs to be stored in 32 bit (4 bytes), then we have:

12 bytes (ipaddr) + 4 bytes * 4 (integer fields) = 28 bytes.

So it looks like converting to integers makes the storage slightly less efficient, and all of my gains come form reducing the number of characters stored in each row.

like image 431
Maus Avatar asked May 30 '12 21:05

Maus


People also ask

Are SQLite files compressed?

The SQLite Compressed and Encrypted Read-Only Database (CEROD) Extension. The SQLite Compressed and Encrypted Read-Only Database (CEROD) Extension is an add-on to the public domain version of SQLite that allows an application to read compressed and encrypted database files in addition to ordinary SQLite database files.

Is SQLite good for large data?

Practically sqlite is likely to work as long as there is storage available. It works well with dataset larger than memory, it was originally created when memory was thin and it was a very important point from the start. There is absolutely no issue with storing 100 GB of data.

What are the disadvantages of SQLite?

SQLite Disadvantages SQLite is used to handle low to medium traffic HTTP requests. Database size is restricted to 2GB in most cases.

How much data is too much for SQLite?

SQLite database files have a maximum size of about 140 TB. On a phone, the size of the storage (a few GB) will limit your database file size, while the memory size will limit how much data you can retrieve from a query. Furthermore, Android cursors have a limit of 1 MB for the results.


1 Answers

SQLite is not running a compression algorithm, but it will store data in a binary file instead of a text file. Which means that the data can be stored more efficiently, for example using a 32-bit (4 byte) number to represent 10,000,000 instead of storing it as 8 bytes of text (or more if the file is unicode).

Here are more details on the SQL Database File Format if you are interested.

Does that make sense?

like image 122
Justin Ethier Avatar answered Oct 06 '22 14:10

Justin Ethier