edit: many thanks for all the answers. Here are the results after applying the optimisations so far:
The really nice part is this hasn't required any changes in the iphone code
I have an iphone application with a large dictionary held in sqlite format (read only). I'm looking for ideas to reduce the size of the DB file, which is currently very large.
Here is the number of entries and resulting size of the sqlite DB:
franks-macbook:DictionaryMaker frank$ ls -lh dictionary.db
-rw-r--r-- 1 frank staff 59M 8 Oct 23:08 dictionary.db
franks-macbook:DictionaryMaker frank$ wc -l dictionary.txt
453154 dictionary.txt
...an average of about 135 bytes per entry.
Here is my DB schema:
create table words (word text primary key, sowpods boolean, twl boolean, signature text)
create index sowpods_idx on words(sowpods)
create index twl_idx on words(twl)
create index signature_idx on words(signature)
Here is some sample data:
photoengrave|1|1|10002011000001210101010000
photoengraved|1|1|10012011000001210101010000
photoengraver|1|1|10002011000001210201010000
photoengravers|1|1|10002011000001210211010000
photoengraves|1|1|10002011000001210111010000
photoengraving|1|1|10001021100002210101010000
The last field represents the letter frequencies for anagram retrieval (each position is in the range 0..9). The two booleans represent sub dictionaries.
I need to do queries such as:
select signature from words where word = 'foo'
select word from words where signature = '10001021100002210101010000' order by word asc
select word from words where word like 'foo' order by word asc
select word from words where word = 'foo' and (sowpods='1' or twl='1')
One idea I have is to encode the letter frequencies more efficiently, e.g. binary encode them as a blob (perhaps with RLE as there are many zeros?). Any ideas for how best to achieve this, or other ideas to reduce the size? I am building the DB in ruby, and reading it on the phone in objective C.
Also is there any way to get stats on the DB so I can see what is using the most space?
VACUUM will reconstruct the database from scratch and keep the database minimal in size. Thus, it is a recommended good practice to execute VACUUM periodically to reduce the SQLite database file size.
Apple uses SQLite in many (most?) of the native applications running on Mac OS-X desktops and servers and on iOS devices such as iPhones and iPods. SQLite is also used in iTunes, even on non-Apple hardware.
The wiki tag description for both tags is the same: SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. The sqlite3 has no synonyms but sqlite has sqlitedatabase as a solitary synonym.
SQLite database files have a maximum size of about 140 TB. On a phone, the size of the storage (a few GB) will limit your database file size, while the memory size will limit how much data you can retrieve from a query. Furthermore, Android cursors have a limit of 1 MB for the results.
Have you tried typing the "vacuum" command to make sure you don't have extra space in the db you forgot to reclame?
Remove the indexes on sowpods and twl -- they are probably not helping your query times and are definitely taking lots of space.
You can get stats on the database using sqlite3_analyzer from the SQLite downloads page.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With