Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selecting between shelve and sqlite for really large dictionary (Python)

I have a large Python dictionary of vectors (150k vectors, 10k dimensions each) of float numbers that can't be loaded into memory, so I have to use one of the two methods for storing this on disk and retrieving specific vectors when appropriate. The vectors will be created and stored once, but might be read many (thousands of) times -- so it is really important to have efficient reading. After some tests with shelve module, I tend to believe that sqlite will be a better option for this kind of task, but before I start writing code I would like to hear some more opinions on this... For example, are there any other options except of those two that I'm not aware of?

Now, assuming we agree that the best option is sqlite, another question relates to the exact form of the table. I'm thinking of using a fine-grained structure with rows of the form vector_key, element_no, value to help efficient pagination, instead of storing all 10k elements of a vector into the same record. I would really appreciate any suggestions on this issue.

like image 834
dkar Avatar asked Jun 05 '12 11:06

dkar


People also ask

How to use SELECT statement in Python SQLite?

In this article, we will discuss, select statement of the Python SQLite module. This statement is used to retrieve data from an SQLite table and this returns the data contained in the table. In SQLite the syntax of Select Statement is: SELECT * FROM table_name;

How to fetch many records from a table in SQLite?

where, cursor is an object of sqlite3 connection with database. Now we will use the Select statement to retrieve data from the table and fetch many records not all. To fetch many records we will use fetchmany () method. where, cursor is an object of sqlite3 connection with database.

What is a “shelf” in Python?

A “shelf” is a persistent, dictionary-like object. The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects.

How to import a shelve object in SQL?

import shelve myShelve = shelve.open ('my.shelve') myShelve.update (bigd) myShelve.close () You basically treat the shelve object like a dict, but the items are stored on disk (as individual pickles) and read in as needed. If your objects could be stored as a list of properties, then sqlite may be a good alternative.


1 Answers

You want sqlite3, then if you use an ORM like sqlalchemy then you can easily grow to expand and use other back end databases.

Shelve is more of a "toy" than actually useful in production code.

The other point you are talking about is called normalization and I have personally never been very good at it this should explain it for you.

Just as an extra note this shows performance failures in shelve vs sqlite3

like image 164
Jakob Bowyer Avatar answered Oct 15 '22 09:10

Jakob Bowyer