Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement database-style table in Python

I am implementing a class that resembles a typical database table:

  • has named columns and unnamed rows
  • has a primary key by which I can refer to the rows
  • supports retrieval and assignment by primary key and column title
  • can be asked to add unique or non-unique index for any of the columns, allowing fast retrieval of a row (or set of rows) which have a given value in that column
  • removal of a row is fast and is implemented as "soft-delete": the row is kept physically, but is marked for deletion and won't show up in any subsequent retrieval operations
  • addition of a column is fast
  • rows are rarely added
  • columns are rarely deleted

I decided to implement the class directly rather than use a wrapper around sqlite.

What would be a good data structure to use?


Just as an example, one approach I was thinking about is a dictionary. Its keys are the values in the primary key column of the table; its values are the rows implemented in one of these ways:

  1. As lists. Column numbers are mapped into column titles (using a list for one direction and a map for the other). Here, a retrieval operation would first convert column title into column number, and then find the corresponding element in the list.

  2. As dictionaries. Column titles are the keys of this dictionary.

Not sure about the pros/cons of the two.


The reasons I want to write my own code are:

  • I need to track row deletions. That is, at any time I want to be able to report which rows where deleted and for what "reason" (the "reason" is passed to my delete method).
  • I need some reporting during indexing (e.g., while an non-unique index is being built, I want to check certain conditions and report if they are violated)
like image 358
max Avatar asked Nov 15 '10 19:11

max


1 Answers

You might want to consider creating a class which uses an in-memory sqlite table under the hood:

import sqlite3

class MyTable(object):
    def __init__(self):
        self.conn=sqlite3.connect(':memory:')
        self.cursor=self.conn.cursor()
        sql='''\
            CREATE TABLE foo ...
        '''
        self.execute(sql)
    def execute(self,sql,args):
        self.cursor.execute(sql,args)
    def delete(self,id,reason):
        sql='UPDATE table SET softdelete = 1, reason = %s where tableid = %s'
        self.cursor.execute(sql,(reason,id,))
    def verify(self):
        # Check that certain conditions are true
        # Report (or raise exception?) if violated
    def build_index(self):
        self.verify()
        ... 

Soft-delete can be implemented by having a softdelete column (of bool type). Similarly, you can have a column to store reason for deletion. Undeleting would simply involve updating the row and changing the softdelete value. Selecting rows that have not been deleted could be achieved with the SQL condition WHERE softdelete != 1.

You could write a verify method to verify conditions on your data are satisfied. And you could call that method from within your build_index method.

Another alternative is to use a numpy structured masked array.

It's hard to say what would be fastest. Perhaps the only sure way to tell would be to write code for each and benchmark on real-world data with timeit.

like image 187
unutbu Avatar answered Sep 29 '22 11:09

unutbu