Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is there a large insert performance difference between python SqlAlchemy Boolean and Integer Type

Storing the same value in a sqlite database as a Boolean or Integer using Python and Sqlalchemy produces the following results.

Value stored as Boolean:
SqlAlchemy ORM: Total time for 40000 records 62.5009999275 secs
SqlAlchemy Core: Total time for 40000 records 56.0600001812 secs
Value stored as Integer:
SqlAlchemy ORM: Total time for 40000 records 5.72099995613 secs
SqlAlchemy Core: Total time for 40000 records 0.770999908447 secs

Why is there such a performance issue when using the Boolean type?

I know SQLite doesn't have the concept of boolean type and instead stores them as an integer 1 (True) or 0 (False). I would have presumed that SqlAlchemy would have just mapped the python bool to Sqlite integer.

The script used to generate the output above (modified from this question):

import time
import sqlite3

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String,  create_engine, Boolean
from sqlalchemy.orm import scoped_session, sessionmaker

Base = declarative_base()
DBSession = scoped_session(sessionmaker())

class CustomerInteger(Base):
    __tablename__ = "customerInteger"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    value = Column(Integer)

class CustomerBoolean(Base):
    __tablename__ = "customerBoolean"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    value = Column(Boolean)

def init_sqlalchemy(dbname = 'sqlite:///sqlalchemy.db'):
    global engine
    engine = create_engine(dbname, echo=False)
    DBSession.remove()
    DBSession.configure(bind=engine, autoflush=False, expire_on_commit=False)
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)

def test_sqlalchemy_orm(n, table):
    init_sqlalchemy()
    t0 = time.time()
    for i in range(n):
        customer = table()
        customer.name = 'NAME ' + str(i)
        customer.value = True
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print "SqlAlchemy ORM: Total time for " + str(n) + " records " + str(time.time() - t0) + " secs"


def test_sqlalchemy_core(n, table):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        table.__table__.insert(),
        [{"name":'NAME ' + str(i), "value":True } for i in range(n)]
    )
    print "SqlAlchemy Core: Total time for " + str(n) + " records " + str(time.time() - t0) + " secs"


if __name__ == '__main__':

    print "Value stored as Boolean:"
    test_sqlalchemy_orm(40000, CustomerBoolean)
    test_sqlalchemy_core(40000, CustomerBoolean)

    print "Value stored as Integer:"
    test_sqlalchemy_orm(40000, CustomerInteger)
    test_sqlalchemy_core(40000, CustomerInteger)
like image 424
James Bingo Avatar asked Jun 28 '13 10:06

James Bingo


1 Answers

I've made a test on three configurations. While there is difference in run time between Boolean and Integer, but it is not 10 times. May be you want to try switching to another python version.

PS. I was running my tests on Core i5 M430 CPU machine with Windows 8.

Also I would suggest to run profiler to see where sqlalchemy is spending that much time while running on your system.

1)

python: 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)]
sqlalchemy: 0.7.8
Value stored as Boolean:
SqlAlchemy ORM: Total time for 40000 records 8.84400010109 secs
SqlAlchemy Core: Total time for 40000 records 0.725000143051 secs
Value stored as Integer:
SqlAlchemy ORM: Total time for 40000 records 8.0680000782 secs
SqlAlchemy Core: Total time for 40000 records 0.443000078201 secs

2)

python: 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]
sqlalchemy: 0.8.1
Value stored as Boolean:
SqlAlchemy ORM: Total time for 40000 records 9.69299983978 secs
SqlAlchemy Core: Total time for 40000 records 0.572000026703 secs
Value stored as Integer:
SqlAlchemy ORM: Total time for 40000 records 9.35899996758 secs
SqlAlchemy Core: Total time for 40000 records 0.40700006485 secs

3)

python: 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
sqlalchemy: 0.8.1
Value stored as Boolean:
SqlAlchemy ORM: Total time for 40000 records 8.531000137329102 secs
SqlAlchemy Core: Total time for 40000 records 0.7139999866485596 secs
Value stored as Integer:
SqlAlchemy ORM: Total time for 40000 records 8.023000001907349 secs
SqlAlchemy Core: Total time for 40000 records 0.44099998474121094 secs
like image 89
vvladymyrov Avatar answered Oct 07 '22 00:10

vvladymyrov