Faster Insertion of Records into a Table with SQLAlchemy

I am parsing a log file and inserting each record into either MySQL or SQLite using SQLAlchemy and Python. Right now I open a connection to the DB, and as I loop over each line, I insert it after it is parsed (this is just one big table right now; I'm not very experienced with SQL). I then close the connection when the loop is done. The summarized code is:

log_table = schema.Table('log_table', metadata,
                         schema.Column('id', types.Integer, primary_key=True),
                         schema.Column('time', types.DateTime),
                         schema.Column('ip', types.String(length=15))
....
engine = create_engine(...)
metadata.bind = engine
connection = engine.connect()
....
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        pythonified = pythoninfy_log(fields)  # Turn them into ints, datetimes, etc.
        if use_sql:
            ins = log_table.insert(values=pythonified)
            connection.execute(ins)
            parsed += 1

My two questions are:

  • Is there a way to speed up the inserts within this basic framework? Maybe a queue of inserts with some insertion threads, some sort of bulk insert, etc.?
  • When I used MySQL, inserting ~1.2 million records took about 15 minutes. With SQLite, it took a little over an hour. Does that time difference between the two database engines seem about right, or does it mean I am doing something very wrong?
asked May 21 '10 12:05 by Kyle Brandt

2 Answers

The big thing you should try is putting a transaction around multiple inserts, since it is committing the data to disk that really takes a long time. You'll need to decide on the batching level, but a crude first attempt would be to wrap a transaction around the whole lot.

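A minimal sketch of that idea, assuming the schema from the question (trimmed to two columns) and an arbitrary 1000-row batch size; SQLAlchemy's executemany form (passing a list of parameter dicts to `execute`) does the batching, and `engine.begin()` wraps the whole load in one transaction:

```python
# Sketch: collect parsed rows into batches and insert them inside a
# single transaction instead of one INSERT + commit per line.
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String)

metadata = MetaData()
log_table = Table('log_table', metadata,
                  Column('id', Integer, primary_key=True),
                  Column('ip', String(15)))

engine = create_engine('sqlite:///:memory:')
metadata.create_all(engine)

# Stand-in for the parsed log lines from the question.
rows = [{'ip': '10.0.0.%d' % (i % 256)} for i in range(5000)]

with engine.begin() as connection:   # one transaction, one commit at the end
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= 1000:
            connection.execute(log_table.insert(), batch)  # executemany
            batch = []
    if batch:
        connection.execute(log_table.insert(), batch)
```

Committing once per batch (or once for the whole file) avoids paying the cost of a disk flush on every row, which is usually the dominant cost, with SQLite especially.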
answered Nov 14 '22 23:11 by Donal Fellows


Without knowing the table engine (MyISAM? InnoDB?), schema, and indexes, it's hard to comment on the specific difference between the two databases you're using.

However, when using MySQL like this, you will likely find that it is far faster to write your data out to a temporary text file and then use the LOAD DATA INFILE syntax to load it all into your database. It looks like you can call the execute method on your connection object to run the SQL necessary to do this.
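A sketch of that approach: write the parsed rows out as a tab-separated file, then issue a single bulk-load statement. The column list and delimiters here are illustrative and must match your real schema, and the final `execute` call assumes a live MySQL connection:

```python
# Sketch: dump parsed rows to a temp file, then build one
# LOAD DATA statement that MySQL can bulk-load.
import csv
import tempfile

def write_bulk_file(rows):
    """Write parsed rows to a tab-separated temp file."""
    f = tempfile.NamedTemporaryFile('w', suffix='.tsv',
                                    delete=False, newline='')
    writer = csv.writer(f, delimiter='\t')
    for row in rows:
        writer.writerow([row['time'], row['ip']])
    f.close()
    return f.name

def load_data_sql(path):
    """Build the bulk-load statement for the file written above."""
    return ("LOAD DATA LOCAL INFILE '%s' INTO TABLE log_table "
            "FIELDS TERMINATED BY '\\t' (time, ip)" % path)

path = write_bulk_file([{'time': '2010-05-21 12:05:00', 'ip': '10.0.0.1'}])
sql = load_data_sql(path)
# With a live MySQL connection you would then run:
# connection.execute(sql)
```

Note that the server must allow `LOCAL INFILE` (it is often disabled by default), and the file's column order has to line up with the column list in the statement.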

Further, if you are dead set on adding things row by row, and you're recreating the table every time, you can verify key constraints in your program and add those constraints only after all rows have been inserted, saving the DB the time of doing constraint checks on every insert.
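Along the same lines, a sketch of deferring index creation until after the load, assuming SQLAlchemy and a simplified version of the question's schema (`ix_log_ip` is an illustrative name):

```python
# Sketch: define the table without secondary indexes, bulk-load it,
# then build the index in one pass at the end.
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, Index)

metadata = MetaData()
log_table = Table('log_table', metadata,
                  Column('id', Integer, primary_key=True),
                  Column('ip', String(15)))

engine = create_engine('sqlite:///:memory:')
metadata.create_all(engine)   # no secondary index yet

with engine.begin() as connection:
    connection.execute(log_table.insert(),
                       [{'ip': '10.0.0.%d' % (i % 256)} for i in range(1000)])

# Build the index once, after the bulk load, instead of having the DB
# maintain it on every insert.
Index('ix_log_ip', log_table.c.ip).create(engine)
```

The saving grows with the number of indexes and rows, since each insert otherwise has to update every index as it goes.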

answered Nov 14 '22 23:11 by Conspicuous Compiler