I am parsing a log and inserting it into either MySQL or SQLite using SQLAlchemy and Python. Right now I open a connection to the DB, and as I loop over each line, I insert it into the DB after it is parsed (it's just one big table right now; I'm not very experienced with SQL). I then close the connection when the loop is done. The summarized code is:
from sqlalchemy import create_engine, schema, types

log_table = schema.Table('log_table', metadata,
    schema.Column('id', types.Integer, primary_key=True),
    schema.Column('time', types.DateTime),
    schema.Column('ip', types.String(length=15)),
    ....
)

engine = create_engine(...)
metadata.bind = engine
connection = engine.connect()
....

for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        pythonified = pythoninfy_log(fields)  # Turn them into ints, datetimes, etc.
        if use_sql:
            ins = log_table.insert(values=pythonified)
            connection.execute(ins)
            parsed += 1
My two questions are:
The big thing you should try is putting a transaction around multiple inserts, since it is the commit to disk that really takes the time. You'll need to decide on the batching level, but a crude first attempt would be to wrap a transaction around the whole lot.
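For example, here is a minimal sketch of batched commits using the connection, table, and parsing helpers from the question; the batch size is an arbitrary starting point you would tune for your data:

BATCH_SIZE = 10000  # arbitrary starting point; tune for your data and hardware
parsed = 0

trans = connection.begin()              # open a transaction for the first batch
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        pythonified = pythoninfy_log(m.groupdict())
        connection.execute(log_table.insert(values=pythonified))
        parsed += 1
        if parsed % BATCH_SIZE == 0:
            trans.commit()              # one disk commit per batch instead of per row
            trans = connection.begin()  # start the next batch
trans.commit()                          # commit whatever is left in the final batch

Committing once per batch rather than once per row is usually where most of the speedup comes from, particularly with SQLite.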
Without knowing the table engine (MyISAM? InnoDB?), schema, and indexes, it's hard to comment on specific differences between the two databases you're using.
However, when using MySQL like this, you will likely find that it is far faster to write your data out to a temporary text file and then use the LOAD DATA INFILE syntax to load it all into your database. It looks like you can call the execute method on your connection object to run the SQL necessary to do this.
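A rough sketch of that approach, assuming MySQL with LOCAL INFILE enabled on the client and server, and a tab-separated temporary file; the two columns written here are just the ones from the summarized schema and should be extended to match your real table:

import tempfile

# First pass: write the parsed rows to a temporary tab-separated file.
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False) as tmp:
    for line in file_to_parse:
        m = line_regex.match(line)
        if m:
            p = pythoninfy_log(m.groupdict())
            # str() of a datetime gives 'YYYY-MM-DD HH:MM:SS', which MySQL accepts
            tmp.write('%s\t%s\n' % (p['time'], p['ip']))
    tmp_path = tmp.name

# Second step: load the whole file in one statement; `id` auto-increments.
connection.execute(
    "LOAD DATA LOCAL INFILE '%s' INTO TABLE log_table "
    "FIELDS TERMINATED BY '\\t' (time, ip)" % tmp_path
)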
Further, if you are dead set on adding things row by row, and you're recreating the table every time, you can verify key constraints in your program and add those constraints only after all rows have been inserted, saving the DB the time of doing constraint checks on every insert.
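A hypothetical sketch of that idea, treating (ip, time) as the key to check: duplicates are filtered in Python, the surviving rows are bulk-inserted, and the unique index is only added at the end (MySQL syntax; substitute whatever key actually applies to your data):

seen = set()
rows = []
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        p = pythoninfy_log(m.groupdict())
        key = (p['ip'], p['time'])
        if key not in seen:        # uniqueness checked in Python, not by the DB
            seen.add(key)
            rows.append(p)

# Passing a list of dicts makes SQLAlchemy issue an executemany-style bulk insert.
connection.execute(log_table.insert(), rows)

# Add the constraint once, after all rows are already in place.
connection.execute("ALTER TABLE log_table ADD UNIQUE INDEX uq_ip_time (ip, time)")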