I am trying to populate an MS SQL Server 2005 database using Python on Windows. I am inserting millions of rows, and by 7 million rows I am using almost a gigabyte of memory. The test below eats up about 4 MB of RAM for every 100k rows inserted:
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
cursor = connection.cursor()
connection.autocommit = True
while 1:
    # memory usage grows steadily as this loop runs
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)
connection.close()
Hack solution: I ended up spawning a new process using the multiprocessing module, so that the memory is returned to the OS when the process exits. I am still confused about why inserting rows this way consumes so much memory. Any ideas?
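For reference, a minimal sketch of that workaround (the connection string, table, batch size, and the insert_batch helper are placeholders, not my exact code). The child process does one chunk of the inserts, and whatever memory pyodbc is holding onto is released when it exits:

import pyodbc
from multiprocessing import Process

def insert_batch(rows):
    # Runs in a child process; any memory pinned by the leak is
    # returned to the OS when this process terminates.
    connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
    connection.autocommit = True
    cursor = connection.cursor()
    for row in rows:
        cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", row)
    connection.close()

if __name__ == '__main__':
    rows = [(1, 2, 3, 4, 5, 6)] * 100000  # one batch per child process
    worker = Process(target=insert_batch, args=(rows,))
    worker.start()
    worker.join()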
I had the same issue, and it looks like a pyodbc issue with parameterized inserts: http://code.google.com/p/pyodbc/issues/detail?id=145
As a temporary workaround, switching to a static insert with the VALUES clause populated inline eliminates the leak, at least until I can try a build from the current source.
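Something along these lines (a sketch only; inlining values bypasses parameter binding, but it also makes the statement vulnerable to SQL injection, so it is only reasonable for trusted values like this test data):

import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
connection.autocommit = True
cursor = connection.cursor()
while 1:
    # Format the values into the statement itself so pyodbc's
    # parameter-binding code path (where the leak appears to be) is never used.
    # WARNING: never do this with untrusted input.
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (%d,%d,%d,%d,%d,%d)" % (1, 2, 3, 4, 5, 6))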
I faced the same problem.
I had to read more than 50 XML files, each about 300 MB, and load them into SQL Server 2005.
I tried the following:
Reusing the same cursor and then dereferencing it.
Closing and reopening the connection.
Setting the connection to None.
Finally, I ended up bootstrapping each XML file load in its own process using the multiprocessing Process module (much like the sketch in the question above).
I have since replaced that setup with IronPython, using System.Data.SqlClient.
This gives better performance and also a nicer interface.
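For anyone curious, a minimal sketch of the ADO.NET route from IronPython (the connection string, table, and column names are placeholders, not my exact code):

import clr
clr.AddReference('System.Data')
from System.Data.SqlClient import SqlConnection, SqlCommand

# Placeholders: substitute your own server, database, and credentials.
connection = SqlConnection('Server=x;Database=x;User Id=x;Password=x;')
connection.Open()
command = SqlCommand('insert into x (a,b,c,d,e,f) VALUES (@a,@b,@c,@d,@e,@f)', connection)
for name, value in zip(('@a', '@b', '@c', '@d', '@e', '@f'), (1, 2, 3, 4, 5, 6)):
    command.Parameters.AddWithValue(name, value)
command.ExecuteNonQuery()
connection.Close()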