
Python is slow when iterating over a large list

I am currently selecting a large list of rows from a database using pyodbc. The result is then copied to a large list, and then I am trying to iterate over the list. Before I abandon Python and try to create this in C#, I wanted to know if there was something I was doing wrong.

clientItemsCursor.execute("Select ids from largetable where year =?", year)
allIDRows = clientItemsCursor.fetchall()  # takes maybe 8 seconds

count = 0
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    # Do something with aID -- removed because I was trying to determine what was slow
    count = count + 1

Some more information:

  • The for loop is currently running at about 5 loops per second, and that seems insanely slow to me.
  • The total rows selected is ~489,000.
  • The machine it's running on has lots of RAM and CPU. It seems to use only one or two cores, and RAM usage is 1.72 GB of 4 GB.

Can anyone tell me what's wrong? Do scripts just run this slow?

Thanks

nycynik asked Feb 22 '12 19:02


3 Answers

It's probably slow because you load the entire result set into memory first and then iterate over a list. Try iterating over the cursor instead.

And no, scripts shouldn't be that slow.

clientItemsCursor.execute("Select ids from largetable where year =?", year)
count = 0
for clientItemRow in clientItemsCursor:
    aID = str(clientItemRow[0])
    count = count + 1
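
To compare the two patterns without a database server, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for pyodbc. That substitution is an assumption, as are the helper names make_db, count_via_fetchall and count_via_cursor and the generated table contents. Both are DB-API cursors that support iteration, so the pattern should carry over, but the timing behaviour of a real ODBC driver may differ.

```python
import sqlite3


def make_db(n_rows):
    # In-memory stand-in for the real database (hypothetical data);
    # table and column names mirror the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
    conn.executemany(
        "INSERT INTO largetable (ids, year) VALUES (?, ?)",
        ((i, 2012) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def count_via_fetchall(conn, year):
    # Pattern from the question: materialize every row, then loop over the list.
    cursor = conn.cursor()
    cursor.execute("SELECT ids FROM largetable WHERE year = ?", (year,))
    rows = cursor.fetchall()
    count = 0
    for row in rows:
        a_id = str(row[0])
        count += 1
    return count


def count_via_cursor(conn, year):
    # Pattern from this answer: loop over the cursor and let it hand out rows.
    cursor = conn.cursor()
    cursor.execute("SELECT ids FROM largetable WHERE year = ?", (year,))
    count = 0
    for row in cursor:
        a_id = str(row[0])
        count += 1
    return count


conn = make_db(1000)
print(count_via_fetchall(conn, 2012), count_via_cursor(conn, 2012))
```

Both functions visit the same rows; the only difference is whether the full result list exists in memory before the loop starts.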
Pablo Santa Cruz answered Oct 06 '22 20:10


This should not be slow with native Python lists, but maybe the ODBC driver is returning a "lazy" object that tries to be smart and just ends up slow. Try just doing

allIDRows = list(clientItemsCursor.fetchall())

in your code and post further benchmarks.

(Python lists can get slow if you start inserting items in the middle, but just iterating over a large list should be fast.)
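
One rough way to produce such a benchmark is to time the pure-Python part on its own. The sketch below builds a plain list of one-element tuples about the size of the result set in the question and times the loop with time.perf_counter. The 489,000 figure comes from the question; the helper name time_iteration and the tuple rows standing in for pyodbc rows are assumptions. If this runs orders of magnitude faster than the real loop, the bottleneck is more likely in the row objects or in the removed "do something" code than in list iteration itself.

```python
import time


def time_iteration(rows):
    """Return (count, elapsed_seconds) for a plain loop over rows."""
    count = 0
    start = time.perf_counter()
    for row in rows:
        a_id = str(row[0])  # same per-row work as in the question
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


# Hypothetical stand-in for the fetched rows: plain one-element tuples.
rows = [(i,) for i in range(489_000)]
count, elapsed = time_iteration(rows)
print(f"{count} rows in {elapsed:.3f} s")
```

Running the same function on the real allIDRows list would give a directly comparable number.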

jsbueno answered Oct 06 '22 18:10


More investigation is needed here... consider the following script:

bigList = range(500000)
doSomething = ""
count = 0
arrayList = [[x] for x in bigList]  # takes a few seconds
for x in arrayList:
    doSomething += str(x[0])
    count += 1

This is pretty much the same as your script, minus the database stuff, and takes a few seconds to run on my not-terribly-fast machine.

jkerian answered Oct 06 '22 19:10