I have a very large table with 250,000+ rows, many of which contain a large text block in one of the columns. Right now it's 2.7 GB and expected to grow at least tenfold. I need to perform Python-specific operations on every row of the table, but only need to access one row at a time.
Right now my code looks something like this:
c.execute('SELECT * FROM big_table')
table = c.fetchall()
for row in table:
    do_stuff_with_row(row)
This worked fine when the table was smaller, but the table is now larger than my available RAM and Python hangs when I try to run it. Is there a better (more RAM-efficient) way to iterate row by row over the entire table?
For background, the standard way of querying data with the sqlite3 module: first, establish a connection to the SQLite database by creating a Connection object. Next, create a Cursor object using the cursor() method of the Connection object. Then execute a SELECT statement. After that, call the fetchall() method of the cursor object to fetch the data.
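A minimal sketch of that pattern (the example.db path and big_table name are placeholders, not from the question):

import sqlite3

# Connect and create a cursor (the file name here is a placeholder)
conn = sqlite3.connect('example.db')
c = conn.cursor()

# fetchall() materializes every result row in memory at once
c.execute('SELECT * FROM big_table')
rows = c.fetchall()
for row in rows:
    print(row)

conn.close()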
The theoretical maximum number of rows in a table is 2^64 (18446744073709551616, or about 1.8e+19). This limit is unreachable since the maximum database size of 281 terabytes will be reached first.
cursor.fetchall()
fetches all results into a list first.
Instead, you can iterate over the cursor itself:
c.execute('SELECT * FROM big_table')
for row in c:
    do_stuff_with_row(row)
The cursor fetches rows lazily as you iterate, rather than loading them all into memory first.
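As a self-contained sketch (again, example.db, big_table, and do_stuff_with_row are placeholder names standing in for the question's actual ones):

import sqlite3

def do_stuff_with_row(row):
    # Placeholder for the real per-row processing
    print(row[0])

conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('SELECT * FROM big_table')

# Iterating over the cursor pulls rows on demand, so memory use
# stays roughly constant regardless of how large the table is.
for row in c:
    do_stuff_with_row(row)

conn.close()

If you want explicit control over how many rows are buffered at a time, cursor.fetchmany(size) is another option: it returns successive chunks of rows and an empty list once the results are exhausted.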