Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python3 - Is there a way to iterate row by row over a very large SQlite table without loading the entire table into local memory?

Tags:

python

sqlite

I have a very large table with 250,000+ rows, many containing a large text block in one of the columns. Right now it's 2.7GB and expected to grow at least tenfold. I need to perform python specific operations on every row of the table, but only need to access one row at a time.

Right now my code looks something like this:

c.execute('SELECT * FROM big_table')  table = c.fetchall() for row in table:     do_stuff_with_row 

This worked fine when the table was smaller, but the table is now larger than my available ram and python hangs when I try and run it. Is there a better (more ram efficient) way to iterate row by row over the entire table?

like image 451
Marlon Dyck Avatar asked Apr 11 '15 20:04

Marlon Dyck


People also ask

How fetch data from sqlite3 database in Python?

SQLite Python: Querying Data First, establish a connection to the SQLite database by creating a Connection object. Next, create a Cursor object using the cursor method of the Connection object. Then, execute a SELECT statement. After that, call the fetchall() method of the cursor object to fetch the data.

How many rows of data can SQLite handle?

The theoretical maximum number of rows in a table is 264 (18446744073709551616 or about 1.8e+19). This limit is unreachable since the maximum database size of 281 terabytes will be reached first.

What is the difference between SQLite and SQLAlchemy?

Sqlite is a database storage engine, which can be better compared with things such as MySQL, PostgreSQL, Oracle, MSSQL, etc. It is used to store and retrieve structured data from files. SQLAlchemy is a Python library that provides an object relational mapper (ORM).


1 Answers

cursor.fetchall() fetches all results into a list first.

Instead, you can iterate over the cursor itself:

c.execute('SELECT * FROM big_table')  for row in c:     # do_stuff_with_row 

This produces rows as needed, rather than load them all first.

like image 97
Martijn Pieters Avatar answered Oct 01 '22 09:10

Martijn Pieters