
Python pandas read_sql returns generator object

Tags: python, sql, pandas

I am pulling data from an Oracle database using pyodbc and pandas read_sql.

I see no errors when I enter this line

df = pd.read_sql(sql_str, cnxn, chunksize=10)

But when I try to see

df

I get this error

<generator object _query_iterator at 0x092D40F8>

My search as to what this error means or what could be causing it yielded no satisfactory answers.

The reason for using chunksize is that I have an Oracle table with 60 million rows, and I plan to download it in chunks and then put them together, as described in the question "How to create a large pandas dataframe from an sql query without running out of memory?"

DavidH asked Jan 09 '23 01:01



2 Answers

As the documentation for chunksize explains, when it is specified, read_sql returns an iterator instead of a DataFrame, and chunksize is the number of rows to include in each chunk.
So you can iterate over the result and do something with each chunk:

for chunk in pd.read_sql_query(sql_str, engine, chunksize=10):
    do_something_with(chunk)

Typically you process each chunk and append it to a list, and then, after the for loop, concatenate all the processed chunks in that list with pd.concat.
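The loop-then-concat pattern described above could look roughly like the sketch below. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the Oracle connection so the code runs anywhere, and the table name demo, its columns, and the even-id filter are all hypothetical.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the Oracle
# connection (cnxn) so that this sketch is self-contained.
cnxn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"id": list(range(25)), "value": [i * 2 for i in range(25)]}
).to_sql("demo", cnxn, index=False)  # "demo" is a hypothetical table

sql_str = "SELECT id, value FROM demo ORDER BY id"

processed = []
for chunk in pd.read_sql_query(sql_str, cnxn, chunksize=10):
    # Each chunk is an ordinary DataFrame with at most 10 rows.
    # Hypothetical per-chunk step: keep only rows with an even id.
    processed.append(chunk[chunk["id"] % 2 == 0])

# Combine the processed chunks into a single DataFrame.
df = pd.concat(processed, ignore_index=True)
cnxn.close()
```

Reducing each chunk before appending it (filtering, aggregating, or downcasting dtypes) is what keeps memory use bounded. Appending the raw chunks unchanged would eventually hold the whole table in memory anyway.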

Also see the docs on sql querying: http://pandas.pydata.org/pandas-docs/stable/io.html#querying for an example.

joris answered Jan 10 '23 16:01



I would prefer to comment but cannot yet. Regardless, that is not an error: the output is telling you that df is a generator object.
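A small sketch of what that means in practice, again assuming an in-memory SQLite database and a hypothetical demo table in place of the Oracle connection: the object returned with chunksize is an iterator, and each item pulled from it is a DataFrame.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite stands in for the Oracle connection; "demo" is hypothetical.
cnxn = sqlite3.connect(":memory:")
pd.DataFrame({"id": list(range(25))}).to_sql("demo", cnxn, index=False)

# With chunksize set, the result is an iterator, not a DataFrame.
chunks = pd.read_sql("SELECT id FROM demo ORDER BY id", cnxn, chunksize=10)
result_is_dataframe = isinstance(chunks, pd.DataFrame)

# Pulling one item from the iterator yields the first chunk as a DataFrame.
first = next(chunks)

# The rest of the iterator holds the remaining chunks.
remaining = list(chunks)
cnxn.close()
```

Note that the iterator can only be consumed once. After the loop (or the list call) finishes, it is exhausted and the query has to be run again to get the rows back.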

gffbss answered Jan 10 '23 15:01
