
SQLAlchemy: Scan huge tables using ORM?

I am currently playing around with SQLAlchemy a bit, which is really quite neat.

For testing I created a huge table containing my pictures archive, indexed by SHA1 hashes (to remove duplicates :-)). That part was impressively fast...
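For reference, the setup looked roughly like this (a minimal sketch, not my exact code; the Picture model, its columns, and the pictures.db path are assumptions):

# Minimal sketch of the setup (SQLAlchemy 1.x declarative style; names assumed)
from sqlalchemy import create_engine, Column, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Picture(Base):
    __tablename__ = 'pictures'
    sha1 = Column(String(40), primary_key=True)  # hex SHA1 digest of the file
    path = Column(String, nullable=False)        # where the file lives

    def __repr__(self):
        return '<Picture %s %s>' % (self.sha1, self.path)

engine = create_engine('sqlite:///pictures.db')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)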

For fun I did the equivalent of a select * over the resulting SQLite database:

session = Session()
for p in session.query(Picture):
    print(p)

I expected to see hashes scrolling by, but instead it just kept scanning the disk. At the same time, memory usage was skyrocketing, reaching 1GB after a few seconds. This seems to come from the identity map feature of SQLAlchemy, which I thought was only keeping weak references.

Can somebody explain this to me? I thought that each Picture p would be garbage-collected after its hash was written out!?

Asked Jul 17 '09 by Bluehorn

People also ask

Is SQLAlchemy ORM slow?

SQLAlchemy is very, very fast. It's just that users tend to be unaware of just how much functionality is being delivered, and confuse an ORM result set with that of a raw database cursor.

Should I use SQLAlchemy core or ORM?

If you want to view your data in a more schema-centric view (as used in SQL), use Core. If you have data for which business objects are not needed, use Core. If you view your data as business objects, use ORM. If you are building a quick prototype, use ORM.
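To make the distinction concrete, here is a rough sketch of the same read in both styles (SQLAlchemy 1.4+ API; the pictures table and Picture model are assumed from the question above):

# Sketch: the same lookup via Core vs. the ORM (SQLAlchemy 1.4+; names assumed)
from sqlalchemy import create_engine, MetaData, Table, select

engine = create_engine('sqlite:///pictures.db')
metadata = MetaData()

# Core: schema-centric -- reflect the table and work with plain rows
pictures = Table('pictures', metadata, autoload_with=engine)
with engine.connect() as conn:
    for row in conn.execute(select(pictures.c.sha1)):
        print(row.sha1)

# ORM: object-centric -- work with mapped Picture instances instead
# session = Session()
# for p in session.query(Picture):
#     print(p.sha1)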

Is SQLAlchemy better than Django ORM?

Agree with all previous answers: yes, SQLAlchemy is really more powerful than the Django ORM, gives you real control over your SQL, and is very explicit. But 90% of our work is just simple create/read/update/delete or finding an object by id.


1 Answer

Okay, I just found a way to do this myself. Changing the code to

session = Session()
for p in session.query(Picture).yield_per(5):
    print(p)

loads only 5 pictures at a time. It seems the query loads all rows at once by default. However, I don't yet understand the disclaimer on that method. Quoting the SQLAlchemy docs:

WARNING: use this method with caution; if the same instance is present in more than one batch of rows, end-user changes to attributes will be overwritten. In particular, it’s usually impossible to use this setting with eagerly loaded collections (i.e. any lazy=False) since those collections will be cleared for a new load when encountered in a subsequent result batch.

So if using yield_per is actually the right way (tm) to scan over copious amounts of SQL data while using the ORM, when is it safe to use it?
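My working interpretation, as a sketch (the batch size and the read-only discipline are my reading of the warning, not an official rule): yield_per looks safe when the query involves no eagerly loaded collections and each row is treated as read-only.

# Sketch of what appears to be the safe pattern: a read-only scan over a
# query with no eager-loaded collections (no lazy=False relationships).
session = Session()
for p in session.query(Picture).yield_per(1000):  # batch size is a tuning knob
    # Only read attributes here; don't mutate p, and don't keep references
    # to instances from earlier batches.
    print(p.sha1)
session.close()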

Answered Sep 30 '22 by Bluehorn