I have the following code that runs over a large data set (~2M rows). It eats up all 4 GB of my memory before finishing.
for sample in session.query(CodeSample).yield_per(100):
    for proj in projects:
        if sample.filename.startswith(proj.abs_source):
            sample.filename = "some other path"
            session.add(sample)
Then I ran it on a reduced data set and analyzed the heap with heapy. get_rp() gave me the following hint:
0: _ --- [-] 47821 (0x9163aec | 0x9165fec | 0x916d6cc | 0x9251414 | 0x925704...
1: a [-] 8244 tuple: 0x903ec8c*37, 0x903fcfc*13, 0x9052ecc*46...
2: aa ---- [S] 3446 types.CodeType: parseresult.py:73:src_path...
3: ab [S] 364 type: __builtin__.Struct, _random.Random, sqlite3.Cache...
4: ac ---- [-] 90 sqlalchemy.sql.visitors.VisitableType: 0x9162f2c...
5: aca [S] 11 dict of module: ..sql..., codemodel, sqlalchemy
6: acb ---- [-] 48 sqlalchemy.sql.visitors.VisitableType: 0x9162f2c...
7: acba [S] 9 dict of module: ..sql..., codemodel, sqlalchemy
8: acbb ---- [-] 45 sqlalchemy.sql.visitors.VisitableType: 0x9165fec...
9: acbba [S] 8 dict of module: ..sql..., codemodel, sqlalchemy
I'm new to SQLAlchemy. Is this a memory leak? Thanks.
Most DBAPIs, including psycopg2 and mysql-python, fully load all results into memory before releasing them to the client. SQLAlchemy's yield_per() option doesn't work around this, with one exception noted below, which is why it's generally not a very useful option (edit: "useful" in the sense that it begins streaming ORM results before the raw rows are fully fetched).
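One common way to keep memory bounded regardless of DBAPI buffering is to fetch and process the rows in explicit windows keyed on the primary key, committing after each batch. Here is a minimal sketch of that idea using the stdlib sqlite3 DBAPI directly; the table name, column names, and paths are hypothetical stand-ins, not the asker's actual schema:

```python
import sqlite3

# Build a small in-memory table standing in for the 2M-row data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE code_sample (id INTEGER PRIMARY KEY, filename TEXT)")
conn.executemany(
    "INSERT INTO code_sample (filename) VALUES (?)",
    [("/old/path/file%d.py" % i,) for i in range(1000)],
)
conn.commit()

BATCH = 100  # rows held in memory at any one time
last_id = 0
updated = 0
while True:
    # Keyset pagination: fetch the next window strictly after the last
    # seen primary key, so each iteration touches only BATCH rows.
    rows = conn.execute(
        "SELECT id, filename FROM code_sample WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH),
    ).fetchall()
    if not rows:
        break
    for row_id, filename in rows:
        if filename.startswith("/old/path"):
            conn.execute(
                "UPDATE code_sample SET filename = ? WHERE id = ?",
                ("/new/path" + filename[len("/old/path"):], row_id),
            )
            updated += 1
    conn.commit()  # flush each batch so pending work doesn't accumulate
    last_id = rows[-1][0]

print(updated)  # 1000
```

The same windowing pattern can be expressed through the SQLAlchemy ORM by filtering on the primary key per batch; the key point is that each query returns a small, bounded result set rather than relying on the driver to stream.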
The exceptions to this behavior are: