I have a large DataFrame object (1,440,000,000 rows). I am operating at the memory limit (swap included).
I need to extract the subset of rows where a field has a certain value. However, if I do it like this:
>>> SUBSET = DATA[DATA.field == value]
I end up with either a MemoryError exception or a crash.
Is there any way to filter rows explicitly, without calculating the intermediate mask (DATA.field == value)?
I have found DataFrame.filter() and DataFrame.select() methods, but they operate on column labels/row indices rather than on the row data.
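Use query; it should be a bit faster. If value is a Python variable rather than a column, prefix it with @ so that query does not look it up as a column name:
df = df.query("field == @value")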
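A minimal sketch of the two options follows. The sample data and the helper name filter_in_chunks are made up for illustration; only the column name field and the variable value come from the question. Note that query still has to evaluate the comparison over the whole column, so it may not remove the memory peak entirely. The chunked helper is a different technique, not part of the answer above: it builds the boolean mask one slice at a time, so the mask never covers more than chunk_size rows.

```python
import pandas as pd

# Hypothetical stand-in data; the real frame in the question is far larger.
df = pd.DataFrame({"field": [1, 2, 1, 3, 1], "other": list("abcde")})
value = 1

# `@value` reads the local Python variable; a bare `value` would be
# resolved as a column name instead.
subset_query = df.query("field == @value")


def filter_in_chunks(frame, column, wanted, chunk_size):
    """Filter rows slice by slice so each boolean mask has at most
    chunk_size entries (illustrative helper, not a pandas API)."""
    parts = []
    for start in range(0, len(frame), chunk_size):
        chunk = frame.iloc[start:start + chunk_size]
        parts.append(chunk[chunk[column] == wanted])
    if not parts:
        # Empty input: return an empty frame with the same columns.
        return frame.iloc[0:0]
    return pd.concat(parts)


subset_chunked = filter_in_chunks(df, "field", value, chunk_size=2)
```

Both results contain the same rows as DATA[DATA.field == value]. For the chunked version, the matching rows still have to fit in memory once concatenated, so it only helps when the subset is much smaller than the full frame.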