
Memory-efficient filtering of `DataFrame` rows

I have a large `DataFrame` object (1,440,000,000 rows), and I am operating at the memory limit (swap included).

I need to extract a subset of the rows that have a certain value in one field. However, if I do it like this:

>>> SUBSET = DATA[DATA.field == value]

I end up with either a MemoryError exception or a crash. Is there any way to filter rows explicitly, without computing the intermediate mask `DATA.field == value`?

I have found the `DataFrame.filter()` and `DataFrame.select()` methods, but they operate on column labels/row indices rather than on the row contents.
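For reference, one memory-lean workaround is to filter in chunks, so that at any moment only a small boolean mask exists instead of one spanning all 1.44 billion rows. This is a sketch, not a definitive solution; the column name `field` comes from the question, and the helper name and chunk size are illustrative:

```python
import pandas as pd


def filter_in_chunks(df, column, value, chunk_size=1_000_000):
    """Select rows where df[column] == value without building one
    full-length boolean mask; only chunk_size rows are masked at a time."""
    pieces = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        pieces.append(chunk[chunk[column] == value])
    return pd.concat(pieces)


# Toy usage with a small frame
df = pd.DataFrame({"field": [1, 2, 1, 3], "x": [10, 20, 30, 40]})
subset = filter_in_chunks(df, "field", 1, chunk_size=2)
print(subset)
```

Note that `pd.concat` still materializes the result, so this only helps when the matching subset is much smaller than the full frame.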

asked Oct 29 '25 03:10 by abukaj

1 Answer

Use `query`; it should be a bit faster, and with `numexpr` installed pandas evaluates the expression in chunks rather than allocating a full intermediate array, which lowers peak memory. Note that a Python variable must be referenced with `@` inside the expression:

df = df.query("field == @value")
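A minimal self-contained sketch of this approach (the column name `field` is from the question; `value` is an ordinary Python variable, referenced with `@` inside the query string):

```python
import pandas as pd

df = pd.DataFrame({"field": [1, 2, 1], "x": [5, 6, 7]})
value = 1

# "@value" refers to the local Python variable; when numexpr is
# installed, pandas evaluates the comparison with it, reducing the
# memory cost of the intermediate boolean mask.
subset = df.query("field == @value")
print(subset)
```

Without `numexpr`, `query` falls back to a pure-Python engine and gives the same result, just without the memory savings.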
answered Oct 31 '25 18:10 by jezrael

