
Memory-efficient filtering of `DataFrame` rows

I have a large `DataFrame` object (1,440,000,000 rows), and I am operating at the memory limit (swap included).

I need to extract a subset of the rows that have a certain value in one field. However, if I do it like this:

>>> SUBSET = DATA[DATA.field == value]

I end up with either a MemoryError exception or a crash. Is there any way to filter rows explicitly, without computing the intermediate mask `DATA.field == value`?

I have found the `DataFrame.filter()` and `DataFrame.select()` methods, but they operate on column labels/row indices rather than on the row contents.
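For reference, one memory-lean workaround is to filter in chunks, so that at any moment only a small boolean mask exists instead of one spanning all 1.44 billion rows. This is a sketch, not a definitive solution; the column name `field` comes from the question, and the helper name and chunk size are illustrative:

```python
import pandas as pd


def filter_in_chunks(df, column, value, chunk_size=1_000_000):
    """Select rows where df[column] == value without building one
    full-length boolean mask; only chunk_size rows are masked at a time."""
    pieces = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        pieces.append(chunk[chunk[column] == value])
    return pd.concat(pieces)


# Toy usage with a small frame
df = pd.DataFrame({"field": [1, 2, 1, 3], "x": [10, 20, 30, 40]})
subset = filter_in_chunks(df, "field", 1, chunk_size=2)
print(subset)
```

Note that `pd.concat` still materializes the result, so this only helps when the matching subset is much smaller than the full frame.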

asked Oct 29 '25 03:10 by abukaj

1 Answer

Use `query`; it should be a bit faster, and with `numexpr` installed pandas evaluates the expression in chunks rather than allocating a full intermediate array, which lowers peak memory. Note that a Python variable must be referenced with `@` inside the expression:

df = df.query("field == @value")
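A minimal self-contained sketch of this approach (the column name `field` is from the question; `value` is an ordinary Python variable, referenced with `@` inside the query string):

```python
import pandas as pd

df = pd.DataFrame({"field": [1, 2, 1], "x": [5, 6, 7]})
value = 1

# "@value" refers to the local Python variable; when numexpr is
# installed, pandas evaluates the comparison with it, reducing the
# memory cost of the intermediate boolean mask.
subset = df.query("field == @value")
print(subset)
```

Without `numexpr`, `query` falls back to a pure-Python engine and gives the same result, just without the memory savings.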
answered Oct 31 '25 18:10 by jezrael

