 

More efficient pandas python command to drop Nan rows?

Tags:

python

pandas

nan

I have a DataFrame called TI and want to drop the rows where #Book_Date is NaN, so I run:

TI = TI.dropna(subset=['#Book_Date'])

When I run this, memory gets eaten up for some reason. I'm on a machine with 100 GB of RAM, and TI alone takes about 50% of it; when I run that dropna line, usage climbs to 100% and the command never finishes. Is it making a whole new copy? TI is a 64-million-row DataFrame, so this needs to be more efficient.
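For context: dropna (without inplace) and boolean indexing both return a new DataFrame rather than shrinking the existing one, so while the assignment runs, the original TI and the filtered result are in memory at the same time, roughly the original size plus the size of the kept rows. The sketch below is a minimal illustration on a small hypothetical frame (the Amount column and the data are made up; only the #Book_Date name comes from the question). It measures the footprint with DataFrame.memory_usage(deep=True) and filters with an explicit notna() mask, which keeps the same rows as dropna(subset=...) but does not by itself avoid the copy.

```python
import pandas as pd

# Hypothetical small stand-in for the 64-million-row TI frame.
TI = pd.DataFrame({
    "#Book_Date": pd.to_datetime(["2026-03-01", None, "2026-03-03", None] * 1000),
    "Amount": range(4000),
})

before = int(TI.memory_usage(deep=True).sum())

# The boolean mask itself is cheap: about one byte per row.
mask = TI["#Book_Date"].notna()

# Filtering allocates a new frame for the kept rows; the old TI is only
# released once nothing else references it (here, after the rebind).
TI = TI.loc[mask]
del mask

after = int(TI.memory_usage(deep=True).sum())
print(f"{before:,} bytes before, {after:,} bytes after")
```

On the real data, comparing the before figure with the free RAM gives a quick sense of whether a full filtered copy can fit at all.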

wolfsatthedoor asked Mar 04 '26 11:03



1 Answer

By far the best way to do this is to keep only the rows where the column is finite. np.isfinite returns False for NaN (and also for inf and -inf), so boolean indexing with it drops those rows. You'll need NumPy for this:

import numpy as np
import pandas as pd

# Keep only the rows whose #Book_Date value is finite (not NaN, inf or -inf).
TI = TI[np.isfinite(TI['#Book_Date'])]
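Note that np.isfinite is stricter than dropna: it also removes rows holding inf or -inf, which dropna keeps under default settings. It is also intended for numeric data, so on an object-dtype column (for example, dates stored as strings) it should raise a TypeError; in that case a notna() mask is the safer choice. A minimal sketch on a hypothetical float column showing the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column containing NaN and both infinities.
TI = pd.DataFrame({"#Book_Date": [1.0, np.nan, np.inf, -np.inf, 5.0]})

# dropna removes only the missing value; inf and -inf are kept.
by_dropna = TI.dropna(subset=["#Book_Date"])

# np.isfinite is False for NaN, inf and -inf, so all three rows are removed.
by_isfinite = TI[np.isfinite(TI["#Book_Date"])]

# notna() keeps the same rows as dropna and also works on datetime columns.
by_notna = TI[TI["#Book_Date"].notna()]

print(len(by_dropna), len(by_isfinite), len(by_notna))  # 4 2 4
```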
PhysicalChemist answered Mar 06 '26 03:03
