I have a DataFrame called TI. I want to drop rows where #Book_Date is NaN, so I run:
TI = TI.dropna(subset=['#Book_Date'])
When I run this, memory gets eaten up for some reason. I'm on a machine with 100GB of RAM, and about 50% of it is used to hold TI; when I run that dropna line, usage goes to 100% and the command never finishes executing. Is it making a whole new copy? TI is a 64 million row DataFrame, so this needs to be more efficient.
By far the best way to do this is to enforce that the column must be finite. You'll need numpy for this.
import numpy as np
# Keep only rows whose #Book_Date is finite (drops NaN; assumes a numeric column)
TI = TI[np.isfinite(TI['#Book_Date'])]
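For anyone who wants to check this on a small frame first, here is a minimal sketch. The toy data and the `value` column are made up for illustration; only `TI` and `#Book_Date` come from the question. It also shows `Series.notna()` as an alternative mask, which should behave the same way on a numeric column and does not depend on the dtype being numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real 64-million-row TI
TI = pd.DataFrame({
    "#Book_Date": [20130101.0, np.nan, 20130103.0, np.nan],
    "value": [1, 2, 3, 4],  # made-up column, not from the question
})

# Boolean mask: True where #Book_Date is present
mask = TI["#Book_Date"].notna()
filtered = TI[mask]

# The same rows selected with np.isfinite and with dropna, for comparison
via_isfinite = TI[np.isfinite(TI["#Book_Date"])]
via_dropna = TI.dropna(subset=["#Book_Date"])

print(filtered)
```

Note that each of these builds a new frame holding the kept rows, so the old and new frames both exist in memory until TI is reassigned.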