After reading a CSV data file with:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.shape)
I get a DataFrame that is 99 rows long:
(99, 2)
To clean up the DataFrame, I apply the dropna() method, which reduces it to 33 rows:
df = df.dropna()
print(df.shape)
which prints:
(33, 2)
Now, when I iterate over a column, it prints out all 99 rows as if they had never been dropped:
# Series.iteritems() was removed in pandas 2.0; items() does the same thing
for index, value in df['column1'].items():
    print(index)
which gives me this:
0
1
2
.
.
.
97
98
99
It appears that dropna() simply made the data "hidden", and that hidden data comes back when I iterate over the DataFrame. How can I make sure the dropped data is actually removed from the DataFrame instead of just being hidden?
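You're being confused by the fact that the row labels (the index) have been preserved, so the last row label is still 99. The dropped rows really are gone; the 33 remaining rows simply keep the labels they had before dropna() was called.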
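A quick way to convince yourself is to compare the row count with the labels. The sketch below uses a small made-up DataFrame in place of data.csv (the column names column1 and column2 are assumptions mirroring the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 6 rows, two of which contain NaN
df = pd.DataFrame({
    'column1': [10, np.nan, 30, 40, np.nan, 60],
    'column2': ['a', 'b', 'c', 'd', 'e', 'f'],
})

df = df.dropna()

# The rows are really gone: the frame only has 4 rows
print(df.shape)  # (4, 2)

# Iterating visits only those 4 rows
visited = [index for index, value in df['column1'].items()]
print(len(visited))  # 4

# What survives is the original labels of the kept rows, which are no longer contiguous
print(df.index.tolist())  # [0, 2, 3, 5]
print(df.index.max())     # 5, even though there are only 4 rows
```

The loop never sees labels 1 or 4; it only looks "complete" in the question because the largest surviving label happens to be the original last one.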
Example:
In [2]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]})
df
Out[2]:
a
0 0
1 1
2 NaN
3 NaN
4 4
After calling dropna, the original row labels of the remaining rows are preserved:
In [3]:
df = df.dropna()
df
Out[3]:
a
0 0
1 1
4 4
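Because the surviving rows keep their original labels, label-based and position-based lookups now give different answers. A small sketch rebuilding the example frame above so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]}).dropna()

# .loc looks rows up by label: label 4 survived, label 2 did not
print(df.loc[4, 'a'])  # 4.0
try:
    df.loc[2, 'a']
except KeyError:
    print('label 2 was dropped')

# .iloc looks rows up by position: position 2 is the third remaining row (label 4)
print(df.iloc[2]['a'])  # 4.0
```

Use .iloc when you mean "the n-th remaining row" and .loc when you mean "the row that originally had this label".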
If you want the labels to be contiguous again, call reset_index(drop=True) to assign a new default index (drop=True discards the old labels instead of inserting them as a new column):
In [4]:
df = df.reset_index(drop=True)
df
Out[4]:
a
0 0
1 1
2 4
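Here is the same fix applied to the loop from the question, as a sketch on the toy frame from the example; it also shows what drop=True changes compared with the default:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]})

# drop=True: discard the old labels and number the rows 0..n-1
clean = raw.dropna().reset_index(drop=True)
print(clean.index.tolist())  # [0, 1, 2]

# drop=False (the default): the old labels are kept as a new 'index' column
kept = raw.dropna().reset_index()
print(kept.columns.tolist())   # ['index', 'a']
print(kept['index'].tolist())  # [0, 1, 4]

# Iterating the question's way now yields contiguous labels
for index, value in clean['a'].items():
    print(index, value)
```

The default keeps the old labels around as data, which is handy if you later need to trace a row back to its position in data.csv; drop=True is the right choice when you just want a clean 0..n-1 numbering.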