How to remove data from a DataFrame permanently

After reading a CSV data file with:

import pandas as pd
df = pd.read_csv('data.csv')
print(df.shape)

I get a DataFrame that is 99 rows (indexes) long:

(99, 2)

To clean up the DataFrame I apply the dropna() method, which reduces it to 33 rows:

df = df.dropna()
print(df.shape)

which prints:

(33, 2)

Now when I iterate over a column, it prints out all 99 rows as if they weren't dropped:

# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for index, value in df['column1'].items():
    print(index)

which gives me this:

0
1
2
.
.
.
97
98
99

It appears that dropna() simply made the data "hidden", and that hidden data comes back when I iterate over the DataFrame. How can I make sure the dropped data is actually removed from the DataFrame instead of just being hidden?

asked Mar 05 '26 10:03 by alphanumeric


1 Answer

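You're being confused by the fact that dropna() preserves the original row labels (the index) of the rows it keeps, so the last row label is still 99. The dropped rows really are gone; the remaining rows are simply not renumbered.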
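A quick way to check this with the question's own loop is sketched below. It assumes a small hypothetical DataFrame standing in for data.csv (the values and the `column2` column are made up for illustration) and uses `Series.items()`, the current name for `iteritems()`. The loop only visits the labels of the surviving rows, and `len()` counts only the rows that remain.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 6 rows, 3 of which contain a missing value.
df = pd.DataFrame({
    "column1": [10, np.nan, 30, np.nan, 50, 60],
    "column2": ["a", "b", None, "d", "e", "f"],
})

df = df.dropna()

# Iterating yields (label, value) pairs only for the rows that survived.
labels = [index for index, value in df["column1"].items()]
print(labels)        # labels 0, 4 and 5 only
print(len(df))       # 3: the dropped rows are really gone
print(df.index[-1])  # 5: the last label is unchanged, which is what looks like "hidden" data
```

So the gaps in the labels (1, 2 and 3 here) are the only trace of the dropped rows; no data is kept behind them.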

Example:

In [2]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]})
df

Out[2]:
     a
0  0.0
1  1.0
2  NaN
3  NaN
4  4.0

After calling dropna, the original row labels (the index) of the surviving rows are preserved:

In [3]:
df = df.dropna()
df

Out[3]:
     a
0  0.0
1  1.0
4  4.0
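Because the labels are kept, label-based and position-based lookups now disagree. The minimal sketch below rebuilds the same toy frame and contrasts the standard pandas accessors: `.loc` finds rows by label, so label 2 no longer exists, while `.iloc` counts positions, so position 2 is the third remaining row (labelled 4). `enumerate()` is shown as one way to get contiguous positions without touching the index.

```python
import numpy as np
import pandas as pd

# Same toy data as the example above, after dropping the NaN rows.
df = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]}).dropna()

# .loc looks rows up by label: label 4 survived, label 2 did not.
print(df['a'].loc[4])        # 4.0
try:
    df['a'].loc[2]
except KeyError:
    print('label 2 no longer exists')

# .iloc looks rows up by position: position 2 is the third remaining row, labelled 4.
print(df['a'].iloc[2])       # 4.0

# enumerate() gives contiguous positions without renumbering the index.
for position, (label, value) in enumerate(df['a'].items()):
    print(position, label, value)
```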

If you want the labels to be contiguous again, call reset_index(drop=True) to assign a new 0-based index:

In [4]:
df = df.reset_index(drop=True)
df

Out[4]:
     a
0  0.0
1  1.0
2  4.0
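The `drop=True` argument matters here. With the default `drop=False`, `reset_index()` keeps the old labels by inserting them as a new column named `index`. The sketch below compares both on the same toy frame; the variable names `dropped`, `kept` and `clean` are illustrative only.

```python
import numpy as np
import pandas as pd

dropped = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]}).dropna()

# drop=False (the default) keeps the old labels as a column named 'index'.
kept = dropped.reset_index()
print(kept.columns.tolist())   # ['index', 'a']
print(kept['index'].tolist())  # [0, 1, 4]

# drop=True discards the old labels, leaving only the new 0..n-1 index.
clean = dropped.reset_index(drop=True)
print(clean.columns.tolist())  # ['a']
print(clean.index.tolist())    # [0, 1, 2]
```

If the old labels are not needed at all, both steps can be chained as `df = df.dropna().reset_index(drop=True)`.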
answered Mar 07 '26 22:03 by EdChum


