How to remove data from DataFrame permanently

Question

After reading CSV data file with:

import pandas as pd
df = pd.read_csv('data.csv')
print(df.shape)

I get a DataFrame that is 99 rows long:

(99, 2)

To clean up the DataFrame, I apply the dropna() method, which reduces it to 33 rows:

df = df.dropna()
print(df.shape)

which prints:

(33, 2)

Now when I iterate over a column, it prints out all 99 rows as if they had not been dropped:

for index, value in df['column1'].items():
    print(index)

which gives me this:

It appears that dropna() simply made the data "hidden", and the hidden data comes back when I iterate over the DataFrame. How can I make sure the dropped data is actually removed from the DataFrame instead of just being hidden?

EdChum · Accepted Answer

You're being confused by the fact that the row labels have been preserved, so the last row label is still the one from the original 99-row DataFrame. The rows themselves really are gone.
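This can be checked on a frame shaped like the one in the question. Since data.csv is not available, the sketch below builds a hypothetical stand-in: 99 rows, a 'column1' that is NaN except at every third position, and a made-up 'column2' with no missing values. Counting what the loop yields shows that only the surviving rows are visited, while their labels keep their original values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 99 rows, two columns. 'column1' holds a
# value only at positions 2, 5, 8, ..., 98 and is NaN everywhere else.
values = [float(i) if i % 3 == 2 else np.nan for i in range(99)]
df = pd.DataFrame({"column1": values, "column2": range(99)})
print(df.shape)  # (99, 2)

df = df.dropna()
print(df.shape)  # (33, 2)

# The loop from the question visits only the 33 remaining rows...
labels = [index for index, value in df["column1"].items()]
print(len(labels))  # 33

# ...but each row keeps its original label, so the last one is still 98.
print(labels[0], labels[-1])  # 2 98
```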

Example:

In [2]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]})
df

Out[2]:
    a
0   0
1   1
2 NaN
3 NaN
4   4

After calling dropna(), the original row labels are preserved:

In [3]:
df = df.dropna()
df

Out[3]:
   a
0  0
1  1
4  4
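Because the labels are preserved, label-based and position-based lookups no longer line up after dropna(). The short sketch below uses the same small frame to show that .loc still uses the old labels, .iloc counts only the rows that remain, and the label of a dropped row no longer exists.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 1, np.nan, np.nan, 4]}).dropna()

# Label-based access uses the preserved labels 0, 1 and 4.
print(df.loc[4, "a"])  # 4.0

# Position-based access counts only the rows that are left.
print(df.iloc[2]["a"])  # 4.0

# Label 2 belonged to a dropped row, so looking it up fails.
try:
    df.loc[2]
except KeyError:
    print("label 2 no longer exists")

print(list(df.index))  # [0, 1, 4]
```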

If you want the labels to be contiguous again, call reset_index(drop=True) to assign a new index:

In [4]:
df = df.reset_index(drop=True)
df

Out[4]:
   a
0  0
1  1
2  4
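To see what the drop=True argument changes, the sketch below calls reset_index() on the same dropped frame with and without it. Without drop=True the old labels are not thrown away but moved into a new column named 'index'; with it they are discarded and the rows are simply renumbered.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 1, np.nan, np.nan, 4]}).dropna()

# Without drop=True the old labels are kept as a new column named 'index'.
kept = df.reset_index()
print(list(kept.columns))      # ['index', 'a']
print(kept["index"].tolist())  # [0, 1, 4]

# With drop=True the old labels are discarded and the rows are renumbered.
renumbered = df.reset_index(drop=True)
print(list(renumbered.columns))  # ['a']
print(list(renumbered.index))    # [0, 1, 2]
```

In pandas 2.0 and later, df.dropna(ignore_index=True) should produce the same renumbered result in a single call; on older versions, reset_index(drop=True) is the portable choice.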

Tags: python, pandas, dataframe, alphanumeric