How to keep only specific rows from a dataframe in Python?

Tags:

python

pandas

I have a DataFrame with 200 index labels. I want to delete every row except those at certain index labels, such as 128, 133, 140, 143, 199.

Previously, I dropped all the rows at the index labels 128, 133, 140, 143, 199, and that worked fine. My code was:

dataset_drop = dataset.drop(index = [128, 133, 140, 143, 199])

Now I am trying to do it the other way round: keep the rows at the index labels 128, 133, 140, 143, 199 and delete all the others.

What I tried doing:

dropped_data = dataset.drop(index != [128, 133, 140, 143, 199])

When I do this, I get an error saying

NameError: name 'index' is not defined

Can anyone tell me what I am doing wrong?


1 Answer

To explain the reason for your exception, the expression

index != [128, 133, 140, 143, 199]

is evaluated as an ordinary comparison expression, rather than treating index as a keyword argument. Python looks up a variable named index to compare against the list. Since no such variable is defined, you see a NameError.
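To see this in isolation, here is a minimal sketch that reproduces the error. The DataFrame is a made-up stand-in (a single hypothetical "value" column with the default 0..199 index), not your actual data:

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame: 200 rows, labels 0..199.
dataset = pd.DataFrame({"value": range(200)})

try:
    # `index != [...]` is evaluated first, as a comparison on a variable
    # named `index`, before drop() is ever called.
    dataset.drop(index != [128, 133, 140, 143, 199])
except NameError as exc:
    message = str(exc)

print(message)
```

Only `index=...` (a single equals sign) is parsed as a keyword argument; any other operator makes `index` an ordinary name that must already exist.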


Use Index.difference to fix your drop solution:

dataset.drop(index=dataset.index.difference([128, 133, 140, 143, 199]))

Or, more idiomatically, use loc to select the rows directly, since you already know the labels you want to keep.

dataset.loc[[128, 133, 140, 143, 199]]
# If the numbers are integer positions rather than index labels, use iloc instead:
# dataset.iloc[[128, 133, 140, 143, 199]]
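
As a quick check that these approaches agree, here is a small sketch on a made-up DataFrame (a hypothetical "value" column with the default 0..199 index). The Index.isin variant is an extra alternative, not part of the answer above:

```python
import pandas as pd

# Hypothetical stand-in data: 200 rows with the default labels 0..199.
dataset = pd.DataFrame({"value": range(200)})
keep = [128, 133, 140, 143, 199]

# Drop every label that is NOT in `keep`.
via_drop = dataset.drop(index=dataset.index.difference(keep))

# Select the wanted labels directly.
via_loc = dataset.loc[keep]

# Boolean mask on the index; labels missing from the index are skipped silently.
via_isin = dataset[dataset.index.isin(keep)]

print(via_drop.index.tolist())
print(via_loc.index.tolist())
print(via_isin.index.tolist())
```

One difference to keep in mind: loc raises a KeyError if any label in the list is missing from the index, while the isin mask just returns whichever rows do match.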
cs95 answered Dec 05 '25 05:12