How to delete rows from a pandas DataFrame based on a conditional expression [duplicate]

Tags: python, pandas

I have a pandas DataFrame and I want to delete rows from it where the length of the string in a particular column is greater than 2.

I expect to be able to do this (per this answer):

df[(len(df['column name']) < 2)] 

but I just get the error:

KeyError: u'no item named False' 

What am I doing wrong?

(Note: I know I can use df.dropna() to get rid of rows that contain any NaN, but I didn't see how to remove rows based on a conditional expression.)

asked Dec 13 '12 at 01:12 by sjs



1 Answer

To directly answer this question's original title, "How to delete rows from a pandas DataFrame based on a conditional expression" (which may not be the OP's actual problem, but could help other users who come across this question), one way is to use the drop method:

df = df.drop(some labels)
df = df.drop(df[<some boolean condition>].index)

Example

To remove all rows where column 'score' is < 50:

df = df.drop(df[df.score < 50].index) 
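A minimal sketch of the example above on a made-up DataFrame (the `name` column and all values are illustrative assumptions, not taken from the question). It also shows an equivalent way to get the same result by keeping the rows that do not match, assuming the index labels are unique:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [10, 50, 75, 30]})

# Drop the rows whose score is below 50, selecting them by index label.
dropped = df.drop(df[df.score < 50].index)

# Equivalent here: keep only the rows where the condition is False.
kept = df[~(df.score < 50)]

print(dropped)
```

Note that `drop` works on index labels, so if the index contains duplicate labels, every row sharing a dropped label is removed; the boolean-mask form does not have that side effect.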

In-place version (as pointed out in the comments)

df.drop(df[df.score < 50].index, inplace=True) 
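As a small sketch of what the in-place form does (again on hypothetical data): with `inplace=True` the call modifies `df` directly and returns `None`, so its result should not be assigned back to `df`.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"score": [10, 50, 75, 30]})

# inplace=True mutates df and returns None.
result = df.drop(df[df.score < 50].index, inplace=True)

print(result)
print(df)
```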

Multiple conditions

(see Boolean Indexing)

The operators are: | for or, & for and, and ~ for not. Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as < and >.

To remove all rows where column 'score' is < 50 and > 20:

df = df.drop(df[(df.score < 50) & (df.score > 20)].index) 
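The following sketch runs the combined condition on made-up data, and also shows how the same pattern could be applied to the string-length case from the question. The `code` column and all values are hypothetical; `Series.str.len()` is used here as one possible way to build a per-row length condition.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"code": ["a", "bb", "ccc", "dddd", "e"],
                   "score": [10, 25, 40, 50, 75]})

# Parenthesize each comparison: & binds more tightly than < and >.
in_range = (df.score < 50) & (df.score > 20)
without_range = df.drop(df[in_range].index)

# A condition can also come from a string method such as Series.str.len(),
# e.g. dropping rows whose "code" string is longer than 2 characters.
too_long = df["code"].str.len() > 2
short_only = df.drop(df[too_long].index)

print(without_range)
print(short_only)
```

By contrast, the built-in `len(df['column name'])` returns the number of rows in the column, so comparing it with 2 yields a single `True` or `False` rather than one value per row. That is most likely why the expression in the question ended up looking for an item named `False`.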
answered Sep 24 '22 at 06:09 by User