Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove lines in dataframe using a list in Pandas

Tags:

python

pandas

It's a generic question about filtering a pandas dataframe using a list. The problem is the following:

  • I have a pandas dataframe df with a column field
  • I have a list of banned fields, for example ban_field=['field1','field2','field3']
  • All elements of ban_field appear in df.field

For the moment, to retrieve the dataframe without the banned field, I proceed as follows:

for f in ban_field:
    df = df[df.field!=f]

Is there a more pythonic way to proceed (in one line?)?

like image 342
Colonel Beauvel Avatar asked Sep 10 '14 14:09

Colonel Beauvel


People also ask

How do you exclude rows from a DataFrame?

DataFrame. drop() method you can drop/remove/delete rows from DataFrame. axis param is used to specify what axis you would like to remove. By default axis = 0 meaning to remove rows.

How do I remove rows from a DataFrame in Python?

To drop a row or column in a dataframe, you need to use the drop() method available in the dataframe. You can read more about the drop() method in the docs here. Rows are labelled using the index number starting with 0, by default. Columns are labelled using names.

How do I drop multiple rows in a DataFrame?

Delete a Multiple Rows by Index Position in DataFrame As df. drop() function accepts only list of index label names only, so to delete the rows by position we need to create a list of index names from positions and then pass it to drop(). As default value of inPlace is false, so contents of dfObj will not be modified.

How do you delete a row from a DataFrame based on a column value?

Use drop() method to delete rows based on column value in pandas DataFrame, as part of the data cleansing, you would be required to drop rows from the DataFrame when a column value matches with a static value or on another column value.


1 Answers

Method #1: use isin and a boolean array selector:

In [47]: df = pd.DataFrame({"a": [2]*10, "field": range(10)})

In [48]: ban_field = [3,4,6,7,8]

In [49]: df[~df.field.isin(ban_field)]
Out[49]: 
   a  field
0  2      0
1  2      1
2  2      2
5  2      5
9  2      9

[5 rows x 2 columns]

Method #2: use query:

In [51]: df.query("field not in @ban_field")
Out[51]: 
   a  field
0  2      0
1  2      1
2  2      2
5  2      5
9  2      9

[5 rows x 2 columns]
like image 173
DSM Avatar answered Sep 17 '22 23:09

DSM