It's a generic question about filtering a pandas dataframe using a list. The problem is the following:
df
with a column field
ban_field=['field1','field2','field3']
ban_field
appear in df.field
For the moment, to retrieve the dataframe without the banned field, I proceed as follows:
for f in ban_field:
df = df[df.field!=f]
Is there a more pythonic way to proceed (in one line?)?
DataFrame. drop() method you can drop/remove/delete rows from DataFrame. axis param is used to specify what axis you would like to remove. By default axis = 0 meaning to remove rows.
To drop a row or column in a dataframe, you need to use the drop() method available in the dataframe. You can read more about the drop() method in the docs here. Rows are labelled using the index number starting with 0, by default. Columns are labelled using names.
Delete a Multiple Rows by Index Position in DataFrame As df. drop() function accepts only list of index label names only, so to delete the rows by position we need to create a list of index names from positions and then pass it to drop(). As default value of inPlace is false, so contents of dfObj will not be modified.
Use drop() method to delete rows based on column value in pandas DataFrame, as part of the data cleansing, you would be required to drop rows from the DataFrame when a column value matches with a static value or on another column value.
Method #1: use isin
and a boolean array selector:
In [47]: df = pd.DataFrame({"a": [2]*10, "field": range(10)})
In [48]: ban_field = [3,4,6,7,8]
In [49]: df[~df.field.isin(ban_field)]
Out[49]:
a field
0 2 0
1 2 1
2 2 2
5 2 5
9 2 9
[5 rows x 2 columns]
Method #2: use query
:
In [51]: df.query("field not in @ban_field")
Out[51]:
a field
0 2 0
1 2 1
2 2 2
5 2 5
9 2 9
[5 rows x 2 columns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With