
Apply a condition on pandas columns to create a boolean indexing array


I want to drop specific rows from a pandas dataframe. Usually you can do that using something like

df[df['some_column'] != 1234]

What df['some_column'] != 1234 does is create a boolean indexing array that is used to index df, so that only the rows where the value is True are kept.
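For illustration, a minimal sketch of what that mask looks like. The frame and its values are made up here, only the column name comes from the example above:

```python
import pandas as pd

# Hypothetical data, used only to illustrate the boolean mask
df = pd.DataFrame({'some_column': [1234, 5, 1234, 7]})

# Element-wise comparison gives a boolean Series aligned with df's index
mask = df['some_column'] != 1234

# Indexing with the mask keeps only the rows where it is True
filtered = df[mask]
```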

But in some cases, like mine, I don't see how I can express the condition in such a way, and iterating over pandas rows is way too slow to be considered a viable option.

To be more specific, I want to drop all rows where the value of a column is also a key in a dictionary, in the same way as in the example above.

In a perfect world I would consider something like

df[df['some_column'] not in my_dict.keys()]

That obviously doesn't work. Any suggestions?

LetsPlayYahtzee asked Aug 02 '16 20:08




1 Answer

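What you're looking for is isin(). It tests each value in the column for membership and returns a boolean mask you can index with. Negate the mask with ~ to drop the matching rows instead of keeping them.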

import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5, 7], [8, 9]], columns=['A', 'B'])
In[9]: df
Out[9]: 
   A  B
0  1  2
1  1  3
2  4  6
3  5  7
4  8  9
In[10]: mydict = {1: 'A', 8: 'B'}
In[11]: df[df['A'].isin(mydict.keys())]
Out[11]: 
   A  B
0  1  2
1  1  3
4  8  9
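The selection above keeps the rows whose key is in the dictionary. Since the question asks to drop those rows, a minimal sketch of the inverse follows, reusing the same df and mydict. The name dropped is only illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5, 7], [8, 9]], columns=['A', 'B'])
mydict = {1: 'A', 8: 'B'}

# True where the value in column A is one of the dictionary's keys;
# list(mydict) yields the keys as a plain list
in_dict = df['A'].isin(list(mydict))

# ~ inverts the boolean mask, so only rows NOT in the dictionary remain
dropped = df[~in_dict]
```

This leaves only the rows at index 2 and 3 (A equal to 4 and 5). The plain Python not in fails in the question because it expects a single value, not a whole Series, whereas isin() does the membership test element-wise.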
Saurav Gupta answered Sep 28 '22 02:09
