Now I know how to check the dataframe for specific values across multiple columns. However, I cant seem to work out how to carry out an if statement based on a boolean response.
For example:
Walk directories using os.walk
and read in a specific file into a dataframe.
for root, dirs, files in os.walk(main):
filters = '*specificfile.csv'
for filename in fnmatch.filter(files, filters):
df = pd.read_csv(os.path.join(root, filename),error_bad_lines=False)
Now checking that dataframe across multiple columns. The first value being the column name (column1), the next value is the specific value I am looking for in that column(banana). I am then checking another column (column2) for a specific value (green). If both of these are true I want to carry out a specific task. However if it is false I want to do something else.
so something like:
if (df['column1']=='banana') & (df['colour']=='green'):
do something
else:
do something
In this tutorial, we will go through several ways in which you create Pandas conditional columns. Pandas’ loc can create a boolean mask, based on condition. It can either just be selecting rows and columns, or it can be used to filter dataframes. column_name2 is the column to create or change, it could be the same as column_name1
Applying an IF condition in Pandas DataFrame. Let’s now review the following 5 cases: (1) IF condition – Set of numbers. Suppose that you created a DataFrame in Python that has 10 numbers (from 1 to 10). You then want to apply the following IF conditions: If the number is equal or lower than 4, then assign the value of ‘True’
Pandas’ loc creates a boolean mask, based on a condition. Sometimes, that condition can just be selecting rows and columns, but it can also be used to filter dataframes. These filtered dataframes can then have values applied to them.
Using Pandas Map to Set Values in Another Column The Pandas.map () method is very helpful when you’re applying labels to another column. In order to use this method, you define a dictionary to apply to the column. For our sample dataframe, let’s imagine that we have offices in America, Canada, and France.
If you want to check if any row of the DataFrame meets your conditions you can use .any()
along with your condition . Example -
if ((df['column1']=='banana') & (df['colour']=='green')).any():
Example -
In [16]: df
Out[16]:
A B
0 1 2
1 3 4
2 5 6
In [17]: ((df['A']==1) & (df['B'] == 2)).any()
Out[17]: True
This is because your condition - ((df['column1']=='banana') & (df['colour']=='green'))
- returns a Series of True/False values.
This is because in pandas when you compare a series against a scalar value, it returns the result of comparing each row of that series against the scalar value and the result is a series of True/False values indicating the result of comparison of that row with the scalar value. Example -
In [19]: (df['A']==1)
Out[19]:
0 True
1 False
2 False
Name: A, dtype: bool
In [20]: (df['B'] == 2)
Out[20]:
0 True
1 False
2 False
Name: B, dtype: bool
And the &
does row-wise and
for the two series. Example -
In [18]: ((df['A']==1) & (df['B'] == 2))
Out[18]:
0 True
1 False
2 False
dtype: bool
Now to check if any of the values from this series is True, you can use .any()
, to check if all the values in the series are True, you can use .all()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With