
iterate over pandas columns based on conditions

I want to calculate column C based on the values of count, A and B.

sample df:

count A B C
yes 23 2 nan
nan 23 1 nan
yes 41 6 nan

result I want

count A B C
yes 23 2 46
nan 23 1 0
yes 41 6 246

Calculate C = A*B only when the count value is yes; otherwise C should be 0. That is, rows where count is nan should get 0.

Any help is appreciated.

I am trying this

for ind, row in df.iterrows():
    if df['count'] == 'yes':
        df.loc[ ind, 'C'] =row['A'] *row['B']
    else:
        df.loc[ ind, 'C'] =0

But it raises an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Priya Chauhan asked Mar 06 '21 04:03



3 Answers

Another option:

df.C = df.A.mul(df.B).where(df['count'].eq('yes')).fillna(0)

df
#  count   A  B      C
#0   yes  23  2   46.0
#1   NaN  23  1    0.0
#2   yes  41  6  246.0

Or if you prefer operators: df.C = (df.A * df.B).where(df['count'] == 'yes').fillna(0)
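For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question. It is a small variant of the lines above, not the same code: it passes 0 as the second argument of where() instead of chaining fillna(0), so the result keeps its integer dtype. The explicit df["C"] bracket form is used here only as a matter of preference.

```python
import numpy as np
import pandas as pd

# Sample data from the question; C starts out empty (NaN).
df = pd.DataFrame({
    "count": ["yes", np.nan, "yes"],
    "A": [23, 23, 41],
    "B": [2, 1, 6],
    "C": [np.nan, np.nan, np.nan],
})

# Keep A*B where count == 'yes'; every other row (including NaN) becomes 0.
df["C"] = (df["A"] * df["B"]).where(df["count"].eq("yes"), 0)

print(df)
```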

Psidom answered Oct 07 '22 13:10


pandas overloads * for this operation, provided you correctly specify the indices you want to set:

mask = df["count"].notna()
df.loc[mask, "C"] = df["A"]*df["B"]
df["C"] = df["C"].fillna(0)  # assign back; inplace=True on df.C may not update df in recent pandas

or a slightly more concise version that would annoy your coworkers:

df["C"] = df["A"]*df["B"]*(df["count"].notna())

In the one-liner, df["count"].notna() returns a boolean column, which is converted to a numeric type (True becomes 1, False becomes 0) when multiplied by numeric columns. Concise, but not as clear.

output for the first version (the one-liner gives the same values, but as integers):

  count   A  B      C
0   yes  23  2   46.0
1   NaN  23  1    0.0
2   yes  41  6  246.0

This will be more performant than .apply and much more performant than iterrows.
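One caveat worth checking: notna() treats every non-missing value as a match, which equals the question's "count == yes" rule only because the sample contains nothing but yes and nan. The sketch below assumes a hypothetical 'no' row (not in the original data) to show where the two masks diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 'no' row replaces the last 'yes' of the question's sample.
df = pd.DataFrame({
    "count": ["yes", np.nan, "no"],
    "A": [23, 23, 41],
    "B": [2, 1, 6],
})

# notna() is True for every non-missing value, including 'no'.
by_notna = df["A"] * df["B"] * df["count"].notna()

# eq('yes') is True only for rows whose count is exactly 'yes'.
by_yes = df["A"] * df["B"] * df["count"].eq("yes")

print(by_notna.tolist())  # [46, 0, 246]
print(by_yes.tolist())    # [46, 0, 0]
```

If count can hold anything other than yes or nan, the eq("yes") mask is probably the safer choice.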

anon01 answered Oct 07 '22 15:10


You can use this:

df['C']=df[df['count']=='yes']['C'].fillna(value=df['A']*df['B'])
df['C']=df['C'].fillna(0)

Or, to keep your loop, try this:

for ind, row in df.iterrows():
    if row['count'] == 'yes':
        df.loc[ ind, 'C'] =row['A'] *row['B']
    else:
        df.loc[ ind, 'C'] =0

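You are getting the error because you wrote df['count'] == 'yes' instead of row['count'] == 'yes'. The first compares the whole column and returns a boolean Series, which an if statement cannot reduce to a single True or False; the second compares only the current row's value.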
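A minimal sketch of the difference, using only the count column from the question. The variable names (mask, error_message, row_matches) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"count": ["yes", np.nan, "yes"]})

# Comparing the whole column yields one boolean per row, not a single bool.
mask = df["count"] == "yes"
print(mask.tolist())  # [True, False, True]

# `if` needs a single truth value, so a multi-element Series raises ValueError.
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# A single row's value is a scalar, so the comparison is a plain bool.
first_row = df.iloc[0]
row_matches = first_row["count"] == "yes"
print(row_matches)  # True
```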

Anurag Dabas answered Oct 07 '22 15:10