Suppose I have the below df:
import pandas as pd
data_dic = {
"a": [0,0,1,2],
"b": [0,3,4,5],
"c": [6,7,8,9]
}
df = pd.DataFrame(data_dic)
Result:
a b c
0 0 0 6
1 0 3 7
2 1 4 8
3 2 5 9
I need to assign a value to a new column based on the above columns, with these conditions:
if df.a > 0 then value df.a
else if df.b > 0 then value df.b
else value df.c
For now I am trying:
df['value'] = [x if x > 0 else 'ww' for x in df['a']]
but I don't know how to add more conditions to this.
Expected result:
a b c value
0 0 0 6 6
1 0 3 7 3
2 1 4 8 1
3 2 5 9 2
Thank you for your hard work.
Use numpy.select:
import numpy as np

df['value'] = np.select([df.a > 0, df.b > 0], [df.a, df.b], default=df.c)
print(df)
a b c value
0 0 0 6 6
1 0 3 7 3
2 1 4 8 1
3 2 5 9 2
Difference between the vectorized and loop solutions on 400k rows:
df = pd.concat([df] * 100000, ignore_index=True)
In [158]: %timeit df['value2'] = np.select([df.a > 0 , df.b > 0], [df.a, df.b], default=df.c)
9.86 ms ± 611 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [159]: %timeit df['value1'] = [x if x > 0 else y if y>0 else z for x,y,z in zip(df['a'],df['b'],df['c'])]
399 ms ± 52.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
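For reference, a minimal, self-contained sketch of that comparison (timings vary by machine, but both approaches produce the same column):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1, 2], "b": [0, 3, 4, 5], "c": [6, 7, 8, 9]})
df = pd.concat([df] * 100000, ignore_index=True)  # 400k rows

# Vectorized: np.select evaluates all conditions at once
df['value2'] = np.select([df.a > 0, df.b > 0], [df.a, df.b], default=df.c)

# Loop: a plain Python conditional per row
df['value1'] = [x if x > 0 else y if y > 0 else z
                for x, y, z in zip(df['a'], df['b'], df['c'])]

# Both produce identical results; only the speed differs
print(df['value1'].equals(df['value2']))  # True
```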
You can also use a list comprehension:
df['value'] = [x if x > 0 else y if y>0 else z for x,y,z in zip(df['a'],df['b'],df['c'])]
You can write a function that takes a row as a parameter, tests whatever conditions you want to test, and returns a True or False result, which you can then use as a selection tool. (Though on rereading your question, this may not be what you're looking for; see Part 2 below.)
Part 1 - Perform a Selection
Apply this function to your dataframe, and use the returned Series of True/False answers as an index to select rows from the dataframe itself.
e.g.
def selector(row):
    if row['a'] > 0 and row['b'] == 3:
        return True
    elif row['c'] > 2:
        return True
    else:
        return False
You can build whatever logic you like, just ensure it returns True when you want a match and False when you don't.
Then try something like
df.apply(lambda row: selector(row), axis=1)
and it will return a Series of True/False answers. Plug that into your df to select only those rows that have a True value calculated for them:
df[df.apply(lambda row: selector(row), axis=1)]
And that should give you what you want.
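Putting Part 1 together with the df from the question (a sketch; with this particular data the example conditions happen to match every row, so tweak them to see actual filtering):

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1, 2], "b": [0, 3, 4, 5], "c": [6, 7, 8, 9]})

def selector(row):
    # True when the row matches, False otherwise
    if row['a'] > 0 and row['b'] == 3:
        return True
    elif row['c'] > 2:
        return True
    else:
        return False

mask = df.apply(selector, axis=1)  # Series of True/False, one per row
print(df[mask])                    # keeps only the rows where mask is True
```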
Part 2 - Perform a Calculation
If you want to create a new column containing some calculated result, it's a similar operation: create a function that performs your calculation:
def mycalc(row):
    if row['a'] > 5:
        return row['a'] + row['b']
    else:
        return 66
Only this time, apply the function and assign the result to a new column name:
df['value'] = df.apply(lambda row: mycalc(row), axis=1)
And this will give you that result.
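As a sketch with the question's df (none of its a values exceed 5, so every row falls through to the else branch and gets 66):

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1, 2], "b": [0, 3, 4, 5], "c": [6, 7, 8, 9]})

def mycalc(row):
    if row['a'] > 5:
        return row['a'] + row['b']
    else:
        return 66

# apply row-wise and assign the result to a new column
df['value'] = df.apply(mycalc, axis=1)
print(df['value'].tolist())  # [66, 66, 66, 66]
```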