Nested if statements with .loc in pandas / python

I am using a conditional assignment like the code below: if the address is NJ, the value of the name column is changed to 'N/A'.

df1.loc[df1.Address.isin(['NJ']), 'name'] = 'N/A'
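For reference, here is a minimal, self-contained version of that single-condition assignment. The sample rows are made up for illustration; only the `.loc` line comes from the question. The boolean mask picks the rows, and the second label (`'name'`) picks the column to overwrite.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question
df1 = pd.DataFrame({'Address': ['NJ', 'NY', 'NJ'],
                    'name': ['john', 'bob', 'mary']})

# Row selector: boolean mask; column selector: 'name'
df1.loc[df1.Address.isin(['NJ']), 'name'] = 'N/A'

print(df1)
```

With a one-element list, `df1.Address.isin(['NJ'])` gives the same mask as `df1.Address == 'NJ'`; `isin` pays off once you match several values.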

How do I do the same if I have nested if statements like the ones below?

# not real code, just representing the logic
if address isin ('NJ', 'NY'):
    if name1 isin ('john', 'bob'):
        name1 = 'N/A' 
    if name2 isin ('mayer', 'dylan'):
        name2 = 'N/A'

Can I achieve the above logic using df.loc, or is there another way to do it?

singularity2047 asked Apr 25 '18 21:04




1 Answer

Separate assignments, as shown by @MartijnPeiters, are a good idea for a small number of conditions.
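A sketch of the separate-assignment approach for the nested logic in the question, assuming lowercase column names `address`, `name1` and `name2` and the same sample rows used in this answer. The outer `if` becomes a shared mask, and each inner `if` is AND-ed with it before a plain `.loc` assignment:

```python
import pandas as pd

df = pd.DataFrame({'address': ['NY', 'CA', 'NJ', 'NY', 'WS'],
                   'name1': ['john', 'mayer', 'dylan', 'bob', 'mary'],
                   'name2': ['mayer', 'dylan', 'mayer', 'bob', 'bob']})

# Outer "if": rows whose address is NJ or NY
address_mask = df['address'].isin(['NJ', 'NY'])

# Inner "if" blocks: each one is combined with the outer mask using &
name1_mask = address_mask & df['name1'].isin(['john', 'bob'])
name2_mask = address_mask & df['name2'].isin(['mayer', 'dylan'])

# One .loc assignment per inner "if"
df.loc[name1_mask, 'name1'] = 'N/A'
df.loc[name2_mask, 'name2'] = 'N/A'

print(df)
```

`&` is an element-wise AND on boolean Series. If you use comparisons such as `df['address'] == 'NJ'` instead of `isin`, wrap each comparison in parentheses, because `&` binds more tightly than `==` in Python.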

For a large number of conditions, consider using numpy.select to separate your conditions and choices. This should make your code more readable and easier to maintain.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'address': ['NY', 'CA', 'NJ', 'NY', 'WS'],
                   'name1': ['john', 'mayer', 'dylan', 'bob', 'mary'],
                   'name2': ['mayer', 'dylan', 'mayer', 'bob', 'bob']})

address_mask = df['address'].isin(('NJ', 'NY'))

conditions = [address_mask & df['name1'].isin(('john', 'bob')),
              address_mask & df['name2'].isin(('mayer', 'dylan'))]

choices = ['Option 1', 'Option 2']

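# default fills rows that match no condition; it is a string here because
# recent NumPy versions raise a TypeError when string choices are mixed
# with the implicit integer default of 0
df['result'] = np.select(conditions, choices, default='No match')

print(df)

  address  name1  name2    result
0      NY   john  mayer  Option 1
1      CA  mayer  dylan  No match
2      NJ  dylan  mayer  Option 2
3      NY    bob    bob  Option 1
4      WS   mary    bob  No match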
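`np.select` checks the conditions in order and uses the first one that matches, which is why row 0 (it satisfies both conditions) gets 'Option 1'. To overwrite the original columns as the question asks, rather than writing labels to a new `result` column, one option is to pass the existing column as `default`, so rows that match nothing keep their current value. This is a sketch on the same sample data; with only one condition per column, `np.where` would do the same job.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'address': ['NY', 'CA', 'NJ', 'NY', 'WS'],
                   'name1': ['john', 'mayer', 'dylan', 'bob', 'mary'],
                   'name2': ['mayer', 'dylan', 'mayer', 'bob', 'bob']})

address_mask = df['address'].isin(('NJ', 'NY'))

# One condition list per target column; unmatched rows fall back to
# the column's current value through `default`
df['name1'] = np.select([address_mask & df['name1'].isin(('john', 'bob'))],
                        ['N/A'], default=df['name1'])
df['name2'] = np.select([address_mask & df['name2'].isin(('mayer', 'dylan'))],
                        ['N/A'], default=df['name2'])

print(df)
```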
jpp answered Oct 07 '22 00:10