Handling NaN Values in Pandas with Conditional Statement

Tags:

python

pandas

I'm working with data in which some customer postcodes are invalid. As a result, I can't map those postcodes to a CountryISOCode, which leaves a NaN. However, I have noticed that for every row where CountryISOCode is NaN, the CurrencyCode gives me enough to fix the problem for now.

I've gone through various Stack Overflow posts but cannot find a solution to my problem. I've tried...

def func(row):
    if row['CountryISOCode'] == np.nan & row['Currency'] == 'EUR':
        return 'IRE'
    elif row['CountryISOCode'] == np.nan & row['Currency'] == 'GBP':
        return 'GBR'
    else:
        return row['CountryISOCode']

df['CountryISOCode'] = df.apply(func, axis=1)

and some other methods but to no avail...

Below I have provided a replication of the data I'm working with

import pandas as pd
import numpy as np

data = [
    ['Steve', 'Invalid Postcode', 'GBP', np.nan ],
    ['Robyn', 'Invalid Postcode', 'EUR', np.nan],
    ['James', 'Valid Postcode', 'GBP', 'GBR'],
    ['Halo', 'Invalid Postcode', 'EUR', np.nan],
    ['Jesus', 'Valid Postcode', 'GBP', 'GBR']
    ]

df = pd.DataFrame(columns=["Name", "PostCode", "CurrencyCode", "CountryISOCode"], data=data)

Essentially if I was working with SQL my code would be as follows.

IF countryISOCode IS NULL 
    AND currency = 'GBP' 
THEN CountryISOCode = 'GBR'
ELSE
IF countryISOCode IS NULL 
    AND currency = 'EUR' 
THEN CountryISOCode = 'IRE'
ELSE countryISOCode 
END

Any ideas?

asked Jan 29 '19 15:01

Ryan Davies

3 Answers

You can use fillna with a dictionary specifying mappings for when currency code is helpful:

cmap = {'GBP': 'GBR', 'EUR': 'IRE'}
df['CountryISOCode'] = df['CountryISOCode'].fillna(df['CurrencyCode'].map(cmap))

print(df)

    Name          PostCode CurrencyCode CountryISOCode
0  Steve  Invalid Postcode          GBP            GBR
1  Robyn  Invalid Postcode          EUR            IRE
2  James    Valid Postcode          GBP            GBR
3   Halo  Invalid Postcode          EUR            IRE
4  Jesus    Valid Postcode          GBP            GBR
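
One property of this approach worth checking is what happens to a currency that is not in the dictionary. The sketch below assumes an extra 'USD' row that is not in the original data: map() returns NaN for it, so fillna() has nothing to fill with and the row stays missing instead of receiving a wrong country.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the 'USD' row is an assumption added for this example.
df = pd.DataFrame(
    {
        "CurrencyCode": ["GBP", "EUR", "USD", "GBP"],
        "CountryISOCode": [np.nan, np.nan, np.nan, "GBR"],
    }
)

cmap = {"GBP": "GBR", "EUR": "IRE"}

# map() yields NaN for currencies missing from cmap, so those rows
# keep their original NaN after fillna().
df["CountryISOCode"] = df["CountryISOCode"].fillna(df["CurrencyCode"].map(cmap))

# Rows that could not be repaired from the currency alone.
still_missing = df[df["CountryISOCode"].isna()]

print(df)
print(still_missing)
```

Rows left in still_missing can then be handled separately, for example by fixing the postcode lookup.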

answered Oct 11 '22 18:10

jpp

You could use np.select for this, which allows you to choose from a list depending on the result of a list of conditions:

m1 = df.CountryISOCode.isna()
m2 = df.CurrencyCode.eq('GBP')
m3 = df.CurrencyCode.eq('EUR')
df.loc[:,'CountryISOCode'] = np.select([m1&m2, m1&m3], ['GBR','IRE'], 
                                       default=df.CountryISOCode)

    Name          PostCode CurrencyCode CountryISOCode
0  Steve  Invalid Postcode          GBP            GBR
1  Robyn  Invalid Postcode          EUR            IRE
2  James    Valid Postcode          GBP            GBR
3   Halo  Invalid Postcode          EUR            IRE
4  Jesus    Valid Postcode          GBP            GBR
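
If more currencies need handling, the conditions and choices for np.select can be built from a dictionary rather than written out one mask at a time. This is only a sketch: the name currency_to_country and the reduced three-row frame are assumptions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Steve", "Robyn", "James"],
        "CurrencyCode": ["GBP", "EUR", "GBP"],
        "CountryISOCode": [np.nan, np.nan, "GBR"],
    }
)

# Hypothetical mapping; one entry per currency that implies a single country.
currency_to_country = {"GBP": "GBR", "EUR": "IRE"}

missing = df["CountryISOCode"].isna()

# One boolean mask per currency: the country is missing AND the currency matches.
conditions = [missing & df["CurrencyCode"].eq(cur) for cur in currency_to_country]
choices = list(currency_to_country.values())

# Rows matching no condition keep their existing value via default.
df["CountryISOCode"] = np.select(conditions, choices, default=df["CountryISOCode"])

print(df)
```

Because every condition includes the missing mask, rows that already have a country code are never overwritten.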

answered Oct 11 '22 18:10

yatu

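I am adding this answer because it explains why the original attempt failed. The comparisons never matched because NaN is not equal to anything, including itself: np.nan == np.nan evaluates to False. In addition, & binds more tightly than ==, so the two comparisons need to be joined with and (or each wrapped in parentheses). You can check a NaN element by identity (is np.nan) but not by equality; note that identity only matches the np.nan object itself, so pd.isna is the safer general test. See "in operator, float("NaN") and np.nan" for more detail. With that said, this is how you can transform the original code to make it work as expected.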

import pandas as pd                                                                                                                                    
import numpy as np

raw_data = [
    ['Steve', 'Invalid Postcode', 'GBP', np.nan ],
    ['Robyn', 'Invalid Postcode', 'EUR', np.nan],
    ['James', 'Valid Postcode', 'GBP', 'GBR'],
    ['Halo', 'Invalid Postcode', 'EUR', np.nan],
    ['Jesus', 'Valid Postcode', 'GBP', 'GBR']
    ]

df = pd.DataFrame(columns=["Name", "PostCode", "Currency", "CountryISOCode"], data=raw_data)

def func(row):
    if row['CountryISOCode'] is np.nan and row['Currency'] == 'EUR':
        return 'IRE'
    elif row['CountryISOCode'] is np.nan and row['Currency'] == 'GBP':
        return 'GBR'
    else:
        return row['CountryISOCode']

df['CountryISOCode'] = df.apply(func, axis=1)

print(df)

However, the other answers are great also.
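
The NaN behaviour described in this answer can be checked directly. The sketch below also shows a variant of the row-wise function that uses pd.isna, which tests the value rather than the object and so also catches None or a NaN created elsewhere. The helper name fill_country and the small frame are assumptions for illustration.

```python
import pandas as pd
import numpy as np

# NaN never compares equal to anything, including itself.
print(np.nan == np.nan)        # False
# Identity only holds for the very same object; a freshly created NaN fails.
print(float("nan") is np.nan)  # False
# pd.isna() checks the value, not the object.
print(pd.isna(float("nan")))   # True


def fill_country(row):
    # Hypothetical helper; assumes the 'Currency' column name used in this answer.
    if pd.isna(row["CountryISOCode"]):
        # Fall back to the existing (missing) value for unknown currencies.
        return {"EUR": "IRE", "GBP": "GBR"}.get(row["Currency"], row["CountryISOCode"])
    return row["CountryISOCode"]


df = pd.DataFrame(
    {
        "Currency": ["GBP", "EUR", "GBP"],
        "CountryISOCode": [float("nan"), None, "GBR"],
    }
)
df["CountryISOCode"] = df.apply(fill_country, axis=1)
print(df)
```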

answered Oct 11 '22 18:10

Rachel
