
Conditional replacement in pandas

I have a dataframe spanning several years, and at some point the codes for ethnicity were changed. I therefore need to recode the values conditional on the year, which is another column in the same dataframe. For instance, 1 becomes 3, 2 becomes 3, 3 becomes 4, and so on:

old = [1, 2, 3, 4, 5, 91]
new = [3, 3, 4, 2, 1, 6]

This should be done only for the years 1996 to 2001. The values for the other years in the same column (ethnicity) must not be changed. Hoping to avoid too many inefficient loops, I tried:

    recode_years = range(1996,2002)
    for year in recode_years:
        df['ethnicity'][df.year==year].replace(old, new, inplace=True)

But the original values in the dataframe did not change. The replace method itself replaced and returned the new values correctly, but the inplace option seems not to affect the original dataframe when applied to a conditional selection. This may be obvious to experienced pandas users, but surely there must be some simple way of doing this instead of looping over every single element?

Edit (x2): Here is an example of another approach which also did not work (it raised "Length of replacements must equal series length" and "TypeError: array cannot be safely cast to required type"):

from pandas import DataFrame
oldNewMap = {1:2, 2:3}
df2 = DataFrame({"year":[2000,2000,2000,2001,2001,2001],"ethnicity":[1,2,1,2,3,1]})
df2['ethnicity'][df2.year==2000] = df2['ethnicity'][df2.year==2000].map(oldNewMap)

Edit: It seems to be a problem specific to the installation/version, since this works fine on my other computer.

hmelberg asked Apr 22 '13 17:04


People also ask

How do you conditionally replace values in pandas?

You can replace values in all or selected columns of a pandas DataFrame based on a condition by using DataFrame.loc[]. The loc[] indexer selects rows and columns by label(s) or by a boolean array, and it can be used both to read and to assign values.
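A minimal sketch of that approach, using a small made-up frame (the data and the replacement value 3 are only illustrative):

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"year": [1995, 1996, 2001, 2002],
                   "ethnicity": [1, 1, 2, 2]})

# Boolean mask selecting the rows to change (both endpoints are included)
mask = df["year"].between(1996, 2001)

# Row selection and column label go in a single .loc call,
# so the assignment writes to df itself
df.loc[mask, "ethnicity"] = 3
```

Rows outside the mask keep their original values.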

How do you replace a specific value in a pandas DataFrame?

The DataFrame.replace() method replaces a specified value with another specified value. It searches the entire DataFrame and replaces every occurrence of the specified value.
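As a rough illustration with made-up data, a dict passed to replace() maps old values to new ones across the whole frame, while a nested dict restricts the replacement to one column:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"a": [1, 2, 91], "b": [91, 5, 1]})

# Replace 1 with 3 and 91 with 6 wherever they occur; returns a new frame
recoded_all = df.replace({1: 3, 91: 6})

# Nested dict: only column "a" is searched
recoded_a = df.replace({"a": {1: 3}})
```

Neither call modifies df; each returns a new DataFrame.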

How do you use conditional in pandas?

To apply an if-style condition to a pandas DataFrame column, compare the column against a threshold. For example: if the number is less than or equal to 4, assign the value True; otherwise, if the number is greater than 4, assign the value False.
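One way to sketch this, assuming a hypothetical column called number: a comparison on a column is evaluated element by element and yields a boolean Series, which can be stored as a new column.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"number": [1, 4, 5, 10]})

# True where number <= 4, False where number > 4
df["at_most_4"] = df["number"] <= 4
```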


1 Answer

It may just be simpler to do it a different way:

oldNewMap = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 91: 6}
mask = df.year == year  # year is the loop variable from the question
df.loc[mask, 'ethnicity'] = df.loc[mask, 'ethnicity'].map(oldNewMap)
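The loop over years can probably be dropped altogether. The following is a sketch rather than a definitive fix, assuming a reasonably recent pandas and a made-up frame that reuses the question's column names: build one boolean mask for 1996 to 2001 and assign through .loc, which writes to the original frame instead of to a temporary selection. Note that map turns any code missing from the mapping into NaN, so the mapping should cover every code that occurs in those years.

```python
import pandas as pd

old_new_map = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 91: 6}

# Made-up sample data using the question's column names
df = pd.DataFrame({
    "year": [1995, 1996, 1998, 2001, 2001, 2002],
    "ethnicity": [1, 1, 3, 91, 5, 2],
})

# Rows from the years that used the old coding scheme (1996 to 2001 inclusive)
mask = df["year"].between(1996, 2001)

# Recode only the masked rows; all other rows are left as they are
df.loc[mask, "ethnicity"] = df.loc[mask, "ethnicity"].map(old_new_map)
```

The rows for 1995 and 2002 keep their codes, while the rows in between are recoded in one vectorized step.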
BrenBarn answered Oct 13 '22 23:10