I have a dataframe spanning several years, and at some point they changed the codes for ethnicity. So I need to recode the values conditional on the year, which is another column in the same dataframe. For instance, 1 to 3, 2 to 3, 3 to 4, and so on:
old = [1, 2, 3, 4, 5, 91]
new = [3, 3, 4, 2, 1, 6]
And this should only be done for the years 1996 to 2001. The values for the other years in the same column (ethnicity) must not be changed. Hoping to avoid too many inefficient loops, I tried:
recode_years = range(1996,2002)
for year in recode_years:
    df['ethnicity'][df.year==year].replace(old, new, inplace=True)
But the original values in the dataframe did not change. The replace method itself replaced and returned the new values correctly, but the inplace option does not seem to affect the original dataframe when a condition is applied. This may be obvious to experienced pandas users, but surely there must be some simple way of doing this instead of looping over every single element?
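A minimal sketch of why the inplace call has no visible effect, using hypothetical sample data and a reduced mapping ({1: 3, 2: 3}) to keep it short: boolean indexing such as df['ethnicity'][df.year==year] builds a new Series, so replace(..., inplace=True) edits that temporary copy. Writing the result back through a single .loc call, with the row mask and the column together, changes df itself.

```python
import pandas as pd

# Hypothetical sample data: two rows inside the recode window, one after it.
df = pd.DataFrame({"year": [1996, 2001, 2005], "ethnicity": [1, 2, 1]})
mask = df["year"].between(1996, 2001)

# Boolean indexing returns a new Series (a copy), so an inplace replace
# only modifies that temporary object, never df itself.
subset = df.loc[mask, "ethnicity"]
subset.replace({1: 3, 2: 3}, inplace=True)
print(df["ethnicity"].tolist())  # [1, 2, 1] (df is unchanged)

# Assigning the replaced values back through .loc writes into df directly.
df.loc[mask, "ethnicity"] = df.loc[mask, "ethnicity"].replace({1: 3, 2: 3})
print(df["ethnicity"].tolist())  # [3, 3, 1]
```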
Edit (x2): Here is an example of another approach which also did not work (errors: 'Length of replacements must equal series length' and 'TypeError: array cannot be safely cast to required type'):
from pandas import DataFrame
oldNewMap = {1:2, 2:3}
df2 = DataFrame({"year":[2000,2000,2000,2001,2001,2001],"ethnicity":[1,2,1,2,3,1]})
df2['ethnicity'][df2.year==2000] = df2['ethnicity'][df2.year==2000].map(oldNewMap)
Edit: It seems to be a problem specific to the installation/version, since this works fine on my other computer.
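Whatever the version-specific cause of those errors, one way to sidestep chained indexing entirely is to select rows and column in one .loc call on both sides of the assignment. A sketch using the df2 example from the question (written with the pd alias rather than a bare DataFrame import):

```python
import pandas as pd

df2 = pd.DataFrame({"year": [2000, 2000, 2000, 2001, 2001, 2001],
                    "ethnicity": [1, 2, 1, 2, 3, 1]})
oldNewMap = {1: 2, 2: 3}

# One .loc call selects rows and column together, so the assignment
# targets df2 itself rather than a temporary copy.
rows = df2["year"] == 2000
df2.loc[rows, "ethnicity"] = df2.loc[rows, "ethnicity"].map(oldNewMap)

print(df2["ethnicity"].tolist())  # [2, 3, 2, 2, 3, 1]
```

Because map() looks each original value up once, the 1 values become 2 and stay 2; they are not pushed on to 3 by the second entry. Rows from 2001 are left untouched.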
You can replace values of all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can both read and modify the values of a pandas DataFrame.
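A small sketch of a conditional write with .loc, where the boolean array picks the rows and a list of labels picks the columns. The frame and its columns a and b are hypothetical:

```python
import pandas as pd

# Hypothetical frame: a year column plus two value columns.
df = pd.DataFrame({"year": [1995, 1997, 2003], "a": [1, 2, 3], "b": [4, 5, 6]})

# Set columns a and b to 0 only in rows where year > 1996.
df.loc[df["year"] > 1996, ["a", "b"]] = 0

print(df[["a", "b"]].values.tolist())  # [[1, 4], [0, 0], [0, 0]]
```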
Pandas DataFrame replace() method
The replace() method replaces a specified value with another specified value. It searches the entire DataFrame and replaces every occurrence of the specified value.
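A sketch contrasting a DataFrame-wide replace() with one limited to a single column through the nested-dict form {column: {old: new}}. The data is hypothetical; note that replace() returns a new frame by default and leaves the original alone:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1], "b": [1, 3, 4]})

# Replace every 1 anywhere in the DataFrame.
everywhere = df.replace(1, 9)

# A nested dict {column: {old: new}} limits the replacement to column "a".
only_a = df.replace({"a": {1: 9}})

print(everywhere.values.tolist())  # [[9, 9], [2, 3], [9, 4]]
print(only_a.values.tolist())      # [[9, 1], [2, 3], [9, 4]]
```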
Applying an IF condition in a pandas DataFrame
If the number is equal to or lower than 4, assign the value 'True'; otherwise, if the number is greater than 4, assign the value 'False'.
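A sketch of that condition, assuming a hypothetical column named number. A plain comparison already yields a boolean column; if the literal strings 'True'/'False' are really wanted, numpy.where picks one of two values per row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"number": [1, 4, 5]})

# Comparison gives a boolean Series: True where number <= 4, else False.
df["flag"] = df["number"] <= 4

# String labels instead of booleans, chosen row by row.
df["flag_str"] = np.where(df["number"] <= 4, "True", "False")
```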
It may just be simpler to do it a different way:
oldNewMap = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 91: 6}
mask = df.year.between(1996, 2001)  # rows from the years that need recoding
df.loc[mask, 'ethnicity'] = df.loc[mask, 'ethnicity'].map(oldNewMap)
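A fuller sketch of the same idea on hypothetical sample rows. The old and new lists from the question are zipped into the lookup dict, and between(1996, 2001) selects the recode years inclusively. The lambda with dict.get is an assumption added here: it keeps any code missing from the map (7 in the sample) instead of turning it into NaN, which is what map(oldNewMap) alone would do.

```python
import pandas as pd

old = [1, 2, 3, 4, 5, 91]
new = [3, 3, 4, 2, 1, 6]
oldNewMap = dict(zip(old, new))

# Hypothetical rows: before, inside, and after the 1996-2001 window.
df = pd.DataFrame({"year": [1995, 1996, 1999, 2000, 2001, 2002],
                   "ethnicity": [1, 1, 3, 7, 91, 1]})

mask = df["year"].between(1996, 2001)  # inclusive on both ends
# map() looks each value up once, so a 1 becomes 3 and is not then turned into 4.
# The .get fallback leaves codes that are not in the map unchanged.
df.loc[mask, "ethnicity"] = df.loc[mask, "ethnicity"].map(
    lambda code: oldNewMap.get(code, code))

print(df["ethnicity"].tolist())  # [1, 3, 4, 7, 6, 1]
```

Only the rows from 1996 to 2001 are rewritten; 1995 and 2002 keep their original codes.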