 

Replace column values based on another dataframe python pandas - better way?

Tags:

python

pandas

Note: for simplicity's sake, I'm using a toy example, because copy/pasting dataframes is difficult on Stack Overflow (please let me know if there's an easy way to do this).

Is there a way to merge the values from one dataframe onto another without getting the suffixed _x, _y columns? I'd like the values in one column to replace all zero values in another column.

df1:

    Name  Nonprofit  Business  Education
    X     1          1         0
    Y     0          1         0     <- Y and Z have zero values for Nonprofit and Education
    Z     0          0         0
    Y     0          1         0

df2:

    Name  Nonprofit  Education
    Y     1          1     <- this df has the correct values
    Z     1          1

    pd.merge(df1, df2, on='Name', how='outer')

    Name  Nonprofit_x  Business  Education_x  Nonprofit_y  Education_y
    Y     1            1         1            1            1
    Y     1            1         1            1            1
    X     1            1         0            NaN          NaN
    Z     1            1         1            1            1

In a previous post, I tried combine_first() and dropna(), but these don't do the job.

I want to replace the zeros in df1 with the values in df2. Furthermore, I want all rows with the same Name to be changed according to df2. The desired result is:

    Name  Nonprofit  Business  Education
    Y     1          1         1
    Y     1          1         1
    X     1          1         0
    Z     1          0         1

(To clarify: the value in the 'Business' column where Name = Z should be 0.)
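If the goal is literally to replace only the zeros (leaving any non-zero value in df1 alone), one option is to look each Name up in df2 with `Series.map` and write df2's value only where df1 holds a 0. This is a minimal sketch of a swapped-in technique, not one of the posted answers; it assumes each Name appears at most once in df2, because `map` needs a unique lookup index.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 1, 0]],
    columns=["Name", "Nonprofit", "Business", "Education"],
)
df2 = pd.DataFrame([["Y", 1, 1], ["Z", 1, 1]], columns=["Name", "Nonprofit", "Education"])

lookup = df2.set_index("Name")  # assumption: each Name appears only once in df2

for col in ["Nonprofit", "Education"]:
    # df2's value for each row's Name; NaN where the Name is not in df2 (row X)
    from_df2 = df1["Name"].map(lookup[col])
    # only touch cells that are 0 in df1 and have a replacement in df2
    to_replace = (df1[col] == 0) & from_df2.notna()
    df1.loc[to_replace, col] = from_df2[to_replace].astype(df1[col].dtype)

print(df1)
```

Business is never looked at, so Z keeps its 0 there, and both Y rows are updated because `map` is applied row by row.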

My existing solution does the following: I subset based on the names that exist in df2, and then replace those values with the correct value. However, I'd like a less hacky way to do this.

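    pubunis_df = df2
    sdf = df1

    # str_to_regex and searchnamesre are helper functions (not shown here)
    regex = str_to_regex(', '.join(pubunis_df.ORGS))
    pubunis = searchnamesre(sdf, 'ORGS', regex)

    # .ix was removed in pandas 1.0; .loc selects the same rows by index label
    sdf.loc[pubunis.index, ['Education', 'Public']] = 1
    searchnamesre(sdf, 'ORGS', regex)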
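If the names in ORGS match exactly, the same subset-and-assign idea can be written without building a regex, using `isin` as the row mask. This is a sketch under that assumption: the regex helpers may also allow partial matches, which `isin` does not. The column names ORGS, Education and Public come from the snippet above; the rows are hypothetical toy data.

```python
import pandas as pd

# hypothetical toy data, using the column names from the snippet above
sdf = pd.DataFrame({"ORGS": ["Univ A", "Company B", "Univ C"],
                    "Education": [0, 0, 0],
                    "Public": [0, 0, 0]})
pubunis_df = pd.DataFrame({"ORGS": ["Univ A", "Univ C"]})

# exact-match membership test instead of a regex built from the names
is_pubuni = sdf["ORGS"].isin(pubunis_df["ORGS"])
sdf.loc[is_pubuni, ["Education", "Public"]] = 1
print(sdf)
```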
asked Jul 15 '14 at 21:07 by user3314418


2 Answers

Attention: in recent versions of pandas, the other two answers (KSD's and EdChum's) no longer work:

KSD's answer raises an error:

    import pandas as pd

    df1 = pd.DataFrame([["X", 1, 1, 0],
                        ["Y", 0, 1, 0],
                        ["Z", 0, 0, 0],
                        ["Y", 0, 0, 0]],
                       columns=["Name", "Nonprofit", "Business", "Education"])
    df2 = pd.DataFrame([["Y", 1, 1],
                        ["Z", 1, 1]],
                       columns=["Name", "Nonprofit", "Education"])

    df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2.loc[df2.Name.isin(df1.Name), ['Nonprofit', 'Education']].values
    df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']].values

    Out[851]:
    ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (3,)

and EdChum's answer gives the wrong result:

    df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']]
    df1
    Out[852]:
      Name  Nonprofit  Business  Education
    0    X        1.0         1        0.0
    1    Y        1.0         1        1.0
    2    Z        NaN         0        NaN
    3    Y        NaN         1        NaN

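It only works reliably when the rows line up: the values in 'Name' must be unique, and the rows must appear in the same order with the same index labels in both DataFrames, because assigning a DataFrame through .loc aligns on the index, not on 'Name'.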
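The NaN rows in the output above come from that index alignment. The sketch below reproduces the mismatch with `reindex`, then shows one alignment-safe variant of the isin/.loc idea: look each masked row's Name up in df2, and strip the index with `to_numpy()` so the values are assigned by position. It assumes Name is unique in df2; it is an illustration, not code from either answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 1, 0]],
    columns=["Name", "Nonprofit", "Business", "Education"],
)
df2 = pd.DataFrame([["Y", 1, 1], ["Z", 1, 1]], columns=["Name", "Nonprofit", "Education"])
cols = ["Nonprofit", "Education"]
mask = df1["Name"].isin(df2["Name"])

# The problem: df2's rows are matched to df1's rows by index label (0, 1),
# not by Name. Reindexing df2 by the masked labels of df1 (1, 2, 3) shows it:
# label 1 (a Y row in df1) picks up Z's values, labels 2 and 3 become NaN.
aligned = df2[cols].reindex(df1.index[mask.to_numpy()])
print(aligned)

# A fix: look up each masked row's Name in df2 (assumed unique per Name),
# then drop the index with to_numpy() so values are assigned by position.
lookup = df2.set_index("Name")
df1.loc[mask, cols] = lookup.loc[df1.loc[mask, "Name"], cols].to_numpy()
print(df1)
```

Because the lookup returns one row per masked row of df1 (Y, Z, Y), the array shape always matches the selection, which is what the `.values` version in KSD's answer got wrong.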

Here is my answer:

Way 1:

    df1 = df1.merge(df2, on='Name', how='left')
    # rows without a match in df2 get NaN in the _y columns, so fall back to df1's values
    df1['Nonprofit_y'] = df1['Nonprofit_y'].fillna(df1['Nonprofit_x'])
    df1['Education_y'] = df1['Education_y'].fillna(df1['Education_x'])
    df1.drop(['Nonprofit_x', 'Education_x'], axis=1, inplace=True)
    df1.rename(columns={'Nonprofit_y': 'Nonprofit', 'Education_y': 'Education'}, inplace=True)
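Way 1 hard-codes each column name. The sketch below applies the same left-merge-plus-fillna idea to every column that df1 and df2 share. The function name `overwrite_from` and the suffixes `_old`/`_new` are hypothetical choices for this example. It assumes df2 has at most one row per key; otherwise the left merge would duplicate rows of df1.

```python
import pandas as pd


def overwrite_from(df1, df2, key="Name"):
    """Hypothetical helper: return a copy of df1 in which every column shared
    with df2 takes df2's value wherever the key has a match in df2."""
    value_cols = [c for c in df2.columns if c != key and c in df1.columns]
    merged = df1.merge(df2[[key] + value_cols], on=key, how="left",
                       suffixes=("_old", "_new"))
    for col in value_cols:
        # unmatched rows have NaN in the _new column: fall back to df1's value,
        # then cast back to df1's dtype (the NaN forced a float column)
        filled = merged[col + "_new"].fillna(merged[col + "_old"])
        merged[col] = filled.astype(df1[col].dtype)
    # keep only df1's columns, in df1's order (drops the suffixed helpers)
    return merged[df1.columns]


df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 1, 0]],
    columns=["Name", "Nonprofit", "Business", "Education"],
)
df2 = pd.DataFrame([["Y", 1, 1], ["Z", 1, 1]], columns=["Name", "Nonprofit", "Education"])

result = overwrite_from(df1, df2)
print(result)
```

A left merge keeps the row order of df1, so the duplicated Y rows stay where they were.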

Way 2:

    # update() aligns on the index, so make Name the index of both frames
    df1 = df1.set_index('Name')
    df2 = df2.set_index('Name')
    df1.update(df2)  # modifies df1 in place
    df1.reset_index(inplace=True)

Some more notes on update: the key columns you set as the index don't need to have the same name in both DataFrames (e.g. 'Name1' and 'Name2'), because update aligns on the index values, not on the index name. It also works when df2 contains extra rows that don't exist in df1; those rows are simply ignored. In other words, the keys of df2 don't have to match those of df1: df2 can have keys that df1 lacks, and vice versa.

Example:

    import pandas as pd

    df1 = pd.DataFrame([["X", 1, 1, 0],
                        ["Y", 0, 1, 0],
                        ["Z", 0, 0, 0],
                        ["Y", 0, 1, 0]],
                       columns=["Name1", "Nonprofit", "Business", "Education"])
    df2 = pd.DataFrame([["Y", 1, 1],
                        ["Z", 1, 1],
                        ["U", 1, 3]],
                       columns=["Name2", "Nonprofit", "Education"])

    df1 = df1.set_index('Name1')
    df2 = df2.set_index('Name2')

    df1.update(df2)

Result:

          Nonprofit  Business  Education
    Name1
    X           1.0         1        0.0
    Y           1.0         1        1.0
    Z           1.0         0        1.0
    Y           1.0         1        1.0
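To make the notes on update concrete, here is a smaller sketch with its own toy data (not from the answer). It checks the three properties: index names that differ, an extra key in df2 that is ignored, and one documented behavior not mentioned above, namely that update only copies non-NA values, so a NaN in df2 leaves df1's value in place. The keys in df1 are kept unique on purpose, so the sketch does not depend on how a particular pandas version handles repeated labels inside update().

```python
import pandas as pd

# unique keys in df1, so the sketch does not rely on version-specific
# handling of repeated index labels inside update()
df1 = pd.DataFrame({"Name1": ["X", "Y", "Z"],
                    "Nonprofit": [1, 0, 0],
                    "Education": [0, 0, 0]}).set_index("Name1")
df2 = pd.DataFrame({"Name2": ["Y", "Z", "U"],
                    "Nonprofit": [1, None, 1],  # None becomes NaN: Z is not overwritten
                    "Education": [1, 1, 3]}).set_index("Name2")

df1.update(df2)  # in place; key "U" has no row in df1 and is ignored
print(df1)
```

Depending on the pandas version, the updated columns may come back as float, as in the result shown in the answer.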
answered Sep 27 '22 at 20:09 by Jeremy Z


Use the boolean mask from isin to filter the rows of df, and assign the desired values from the right-hand-side DataFrame (here df is the question's df1, and df1 is the question's df2):

    In [27]:
    df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
    df
    Out[27]:
      Name  Nonprofit  Business  Education
    0    X          1         1          0
    1    Y          1         1          1
    2    Z          1         0          1
    3    Y          1         1          1

    [4 rows x 4 columns]
answered Sep 27 '22 at 21:09 by EdChum