Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace data from one pandas dataframe to another

Tags:

python

pandas

I have two dataframes df1 and df2 . They both contain time-series data, so it is possible some of the dates in df1 and df2 intersect with each other and the rest don't. My requirement is an operation on the two dataframes that replaces the values in df1 with the values in df2 for the same dates, leaves alone values for indexes in df1 not present in df2 and adds the values for indexes present in df2 and not in df1. Consider the following example:

df1:
    A   B   C   D
0   A0  BO  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   E
1   A4  B4  C4  E4
2   A5  B5  C5  E5
3   A6  B6  C6  E6
4   A7  B7  C7  E7

result df:
    A   B   C   D   E
0   A0  BO  C0  D0  Nan
1   A4  B4  C4  D4  E4
2   A5  B5  C5  D5  E5
3   A6  B6  C6  D6  E6
4   A7  B7  C7  D7  E7

I tried to develop the logic with the first step concatenating the two dfs but that leads to rows with duplicate indexes and am not sure how to handle that. How can this be achieved? Any suggestions would help

Edit: A simpler case would be when the column names are same in the two dataframes. So consider df2 has column D instead of E with values D4,D5,D6,D7.

A concatenation yields the following result:

concat(df1,df2,axis=1)
    A    B    C    D    A    B    C    D
0   A0   B0   C0   D0  NaN  NaN  NaN  NaN  
1   A1   B1   C1   D1   A4   B4   C4   D4
2   A2   B2   C2   D2   A5   B5   C5   D5
3   A3   B3   C3   D3   A6   B6   C6   D6
4  NaN  NaN  NaN  NaN   A7   B7   C7   D7

Now this introduces duplicate columns. A conventional solution would be to loop through each column but I am looking for a more elegant solution. Any ideas would be appreciated.

like image 416
john smith Avatar asked May 24 '15 00:05

john smith


Video Answer


2 Answers

update will align on the indices of both DataFrames:

df1.update(df2)

df1:
    A   B   C   D
0   A0  BO  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   D
1   A4  B4  C4  D4
2   A5  B5  C5  D5
3   A6  B6  C6  D6
4   A7  B7  C7  D7

>>> df1.update(df2)
    A   B   C   D
0  A0  BO  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6

You then need to add the values in df2 not present in df1:

>>> df1.append(df2.loc[[i for i in df2.index if i not in df1.index], :])
Out[46]: 
    A   B   C   D
0  A0  BO  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6
4  A7  B7  C7  D7
like image 62
Alexander Avatar answered Sep 28 '22 06:09

Alexander


I just saw this question and realized it is almost identical to one that I just asked today and that @Alexander (the poster of the answer above) answered very nicely:

pd.concat([df1[~df1.index.isin(df2.index)], df2])

See pandas DataFrame concat / update ("upsert")? for the discussion.

like image 33
embeepea Avatar answered Sep 28 '22 07:09

embeepea