Pandas: concatenating conditioned on unique values

I am concatenating two pandas DataFrames as shown below.

import numpy as np
import pandas as pd

part1 = pd.DataFrame({'id': [100, 200, 300, 400, 500],
                      'amount': np.random.randn(5)})

part2 = pd.DataFrame({'id': [700, 100, 800, 500, 300],
                      'amount': np.random.randn(5)})

concatenated = pd.concat([part1, part2], axis=0)
     amount   id
0 -0.458653  100
1  2.172348  200
2  0.072494  300
3 -0.253939  400
4 -0.061866  500
0 -1.187505  700
1 -0.810784  100
2  0.321881  800
3 -1.935284  500
4 -1.351507  300

How can I limit the operation so that a row in part2 is only included in concatenated if the row id does not already appear in part1? In a way, I want to treat the id column like a set.

Is it possible to do this during concat(), or is this more of a post-processing step?

Desired output for this example would be:

concatenated_desired
     amount   id
0 -0.458653  100
1  2.172348  200
2  0.072494  300
3 -0.253939  400
4 -0.061866  500
0 -1.187505  700
2  0.321881  800
asked Sep 13 '25 by Zhubarb


1 Answer

Call drop_duplicates() after concat(). By default it keeps the first occurrence of each duplicated id, so the part1 rows win:

import numpy as np
import pandas as pd

part1 = pd.DataFrame({'id': [100, 200, 300, 400, 500],
                      'amount': np.arange(5)})

part2 = pd.DataFrame({'id': [700, 100, 800, 500, 300],
                      'amount': np.random.randn(5)})

concatenated = pd.concat([part1, part2], axis=0)

# keep='first' (the default) retains the part1 row when an id appears in both frames
print(concatenated.drop_duplicates(subset="id"))
answered Sep 16 '25 by HYRY
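
If you would rather not append the duplicate ids in the first place, you can filter part2 before calling concat(). This is a minimal sketch assuming the same part1 and part2 frames as above; isin() builds a boolean mask of the ids already present in part1:

# Keep only the part2 rows whose id is not already in part1, then concatenate.
new_rows = part2[~part2['id'].isin(part1['id'])]
concatenated = pd.concat([part1, new_rows], axis=0)
print(concatenated)  # only the rows with id 700 and 800 are taken from part2

Passing ignore_index=True to pd.concat() would additionally give the result a fresh 0..n-1 index instead of repeating the row labels from the two source frames.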