I am concatenating two Pandas dataframes as below.
part1 = pd.DataFrame({'id' :[100,200,300,400,500],
'amount': np.random.randn(5)
})
part2 = pd.DataFrame({'id' :[700,100,800,500,300],
'amount': np.random.randn(5)
})
concatenated = pd.concat([part1, part2], axis=0)
amount id
0 -0.458653 100
1 2.172348 200
2 0.072494 300
3 -0.253939 400
4 -0.061866 500
0 -1.187505 700
1 -0.810784 100
2 0.321881 800
3 -1.935284 500
4 -1.351507 300
How can I limit the operation so that a row in part2
is only included in concatenated
if the row id
does not already appear in part1
? In a way, I want to treat the id
column like a set.
Is it possible to do this during concat() or is this more a post-processing step?
Desired output for this example would be:
concatenated_desired
amount id
0 -0.458653 100
1 2.172348 200
2 0.072494 300
3 -0.253939 400
4 -0.061866 500
0 -1.187505 700
2 0.321881 800
call drop_duplicates()
after concat()
:
part1 = pd.DataFrame({'id' :[100,200,300,400,500],
'amount': np.arange(5)
})
part2 = pd.DataFrame({'id' :[700,100,800,500,300],
'amount': np.random.randn(5)
})
concatenated = pd.concat([part1, part2], axis=0)
print concatenated.drop_duplicates(cols="id")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With