 

Is there an efficient way to merge two sorted dataframes in pandas, maintaining sortedness?

If I have two dataframes (or series) that are already sorted on compatible keys, I'd like to be able to cheaply merge them together and maintain sortedness. I can't see a way to do that other than via concat() followed by an explicit sort:

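import pandas as pd

a = pd.DataFrame([0,1,2,3], index=[1,2,3,5], columns=['x'])
b = pd.DataFrame([4,5,6,7], index=[0,1,4,6], columns=['x'])
print(pd.concat([a,b]))
# DataFrame.sort() is gone from current pandas; sort_index() sorts by the index
print(pd.concat([a,b]).sort_index())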

   x
1  0
2  1
3  2
5  3
0  4
1  5
4  6
6  7

   x
0  4
1  0
1  5
2  1
3  2
4  6
5  3
6  7

It looks like there has been a bit of related discussion with numpy arrays, suggesting an 'interleave' method, but I haven't found a good answer.
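For illustration, an interleave along those lines might be sketched as below. This is only a sketch under stated assumptions, not an established pandas API: merge_sorted is a hypothetical helper name, both inputs are assumed to be sorted ascending on their index, and ties are placed with rows from a ahead of rows from b. It uses numpy.searchsorted to compute where each row lands in the merged result, so it costs two binary-search passes rather than a full sort of the concatenated data.

```python
import numpy as np
import pandas as pd


def merge_sorted(a, b):
    """Interleave two frames whose indexes are already sorted ascending.

    Hypothetical helper for illustration; on equal keys, rows of `a`
    come before rows of `b`.
    """
    a_idx = a.index.to_numpy()
    b_idx = b.index.to_numpy()
    n_a, n_b = len(a_idx), len(b_idx)

    # Final position of a[i]: i plus the number of b keys strictly smaller.
    a_pos = np.arange(n_a) + np.searchsorted(b_idx, a_idx, side="left")
    # Final position of b[j]: j plus the number of a keys smaller or equal.
    b_pos = np.arange(n_b) + np.searchsorted(a_idx, b_idx, side="right")

    # order[k] is the row of concat([a, b]) that belongs at output position k.
    order = np.empty(n_a + n_b, dtype=np.intp)
    order[a_pos] = np.arange(n_a)
    order[b_pos] = np.arange(n_a, n_a + n_b)
    return pd.concat([a, b]).iloc[order]


a = pd.DataFrame([0, 1, 2, 3], index=[1, 2, 3, 5], columns=["x"])
b = pd.DataFrame([4, 5, 6, 7], index=[0, 1, 4, 6], columns=["x"])
merged = merge_sorted(a, b)
print(merged)
```

On the two example frames this should reproduce the row order of the sorted output shown above. Whether it is actually faster than concat() plus a sort would need to be measured on realistic data.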

patricksurry asked May 01 '13 12:05


1 Answer

If we limit the problem to a and b each having only one column, then I would take this route:

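# Outer join on the index; the shared column 'x' becomes x_x and x_y, NaN where a key is missing on one side
s = a.merge(b, how='outer', left_index=True, right_index=True)
# Stack both columns into a single Series, then drop the column-name level of the index
s.stack().reset_index(level=1, drop=True)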
Zeugma answered Nov 12 '22 01:11