
self-join with Pandas

I would like to perform a self-join on a Pandas dataframe so that some rows get appended to the original rows. Each row has a marker 'i' indicating which row should get appended to it on the right.

import pandas as pd

d = pd.DataFrame(['A','B','C'], columns = ['some_col'])
d['i'] = [2,1,1]

In [17]: d
Out[17]: 
  some_col  i
0        A  2
1        B  1
2        C  1

Desired output:

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B

That is, row 2 gets appended to row 0, row 1 to row 1, row 1 to row 2 (as indicated by i).

My idea of how to go about it was

pd.merge(d, d, left_index = True, right_on = 'i', how = 'left')

But it produces something else altogether. How can I do this correctly?

Nucular asked Jan 02 '17 23:01


1 Answer

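Instead of using merge, you can also use indexing and assignment. Indexing some_col with i looks up the rows that i points to, and .values drops their index so the result is assigned by position: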

>>> d['new_col'] = d['some_col'][d['i']].values
>>> d
  some_col  i new_col
0        A  2       C
1        B  1       B
2        C  1       B
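
A merge can also give the desired output if the key direction is flipped. The attempt in the question matches the left frame's index against the right frame's i, whereas the lookup needs the left frame's i matched against the right frame's index. A minimal sketch, assuming the frame d from the question; the names merged and mapped are introduced here for illustration only:

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Match the left frame's 'i' values against the right frame's index.
# Only 'some_col' is taken from the right side, so no extra 'i_y' column appears.
merged = d.merge(
    d[['some_col']],
    left_on='i',
    right_index=True,
    how='left',
    suffixes=('', '_y'),
)

# The same lookup without a merge: map each 'i' through the some_col Series,
# which treats the Series index as the lookup key.
mapped = d['i'].map(d['some_col'])
```

Both merged['some_col_y'] and mapped should hold the values C, B, B for rows 0, 1, 2. The merge form may be preferable when several columns need to be pulled across at once; for a single column the indexing or map form is shorter.
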
MSeifert answered Oct 02 '22 15:10