Cartesian product of a pandas dataframe with itself

Tags:

python

pandas

dataframe

Given a dataframe:

    id  value
0    1     a
1    2     b
2    3     c

I want to get a new dataframe that is the cartesian product of the rows with themselves, excluding the pairing of a row with itself:

   id value  id_2 value_2
0   1     a     2       b
1   1     a     3       c
2   2     b     1       a
3   2     b     3       c
4   3     c     1       a
5   3     c     2       b

This is my approach as of now. I use itertools to get the product and then use pd.concat with df.loc to get the new dataframe.

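from itertools import product

import pandas as pd

ids = df.index.values
# keep every ordered pair of index labels except (x, x)
ids_1, ids_2 = list(zip(*filter(lambda x: x[0] != x[1], product(ids, ids))))

# axis and columns must be passed by keyword in pandas 2.0+
df_new = pd.concat([df.loc[ids_1, :].reset_index(), df.loc[ids_2, :].reset_index()], axis=1).drop(columns='index')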

df_new

   id value  id value
0   1     a   2     b
1   1     a   3     c
2   2     b   1     a
3   2     b   3     c
4   3     c   1     a
5   3     c   2     b

Is there a simpler way?

asked Jul 31 '17 23:07

cs95

1 Answer

We want the indices of the upper and lower triangles of a square matrix, or in other words, the positions where the identity matrix is zero.

np.eye(len(df))

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

So I subtract it from 1 and get

array([[ 0.,  1.,  1.],
       [ 1.,  0.,  1.],
       [ 1.,  1.,  0.]])

In a boolean context the ones are True and the zeros are False, so passing this matrix to np.where gives exactly the upper and lower triangle indices.
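As a rough sketch of what those index arrays look like for the three-row example (the names `n` and `mask` are only for illustration):

```python
import numpy as np

n = 3                  # number of rows in the example dataframe
mask = 1 - np.eye(n)   # 0 on the diagonal, 1 everywhere else
i, j = np.where(mask)  # row and column positions of the nonzero cells

print(i)  # [0 0 1 1 2 2]
print(j)  # [1 2 0 2 0 1]
```

Each `(i[k], j[k])` is one ordered pair of distinct row positions, listed row by row.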

import numpy as np

# i, j: positions of every ordered pair of distinct rows
i, j = np.where(1 - np.eye(len(df)))
df.iloc[i].reset_index(drop=True).join(
    df.iloc[j].reset_index(drop=True), rsuffix='_2')

   id value  id_2 value_2
0   1     a     2       b
1   1     a     3       c
2   2     b     1       a
3   2     b     3       c
4   3     c     1       a
5   3     c     2       b
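If you are on pandas 1.2 or later, a cross merge looks like another way to get the same result. This is only a sketch under that assumption and not part of the approach above. The helper column name `pos` is made up for the example. It is there so that self-pairs are dropped by row position rather than by `id`, which might not be unique.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})

# Carry the row position along as a regular column ('pos' is a hypothetical name).
left = df.reset_index(drop=True).rename_axis('pos').reset_index()

# how='cross' pairs every row with every row, including itself (pandas >= 1.2).
pairs = left.merge(left, how='cross', suffixes=('', '_2'))

# Drop the self-pairs, then drop the helper columns.
result = (pairs[pairs['pos'] != pairs['pos_2']]
          .drop(columns=['pos', 'pos_2'])
          .reset_index(drop=True))
print(result)
```

The cross merge builds all n * n pairs before filtering, so for large frames the np.where version, which builds only the n * (n - 1) pairs it needs, may be the lighter option.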

answered Oct 15 '22 18:10

piRSquared
