Given a dataframe:
   id value
0   1     a
1   2     b
2   3     c
I want to get a new dataframe that is basically the cartesian product of each row with each other row excluding itself:
   id value  id_2 value_2
0   1     a     2       b
1   1     a     3       c
2   2     b     1       a
3   2     b     3       c
4   3     c     1       a
5   3     c     2       b
This is my approach as of now. I use itertools to get the product and then use pd.concat with df.loc to get the new dataframe.
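from itertools import product
import pandas as pd
ids = df.index.values
# Keep only the index pairs that do not pair a row with itself;
# convert to lists so df.loc treats them as row labels.
ids_1, ids_2 = map(list, zip(*filter(lambda x: x[0] != x[1], product(ids, ids))))
# axis must be passed by keyword in recent pandas versions.
df_new = pd.concat([df.loc[ids_1, :].reset_index(), df.loc[ids_2, :].reset_index()], axis=1).drop('index', axis=1)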
df_new
   id value  id value
0   1     a   2     b
1   1     a   3     c
2   2     b   1     a
3   2     b   3     c
4   3     c   1     a
5   3     c   2     b
Is there a simpler way?
Step 1: First of all, import the library Pandas.
Step 2: Then, obtain the datasets on which you want to perform a cartesian product.
Step 3: Further, use a merge function to perform the cartesian product on the datasets obtained.
Step 4: Finally, print the cartesian product obtained.
In pandas, merge and join take a how parameter to perform a left, right, inner or outer join on two DataFrames or Series. Since pandas 1.2, how="cross" is also accepted and performs a cross join (Cartesian product) directly; on older versions it is not available.
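On pandas 1.2 or later, the merge steps above can be sketched with how="cross". This is a minimal sketch, not a definitive answer: it rebuilds the example dataframe from the question and assumes the id column uniquely identifies each row, which is what lets a simple inequality filter drop the self-pairs.

```python
import pandas as pd

# Example dataframe from the question.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Cross join every row with every row (needs pandas >= 1.2).
# The empty left suffix keeps the original column names on the left side.
pairs = df.merge(df, how="cross", suffixes=("", "_2"))

# Assumption: "id" is unique per row, so id != id_2 removes exactly
# the rows that were paired with themselves.
result = pairs[pairs["id"] != pairs["id_2"]].reset_index(drop=True)

print(result)
```

If id is not unique, one option is to call df.reset_index() first and filter on the resulting index columns instead.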
Given two lists such as (a, b) and (c, d), their Cartesian product is {(a, c), (a, d), (b, c), (b, d)}. The product() function in the itertools library computes this. It returns an iterator rather than a list.
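To make the itertools route concrete, here is a small sketch. The first part shows product() on the two-list example; the second part swaps in itertools.permutations, which is not what the question uses but yields every ordered pair of distinct positions, so no filter for self-pairs is needed. The dataframe is the example from the question.

```python
from itertools import permutations, product

import pandas as pd

# product() is lazy, so wrap it in list() to see the pairs.
print(list(product(["a", "b"], ["c", "d"])))
# [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# permutations(..., 2) gives all ordered pairs of distinct row positions,
# i.e. the Cartesian product minus the (k, k) pairs.
left, right = zip(*permutations(range(len(df)), 2))

# Select rows by position and place the two halves side by side.
df_new = df.iloc[list(left)].reset_index(drop=True).join(
    df.iloc[list(right)].reset_index(drop=True), rsuffix="_2")

print(df_new)
```

Working with positions (iloc) rather than labels keeps the sketch independent of what the dataframe's index happens to contain.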
We want to get the indices for the upper and lower triangles of a square matrix, or in other words, the positions where the identity matrix is zero.
np.eye(len(df))
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
So I subtract it from 1 and get
1 - np.eye(len(df))
array([[ 0.,  1.,  1.],
       [ 1.,  0.,  1.],
       [ 1.,  1.,  0.]])
When this is evaluated in a boolean context and passed to np.where, I get exactly the upper and lower triangle indices.
i, j = np.where(1 - np.eye(len(df)))
df.iloc[i].reset_index(drop=True).join(
df.iloc[j].reset_index(drop=True), rsuffix='_2')
   id value  id_2 value_2
0   1     a     2       b
1   1     a     3       c
2   2     b     1       a
3   2     b     3       c
4   3     c     1       a
5   3     c     2       b
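For anyone who wants to see what np.where actually hands back here, the sketch below prints the two index arrays and wraps the approach in a helper. The helper name pair_rows and its suffix parameter are made up for illustration, and the boolean mask built with dtype=bool is just an equivalent way of writing 1 - np.eye(...).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# True everywhere except on the diagonal (same pattern as 1 - np.eye(n)).
mask = ~np.eye(len(df), dtype=bool)

# np.where returns the row and column positions of the True entries.
i, j = np.where(mask)
print(i)  # [0 0 1 1 2 2]
print(j)  # [1 2 0 2 0 1]


def pair_rows(frame, suffix="_2"):
    """Pair every row with every other row (hypothetical helper)."""
    rows, cols = np.where(~np.eye(len(frame), dtype=bool))
    left = frame.iloc[rows].reset_index(drop=True)
    right = frame.iloc[cols].reset_index(drop=True)
    return left.join(right, rsuffix=suffix)


print(pair_rows(df))
```

Note that the mask has n * n entries, so for very large dataframes the memory cost grows quadratically, as does the result itself.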