Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate pairwise difference from specific columns in a dataframe

I have the following dataframe where I show how many times I have seen a move from Item1 to Item 2. For example there is one transition from A to B, 2 from A to C , 1 from C to A


    Item1   Item2   Moves
  1  A       B       1
  2  A       C       2
  3  B       D       3
  4  C       A       1
  5  C       B       5
  6  D       B       4
  7  D       C       1

I would like to calculate the difference between two items, so a newly constructed Dataframe would be the following

    Item1   Item2   Moves
  1  A       B       1
  2  A       C       1
  3  B       D      -1
  4  C       B       5
  5  D       C       1

Does anyone have any idea how to do that using Pandas? I guess i need to index on the first two columns but I quite new in Pandas and i face a lot of difficulties. Thanks

EDIT There can't be any duplicate pairs.For example you cant see twice a->b (but you can of course see b->a)

like image 567
BigScratch Avatar asked Feb 21 '13 15:02

BigScratch


People also ask

How do you find the difference between two columns in pandas?

Difference between rows or columns of a pandas DataFrame object is found using the diff() method. The axis parameter decides whether difference to be calculated is between rows or between columns.

What does diff () do in pandas?

The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.

How do you find the time difference between two rows in pandas?

You can use the DataFrame. diff() function to find the difference between two rows in a pandas DataFrame. where: periods: The number of previous rows for calculating the difference.

What does diff () do in Python?

diff() is used to find the first discrete difference of objects over the given axis. We can provide a period value to shift for forming the difference. axis : Take difference over rows (0) or columns (1).


1 Answers

I'm sure someone could simplify this down to fewer lines, but I've left it long to help clarify what is going on. In a nutshell, split the dataframe into two pieces based on whether 'Item1' is earlier in the alphabet than 'Item2'. Then flip 'Item1' and 'Item2' and negate 'Moves' for one piece. Glue them back together and use the groupby function to aggregate the rows.

>>> df
  Item1 Item2  Moves
0     A     B      1
1     A     C      2
2     B     D      3
3     C     A      1
4     C     B      5
5     D     B      4
6     D     C      1
>>> swapidx = df['Item1'] < df['Item2']
>>> df1 = df[swapidx]
>>> df2 = df[swapidx^True]
>>> df1
  Item1 Item2  Moves
0     A     B      1
1     A     C      2
2     B     D      3
>>> df2
  Item1 Item2  Moves
3     C     A      1
4     C     B      5
5     D     B      4
6     D     C      1
>>> df2[['Item1', 'Item2']] = df2[['Item2', 'Item1']]
>>> df2['Moves'] = df2['Moves']*-1
>>> df2
  Item1 Item2  Moves
3     A     C     -1
4     B     C     -5
5     B     D     -4
6     C     D     -1
>>> df3 = df1.append(df2)
>>> df3.groupby(['Item1', 'Item2'], as_index=False).sum()
  Item1 Item2  Moves
0     A     B      1
1     A     C      1
2     B     C     -5
3     B     D     -1
4     C     D     -1
like image 95
D. A. Avatar answered Sep 23 '22 14:09

D. A.