I have the following dataframe where I show how many times I have seen a move from Item1 to Item 2. For example there is one transition from A to B, 2 from A to C , 1 from C to A <hr> <pre class="prettyprint"><code> Item1 Item2 Moves 1 A B 1 2 A C 2 3 B D 3 4 C A 1 5 C B 5 6 D B 4 7 D C 1 </code></pre> <hr> I would like to calculate the difference between two items, so a newly constructed Dataframe would be the following <pre class="prettyprint"><code> Item1 Item2 Moves 1 A B 1 2 A C 1 3 B D -1 4 C B 5 5 D C 1 </code></pre> Does anyone have any idea how to do that using Pandas? I guess i need to index on the first two columns but I quite new in Pandas and i face a lot of difficulties. Thanks EDIT There can't be any duplicate pairs.For example you cant see twice a->b (but you can of course see b->a)

I'm sure someone could simplify this down to fewer lines, but I've left it long to help clarify what is going on. In a nutshell, split the dataframe into two pieces based on whether 'Item1' is earlier in the alphabet than 'Item2'. Then flip 'Item1' and 'Item2' and negate 'Moves' for one piece. Glue them back together and use the <code>groupby</code> function to aggregate the rows. <pre class="prettyprint"><code>>>> df Item1 Item2 Moves 0 A B 1 1 A C 2 2 B D 3 3 C A 1 4 C B 5 5 D B 4 6 D C 1 >>> swapidx = df['Item1'] < df['Item2'] >>> df1 = df[swapidx] >>> df2 = df[swapidx^True] >>> df1 Item1 Item2 Moves 0 A B 1 1 A C 2 2 B D 3 >>> df2 Item1 Item2 Moves 3 C A 1 4 C B 5 5 D B 4 6 D C 1 >>> df2[['Item1', 'Item2']] = df2[['Item2', 'Item1']] >>> df2['Moves'] = df2['Moves']*-1 >>> df2 Item1 Item2 Moves 3 A C -1 4 B C -5 5 B D -4 6 C D -1 >>> df3 = df1.append(df2) >>> df3.groupby(['Item1', 'Item2'], as_index=False).sum() Item1 Item2 Moves 0 A B 1 1 A C 1 2 B C -5 3 B D -1 4 C D -1 </code></pre>

Calculate pairwise difference from specific columns in a dataframe

I have the following dataframe where I show how many times I have seen a move from Item1 to Item 2. For example there is one transition from A to B, 2 from A to C , 1 from C to A

    Item1   Item2   Moves
  1  A       B       1
  2  A       C       2
  3  B       D       3
  4  C       A       1
  5  C       B       5
  6  D       B       4
  7  D       C       1

I would like to calculate the difference between two items, so a newly constructed Dataframe would be the following

    Item1   Item2   Moves
  1  A       B       1
  2  A       C       1
  3  B       D      -1
  4  C       B       5
  5  D       C       1

Does anyone have any idea how to do that using Pandas? I guess i need to index on the first two columns but I quite new in Pandas and i face a lot of difficulties. Thanks

EDIT There can't be any duplicate pairs.For example you cant see twice a->b (but you can of course see b->a)

How do you find the difference between two columns in pandas?

Difference between rows or columns of a pandas DataFrame object is found using the diff() method. The axis parameter decides whether difference to be calculated is between rows or between columns.

What does diff () do in pandas?

The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.

How do you find the time difference between two rows in pandas?

You can use the DataFrame. diff() function to find the difference between two rows in a pandas DataFrame. where: periods: The number of previous rows for calculating the difference.

What does diff () do in Python?

diff() is used to find the first discrete difference of objects over the given axis. We can provide a period value to shift for forming the difference. axis : Take difference over rows (0) or columns (1).

I'm sure someone could simplify this down to fewer lines, but I've left it long to help clarify what is going on. In a nutshell, split the dataframe into two pieces based on whether 'Item1' is earlier in the alphabet than 'Item2'. Then flip 'Item1' and 'Item2' and negate 'Moves' for one piece. Glue them back together and use the groupby function to aggregate the rows.

>>> df
  Item1 Item2  Moves
0     A     B      1
1     A     C      2
2     B     D      3
3     C     A      1
4     C     B      5
5     D     B      4
6     D     C      1
>>> swapidx = df['Item1'] < df['Item2']
>>> df1 = df[swapidx]
>>> df2 = df[swapidx^True]
>>> df1
  Item1 Item2  Moves
0     A     B      1
1     A     C      2
2     B     D      3
>>> df2
  Item1 Item2  Moves
3     C     A      1
4     C     B      5
5     D     B      4
6     D     C      1
>>> df2[['Item1', 'Item2']] = df2[['Item2', 'Item1']]
>>> df2['Moves'] = df2['Moves']*-1
>>> df2
  Item1 Item2  Moves
3     A     C     -1
4     B     C     -5
5     B     D     -4
6     C     D     -1
>>> df3 = df1.append(df2)
>>> df3.groupby(['Item1', 'Item2'], as_index=False).sum()
  Item1 Item2  Moves
0     A     B      1
1     A     C      1
2     B     C     -5
3     B     D     -1
4     C     D     -1

Calculate pairwise difference from specific columns in a dataframe

Tags:

python

indexing

pandas

dataframe

BigScratch

People also ask

1 Answers

D. A.

Recent Activity

Donate For Us

Calculate pairwise difference from specific columns in a dataframe

Tags:

python

indexing

pandas

dataframe

BigScratch

People also ask

1 Answers

D. A.

Related questions

Recent Activity

Donate For Us