I have the following dataframe, which shows how many times I have seen a move from Item1 to Item2. For example, there is one transition from A to B, two from A to C, and one from C to A.
  Item1 Item2 Moves
1     A     B     1
2     A     C     2
3     B     D     3
4     C     A     1
5     C     B     5
6     D     B     4
7     D     C     1
I would like to calculate the net difference between the two directions of each pair of items, so the newly constructed dataframe would be the following:
  Item1 Item2 Moves
1     A     B     1
2     A     C     1
3     B     D    -1
4     C     B     5
5     D     C     1
Does anyone have any idea how to do that using pandas? I guess I need to index on the first two columns, but I am quite new to pandas and am facing a lot of difficulties. Thanks.
EDIT: There can't be any duplicate pairs. For example, you can't see a->b twice (but you can of course see b->a).
The difference between rows or columns of a pandas DataFrame is found using the diff() method. It returns a DataFrame holding the first discrete difference of each value from, by default, the value in the previous row. The periods parameter sets how many rows back to compare with, and the axis parameter decides whether the difference is taken over rows (0) or columns (1).
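As a rough illustration of the description above, here is a minimal sketch of diff() on the 'Moves' column from the question. The second frame, `wide`, and its column names are made up purely to show axis=1. Note that diff() only compares neighbouring rows or columns, so on its own it does not pair up A->C with C->A.

```python
import pandas as pd

# 'Moves' values copied from the question's dataframe
df = pd.DataFrame({"Moves": [1, 2, 3, 1, 5, 4, 1]})

# Difference from the previous row (periods=1 is the default);
# the first entry has nothing before it, so it is NaN.
row_diff = df["Moves"].diff()

# Difference from the value two rows back
row_diff2 = df["Moves"].diff(periods=2)

# Hypothetical two-column frame to show a column-wise difference
wide = pd.DataFrame({"x": [1, 2], "y": [4, 7]})
col_diff = wide.diff(axis=1)  # 'y' now holds y - x, 'x' is all NaN

print(row_diff.tolist())
print(row_diff2.tolist())
print(col_diff)
```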
I'm sure someone could simplify this down to fewer lines, but I've left it long to help clarify what is going on. In a nutshell, split the dataframe into two pieces based on whether 'Item1' is earlier in the alphabet than 'Item2'. Then, for the piece where it is not, flip 'Item1' and 'Item2' and negate 'Moves'. Glue the pieces back together and use the groupby function to aggregate the rows. Note that every pair ends up in alphabetical order, so C to B with 5 moves shows up as B C -5.
>>> df
Item1 Item2 Moves
0 A B 1
1 A C 2
2 B D 3
3 C A 1
4 C B 5
5 D B 4
6 D C 1
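>>> swapidx = df['Item1'] < df['Item2']  # True for rows already in alphabetical order
>>> df1 = df[swapidx]
>>> df2 = df[~swapidx].copy()  # rows to flip; copy so they can be modified safely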
>>> df1
Item1 Item2 Moves
0 A B 1
1 A C 2
2 B D 3
>>> df2
Item1 Item2 Moves
3 C A 1
4 C B 5
5 D B 4
6 D C 1
>>> df2[['Item1', 'Item2']] = df2[['Item2', 'Item1']].to_numpy()  # swap by position
>>> df2['Moves'] = -df2['Moves']
>>> df2
Item1 Item2 Moves
3 A C -1
4 B C -5
5 B D -4
6 C D -1
>>> # DataFrame.append was removed in pandas 2.0; this assumes pandas is imported as pd
>>> df3 = pd.concat([df1, df2])
>>> df3.groupby(['Item1', 'Item2'], as_index=False).sum()
Item1 Item2 Moves
0 A B 1
1 A C 1
2 B C -5
3 B D -1
4 C D -1
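For reference, the same steps can be folded into one function. This is only a sketch of the approach above, not a different method; the name `net_moves` is made up for illustration, and it assumes the column names 'Item1', 'Item2' and 'Moves' from the question.

```python
import pandas as pd

def net_moves(df):
    """Net the moves for each pair of items, with pairs in alphabetical order."""
    out = df.copy()
    # Rows where Item1 comes after Item2 are the ones that need flipping
    flip = out["Item1"] > out["Item2"]
    # Swap the two item columns on those rows; .to_numpy() makes the
    # assignment positional instead of aligning on column names
    out.loc[flip, ["Item1", "Item2"]] = out.loc[flip, ["Item2", "Item1"]].to_numpy()
    # A move in the reverse direction counts negatively
    out.loc[flip, "Moves"] = -out.loc[flip, "Moves"]
    # Each pair now appears at most twice; sum to get the net value
    return out.groupby(["Item1", "Item2"], as_index=False)["Moves"].sum()

# Data copied from the question
df = pd.DataFrame({
    "Item1": ["A", "A", "B", "C", "C", "D", "D"],
    "Item2": ["B", "C", "D", "A", "B", "B", "C"],
    "Moves": [1, 2, 3, 1, 5, 4, 1],
})

result = net_moves(df)
print(result)
```

Working on a copy leaves the original dataframe untouched, which also avoids the chained-assignment warnings that modifying a slice can trigger.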