Pandas - Creating Difference Matrix from Data Frame

Tags:

python

pandas

I'm trying to create a matrix to show the differences between the rows in a Pandas data frame.

import pandas as pd

data = {'Country':['GB','JP','US'],'Values':[20.2,-10.5,5.7]}
df = pd.DataFrame(data)

I would like this:

  Country  Values
0      GB    20.2
1      JP   -10.5
2      US     5.7

To become something like this (differences going vertically):

  Country     GB     JP     US
0      GB    0.0  -30.7  -14.5
1      JP   30.7    0.0   16.2
2      US   14.5  -16.2    0.0

Is this achievable with a built-in function, or would I need to write a loop to get the desired output? Thanks for your help!

asked Sep 17 '17 17:09

alpacafondue

2 Answers

This is a standard use case for NumPy's broadcasting:

df['Values'].values - df['Values'].values[:, None]
Out: 
array([[  0. , -30.7, -14.5],
       [ 30.7,   0. ,  16.2],
       [ 14.5, -16.2,   0. ]])

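The values attribute gives the underlying NumPy array, and [:, None] adds a new axis, turning the 1-D array into a column. Subtracting that column from the 1-D array broadcasts to a two-dimensional result in which entry [i, j] is Values[j] - Values[i].

You can concatenate this with the original Country column: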
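
To see why the result is two-dimensional, it can help to look at the shapes involved. The following is a minimal sketch using the same numbers as the question; the names v, col, and diff are only illustrative and do not appear in the original answer.

```python
import numpy as np

v = np.array([20.2, -10.5, 5.7])   # same numbers as df['Values']

col = v[:, None]                   # shape (3, 1): a column vector
diff = v - col                     # (3,) against (3, 1) broadcasts to (3, 3)

# diff[i, j] holds v[j] - v[i]
print(v.shape, col.shape, diff.shape)
```

The row index comes from the column vector and the column index from the original 1-D array, which is why the differences run "vertically" as asked.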

arr = df['Values'].values - df['Values'].values[:, None]
pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
Out: 
  Country    GB    JP    US
0      GB   0.0 -30.7 -14.5
1      JP  30.7   0.0  16.2
2      US  14.5 -16.2   0.0
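
If a label-based lookup is more convenient than a separate Country column, one possible variation is to use the country codes as both the index and the column labels. This is a sketch rather than part of the original answer, and it assumes the country codes are unique; diff_df and labels are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]})

v = df['Values'].to_numpy()
labels = df['Country'].to_numpy()

# Entry [row, col] is Values[col] - Values[row], as in the output above.
diff_df = pd.DataFrame(v - v[:, None], index=labels, columns=labels)

# Look up a single difference by label: GB minus JP.
print(diff_df.loc['JP', 'GB'])
```

With this layout the matrix stays purely numeric, so operations such as diff_df.abs() work without first dropping a text column.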

The array can also be generated with the following, thanks to @Divakar:

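import numpy as np

# Pass the NumPy array rather than the Series: recent pandas versions
# raise NotImplementedError for ufunc.outer called on a Series.
arr = np.subtract.outer(*[df['Values'].values]*2).T

Here we call .outer on the subtract ufunc, which applies the subtraction to every pair of elements from its two inputs. The .T transposes the result so that it has the same orientation as the broadcasting version above.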
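
As a quick check that the two approaches agree, the sketch below compares them on a plain NumPy array holding the question's values. The variable names are illustrative only.

```python
import numpy as np

v = np.array([20.2, -10.5, 5.7])

outer = np.subtract.outer(v, v)    # outer[i, j] = v[i] - v[j]
broadcast = v - v[:, None]         # broadcast[i, j] = v[j] - v[i]

# Transposing the outer result gives the same matrix as broadcasting.
print(np.array_equal(outer.T, broadcast))
```

Without the transpose the signs are flipped, because outer subtracts the column element from the row element rather than the other way round.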

answered Oct 25 '22 01:10

ayhan

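I tried to improve on Divakar's comment:

import numpy as np

# Negating the values flips the sign of the outer difference, so no transpose is needed.
# .values passes a NumPy array, since recent pandas versions reject ufunc.outer on a Series.
a = np.column_stack([df['Country'], np.subtract.outer(*[-df['Values'].values]*2)])

df = pd.DataFrame(a, columns=['Country'] + df['Country'].tolist())
print(df)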
  Country    GB    JP    US
0      GB     0 -30.7 -14.5
1      JP  30.7     0  16.2
2      US  14.5 -16.2     0

answered Oct 25 '22 01:10

jezrael
