I'm trying to create a matrix to show the differences between the rows in a Pandas data frame.
import pandas as pd
data = {'Country':['GB','JP','US'],'Values':[20.2,-10.5,5.7]}
df = pd.DataFrame(data)
I would like this:
  Country  Values
0      GB    20.2
1      JP   -10.5
2      US     5.7
To become something like this (differences going vertically):
  Country    GB    JP    US
0      GB   0.0 -30.7 -14.5
1      JP  30.7   0.0  16.2
2      US  14.5 -16.2   0.0
Is this achievable with a built-in function, or would I need to write a loop to get the desired output? Thanks for your help!
Convert Pandas DataFrame to NumPy Matrix: A matrix is a two-dimensional rectangular array that stores data in rows and columns; in NumPy it is represented as a 2-D array. The DataFrame.to_numpy() method converts a DataFrame to such an array.
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
The equals() function checks directly whether df1 is equal to df2. Unlike DataFrame.eq(), which compares element by element, it returns a single boolean indicating whether the two DataFrame objects are equal.
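As a rough illustration of how equals(), eq() and compare() differ, here is a minimal sketch. The second frame df2 and the changed value are made up for the example, and compare() assumes pandas 1.1 or later.

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]})
df2 = df1.copy()
df2.loc[1, 'Values'] = 0.0  # hypothetical change, for illustration only

# equals() collapses the whole comparison into one boolean
same = df1.equals(df2)

# eq() keeps one boolean per cell
cellwise = df1.eq(df2)

# compare() keeps only the rows and columns that differ,
# with the two versions side by side under 'self' and 'other'
changes = df1.compare(df2)

print(same)
print(cellwise)
print(changes)
```

None of these three computes numeric differences between rows of a single frame, which is what the question asks for.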
You can use the DataFrame.diff() function to find the difference between rows in a pandas DataFrame. Its periods parameter sets how many rows back to look when calculating the difference (the default is 1, the previous row).
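A small sketch of diff() on the data from the question may help show its limits: it only compares rows a fixed distance apart, so on its own it does not give the full pairwise matrix. The variable names step1 and step2 are just for this example.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]})

# periods=1 (the default): each row minus the row directly above it
step1 = df['Values'].diff()

# periods=2: each row minus the row two positions above it
step2 = df['Values'].diff(periods=2)

print(step1)
print(step2)
```

Rows without a partner that far back come out as NaN.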
This is a standard use case for numpy's broadcasting:
df['Values'].values - df['Values'].values[:, None]
Out:
array([[  0. , -30.7, -14.5],
       [ 30.7,   0. ,  16.2],
       [ 14.5, -16.2,   0. ]])
We access the underlying numpy array with the values attribute and [:, None]
introduces a new axis so the result is two dimensional.
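To make the shapes explicit, here is a minimal sketch of the same subtraction split into steps. The names row, col and diff are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]})

row = df['Values'].values           # 1-D array, shape (3,)
col = df['Values'].values[:, None]  # same data as a column, shape (3, 1)

# Broadcasting stretches (3,) against (3, 1) into (3, 3):
# diff[i, j] = values[j] - values[i]
diff = row - col

print(row.shape, col.shape, diff.shape)
print(diff)
```

Swapping the operands (col - row) would flip the sign of every entry, which is the same as transposing the result.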
You can concat this with your original Series:
arr = df['Values'].values - df['Values'].values[:, None]
pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
Out:
  Country    GB    JP    US
0      GB   0.0 -30.7 -14.5
1      JP  30.7   0.0  16.2
2      US  14.5 -16.2   0.0
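If you would rather look values up by country than carry a separate Country column, one possible variation (not part of the approach above, just a sketch) is to use the countries as both the index and the column labels:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]})

values = df['Values'].to_numpy()

# Entry at (row r, column c) is Values[c] - Values[r]
matrix = pd.DataFrame(values - values[:, None],
                      index=df['Country'],
                      columns=df['Country'])

print(matrix)
print(matrix.loc['JP', 'GB'])  # GB value minus JP value
```

This assumes the country codes are unique; duplicated labels would make .loc lookups return more than one value.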
The array can also be generated with the following, thanks to @Divakar:
import numpy as np
arr = np.subtract.outer(*[df['Values'].values]*2).T
Here we call .outer on the subtract ufunc, which applies the subtraction to every pair of elements from its two inputs.
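As a quick check that the two approaches agree, here is a small sketch comparing them on the question's data. It passes plain NumPy arrays to .outer rather than the Series itself, on the assumption that not every pandas version accepts a Series there.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]})
v = df['Values'].to_numpy()

# outer(v, v)[i, j] = v[i] - v[j], so transpose to get v[j] - v[i]
via_outer = np.subtract.outer(v, v).T

# the broadcasting version: v[j] - v[i]
via_broadcast = v - v[:, None]

print(np.allclose(via_outer, via_broadcast))
```

The *[x]*2 idiom in the one-liner just unpacks the same array twice, so it is equivalent to writing np.subtract.outer(x, x).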
I tried to improve on Divakar's comment:
import numpy as np
a = np.column_stack([df['Country'], np.subtract.outer(*[-df['Values'].values]*2)])
df = pd.DataFrame(a, columns=['Country'] + df['Country'].tolist())
print(df)
  Country    GB    JP    US
0      GB     0 -30.7 -14.5
1      JP  30.7     0  16.2
2      US  14.5 -16.2     0