I have a square cost matrix stored as a pandas DataFrame. Rows and columns represent positions [i, j], and I want to multiply all off-diagonal elements (where i != j) by a constant c, without using any for loops for performance reasons.
Is there an efficient way to achieve this in pandas or do I have to switch to numpy and then back to pandas to perform this task?
Example
import pandas as pd
# Sample DataFrame
cost_matrix = pd.DataFrame([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
# Constant
c = 4
# Desired output
# 1 8 12
# 16 5 24
# 28 16 9
Build a boolean mask with numpy.identity and update the underlying array in place:
import numpy as np
cost_matrix.values[np.identity(n=len(cost_matrix))==0] *= c
output:
0 1 2
0 1 8 12
1 16 5 24
2 28 32 9
Intermediate:
np.identity(n=len(cost_matrix))==0
array([[False, True, True],
[ True, False, True],
[ True, True, False]])
NB. For .values to be a view of the underlying array, the DataFrame must have been constructed from a homogeneous block. If not, it should be converted to one using cost_matrix = cost_matrix.copy().
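If relying on .values being a writable view is a concern, one alternative is to work on an explicit copy and rebuild the DataFrame afterwards. This is only a minimal sketch: the names arr, off_diagonal and result are illustrative, and it assumes a square, all-numeric frame.

```python
import numpy as np
import pandas as pd

cost_matrix = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
c = 4

# Take an explicit, writable copy so nothing depends on .values being a view.
arr = cost_matrix.to_numpy(copy=True)

# True everywhere except on the main diagonal.
off_diagonal = ~np.eye(len(arr), dtype=bool)
arr[off_diagonal] *= c

# Rebuild the DataFrame with the original labels.
result = pd.DataFrame(arr, index=cost_matrix.index, columns=cost_matrix.columns)
print(result)
```

This costs one extra copy of the data, but the original cost_matrix is left unchanged.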
@PaulS suggested modifying all the values and then restoring the diagonal. I would use:
d = np.diag(cost_matrix)
cost_matrix *= c
np.fill_diagonal(cost_matrix.values, d)
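The same diagonal-restoration idea can also be sketched on an explicit copy, which avoids depending on cost_matrix.values being writable. The names arr, diag and restored are illustrative only, and a square numeric frame is assumed.

```python
import numpy as np
import pandas as pd

cost_matrix = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
c = 4

# Work on a writable copy of the data.
arr = cost_matrix.to_numpy(copy=True)

# Save the diagonal as its own array so that scaling arr cannot change it.
diag = np.diag(arr).copy()

# Scale everything, then put the original diagonal back.
arr *= c
np.fill_diagonal(arr, diag)

restored = pd.DataFrame(arr, index=cost_matrix.index, columns=cost_matrix.columns)
print(restored)
```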
The mask approach seems to be faster on small and medium-sized inputs, and the diagonal restoration faster on large inputs. (My previous timings were performed online, and I could not reproduce those results with perfplot.)
NB. The timings were computed with c=1 or c=-1 to avoid increasing the values exponentially during the timing.
You can create a mask for all off-diagonal elements (i != j), apply the mask, and then multiply the off-diagonal elements:
import pandas as pd
cost_matrix = pd.DataFrame([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
c = 4
# Create a mask where i != j and convert mask to a DataFrame with the same index and columns
mask = cost_matrix.index.values[:, None] != cost_matrix.columns.values
mask_df = pd.DataFrame(mask, index=cost_matrix.index, columns=cost_matrix.columns)
cost_matrix[mask_df] *= c
print(cost_matrix)
This will output
1 8 12
16 5 24
28 32 9
Note: I think you made a typo in your desired output. The bottom-middle value should be 32, not 16.
This approach assigns through pandas boolean indexing rather than writing to .values, so it does not depend on .values being a view. Whether .values returns a copy or a view can depend on the data types and the version of pandas. For homogeneous data types (such as all integers or all floats), it generally returns a view, so modifying it will affect the DataFrame.
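If the goal is a new DataFrame that leaves the input untouched, DataFrame.where with a positional mask is one possible pandas-only variant. The sketch below is an illustration under assumptions: the row and column labels are made up, and np.eye is used so that the mask is based on position (i, j) rather than on comparing index labels with column labels, which only works when the two sets of labels match.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, chosen so that index and columns differ.
cost_matrix = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["p", "q", "r"],
    columns=["x", "y", "z"],
)
c = 4

# True on the main diagonal, False elsewhere (purely positional).
on_diagonal = np.eye(len(cost_matrix), dtype=bool)

# Keep the original value where the mask is True, otherwise take the scaled value.
scaled = cost_matrix.where(on_diagonal, cost_matrix * c)
print(scaled)
```

Here cost_matrix keeps its original values and scaled holds the result.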