How to efficiently multiply all non-diagonal elements by a constant in a pandas DataFrame?

I have a square cost matrix stored as a pandas DataFrame. Rows and columns represent positions [i, j], and I want to multiply all off-diagonal elements (where i != j) by a constant c, without using any for loops for performance reasons.

Is there an efficient way to achieve this in pandas or do I have to switch to numpy and then back to pandas to perform this task?

Example

import pandas as pd

# Sample DataFrame
cost_matrix = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Constant
c = 4

# Desired output
#    1  8  12
#    16 5  24
#    28 16  9
Ludwig B asked Sep 03 '25 09:09


2 Answers

Build a boolean mask with numpy.identity and update the underlying array in place:

import numpy as np
cost_matrix.values[np.identity(n=len(cost_matrix))==0] *= c

output:

    0   1   2
0   1   8  12
1  16   5  24
2  28  32   9

Intermediate:

np.identity(n=len(cost_matrix))==0

array([[False,  True,  True],
       [ True, False,  True],
       [ True,  True, False]])

NB. For .values to be a view of the underlying array, the DataFrame must be backed by a single homogeneous block (one dtype). If it is not, consolidate it first with cost_matrix = cost_matrix.copy().
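If writing through .values is not an option (mixed dtypes, or a pandas version with Copy-on-Write enabled, where .values can come back as a read-only array; Copy-on-Write is the default from pandas 3.0), the same positional mask can stay in pandas via DataFrame.mask, which returns a new DataFrame. A minimal sketch; the helper name scale_off_diagonal is hypothetical:

```python
import numpy as np
import pandas as pd


def scale_off_diagonal(df: pd.DataFrame, c: float) -> pd.DataFrame:
    """Return a new DataFrame with every off-diagonal entry of square df multiplied by c."""
    # Positional mask: True everywhere except the main diagonal (same as np.identity(n) == 0)
    off_diag = ~np.eye(len(df), dtype=bool)
    # DataFrame.mask takes values from `other` where the condition is True
    # and keeps the original values elsewhere; df itself is not modified
    return df.mask(off_diag, df * c)


cost_matrix = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result = scale_off_diagonal(cost_matrix, 4)
print(result)
```

Reassign (cost_matrix = scale_off_diagonal(cost_matrix, c)) to keep the name; this allocates one new array in exchange for not depending on .values being a writable view.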

Alternative

@PaulS suggested multiplying all the values and then restoring the diagonal. I would use:

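import numpy as np

d = np.diag(cost_matrix).copy()  # copy: np.diag returns a view of a 2-D input, which the multiply could change
cost_matrix *= c
np.fill_diagonal(cost_matrix.values, d)  # write the original diagonal back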
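The same multiply-then-restore idea can also be written as an explicit round trip through NumPy that modifies neither cost_matrix nor the array behind .values. A sketch; the helper name scale_off_diagonal_restore is hypothetical:

```python
import numpy as np
import pandas as pd


def scale_off_diagonal_restore(df: pd.DataFrame, c: float) -> pd.DataFrame:
    """Multiply every entry of square df by c, then put the original diagonal back."""
    arr = df.to_numpy()              # may be a view of df's data; it is only read here
    diag = arr.diagonal().copy()     # save the original diagonal values
    scaled = arr * c                 # new array, so df is left unchanged
    np.fill_diagonal(scaled, diag)   # overwrite the scaled diagonal with the saved values
    return pd.DataFrame(scaled, index=df.index, columns=df.columns)


cost_matrix = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(scale_off_diagonal_restore(cost_matrix, 4))
```

Because arr * c allocates a new array, a float c on an integer matrix upcasts the result to float instead of raising, as an in-place NumPy arr *= c would.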

Timings

The mask approach seems to be faster on small and medium inputs, and the diagonal restoration faster on large inputs. (My previous timings were run online; I could not reproduce those results with perfplot.)

NB. The timings below were computed with c=1 or c=-1, so that the values do not grow exponentially over the repeated runs.

[Figure: perfplot timings comparing the mask approach with the diagonal-restoration approach]
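For a quick local check, here is a rough timeit sketch (not the perfplot benchmark behind the figure) of the same two ideas on a plain NumPy array, using c = -1 as described above so repeated in-place calls only flip signs. The function names and sizes are hypothetical, and absolute numbers will vary by machine:

```python
import timeit

import numpy as np


def mask_multiply(arr: np.ndarray, c: float) -> None:
    """Multiply the off-diagonal entries of square arr in place via a boolean mask."""
    arr[~np.eye(arr.shape[0], dtype=bool)] *= c


def restore_diagonal(arr: np.ndarray, c: float) -> None:
    """Multiply all entries in place, then write the saved diagonal back."""
    d = arr.diagonal().copy()  # copy: diagonal() is a view that arr *= c would change
    arr *= c
    np.fill_diagonal(arr, d)


def time_both(n: int, repeats: int = 20) -> dict:
    """Total seconds for `repeats` calls of each approach on an n x n float array."""
    c = -1  # keeps magnitudes constant across repeated in-place calls
    timings = {}
    for func in (mask_multiply, restore_diagonal):
        arr = np.arange(n * n, dtype=float).reshape(n, n)
        timings[func.__name__] = timeit.timeit(lambda: func(arr, c), number=repeats)
    return timings


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, time_both(n))
```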

mozway answered Sep 04 '25 22:09


You can create a boolean mask that is True for all off-diagonal elements (i != j) and use it to multiply only those elements:

import pandas as pd

cost_matrix = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

c = 4

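# Boolean mask, True where the row label differs from the column label (i != j).
# This compares labels, so it assumes index and columns hold the same labels in the same order.
mask = cost_matrix.index.values[:, None] != cost_matrix.columns.values
# Wrap the mask in a DataFrame with the same index and columns so it aligns with cost_matrix
mask_df = pd.DataFrame(mask, index=cost_matrix.index, columns=cost_matrix.columns)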

cost_matrix[mask_df] *= c

print(cost_matrix)

This will output

    0   1   2
0   1   8  12
1  16   5  24
2  28  32   9

Note: I think there is a typo in your desired output: the bottom-middle value should be 32 (8 × 4), not 16.

This approach goes through pandas indexing (cost_matrix[mask_df] *= c) instead of writing into the array returned by .values, so it does not depend on .values being a view; it still updates cost_matrix itself. Whether .values returns a copy or a view depends on the dtypes and the pandas version. For a homogeneous dtype (all integers or all floats) it generally returns a view, so modifying it affects the DataFrame.
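The mask in this answer compares index labels with column labels, which matches the positional diagonal only when both axes carry the same labels in the same order (true for the default RangeIndex in the question). A small sketch contrasting the two; off_diagonal_mask and the labels "A", "B", "C" are hypothetical:

```python
import numpy as np
import pandas as pd


def off_diagonal_mask(df: pd.DataFrame, by: str = "label") -> np.ndarray:
    """Boolean (n, n) array that is True off the diagonal.

    by="label":    True where the row label differs from the column label.
    by="position": True where row position i differs from column position j.
    """
    if by == "label":
        # An (n, 1) column vector compared with an (n,) row vector broadcasts to (n, n)
        return df.index.to_numpy()[:, None] != df.columns.to_numpy()
    if by == "position":
        return ~np.eye(len(df), dtype=bool)
    raise ValueError(f"unknown 'by': {by!r}")


labels = ["A", "B", "C"]
cost = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=labels, columns=labels)

# Same labels in the same order on both axes: the two masks agree
print((off_diagonal_mask(cost, "label") == off_diagonal_mask(cost, "position")).all())

# Reorder the columns: each row label still matches one column label,
# but those cells are no longer on the main (positional) diagonal
reordered = cost[["C", "A", "B"]]
print(off_diagonal_mask(reordered, "label"))
print(off_diagonal_mask(reordered, "position"))
```

Which mask is right depends on whether "diagonal" means "same entity on both axes" (labels) or "same position" (i == j), as in the question.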

NullDev answered Sep 04 '25 21:09