Fill the diagonal of Pandas DataFrame with elements from Pandas Series

Given a pandas Series with an index:

import pandas as pd

s = pd.Series(data=[1,2,3],index=['a','b','c'])

How can a Series be used to fill the diagonal entries of an empty DataFrame in pandas version >= 0.23.0?

The resulting DataFrame would look like:

  a b c
a 1 0 0
b 0 2 0
c 0 0 3

There is a prior, similar question about filling the diagonal with a single repeated value; my question is about filling it with varying values taken from a Series.
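
To make the distinction concrete: np.fill_diagonal accepts either a scalar, which puts the same value on every diagonal entry (the case in the earlier question), or a sequence, which supplies one value per entry in order. The following minimal sketch works on plain NumPy arrays and assumes the same s as above; the value 7 is an arbitrary illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'])

# Scalar val: every diagonal entry receives the same value (7 is arbitrary)
same = np.zeros((len(s), len(s)), dtype=s.dtype)
np.fill_diagonal(same, 7)

# Sequence val: diagonal entries are taken from s, position by position
varying = np.zeros((len(s), len(s)), dtype=s.dtype)
np.fill_diagonal(varying, s.to_numpy())

print(same)
print(varying)
```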

Thank you in advance for your consideration and response.

asked Jul 25 '18 13:07 by Ramón J Romero y Vigil


2 Answers

First create the DataFrame, then fill its diagonal with numpy.fill_diagonal:

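import numpy as np
import pandas as pd

s = pd.Series(data=[1,2,3],index=['a','b','c'])

# square DataFrame of zeros, labelled on both axes by the Series index
df = pd.DataFrame(0, index=s.index, columns=s.index, dtype=s.dtype)

# write s onto the diagonal in place through the underlying NumPy array;
# this relies on df.values being a writable view (single dtype, Copy-on-Write off)
np.fill_diagonal(df.values, s)
print (df)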
   a  b  c
a  1  0  0
b  0  2  0
c  0  0  3
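
np.fill_diagonal works by position, so it assumes the columns are in the same order as the index (true here, because both are s.index). If that might not hold, a label-based sketch using .loc is below. The reversed column order is chosen only to illustrate the point, and the loop is slower for long Series because it runs in Python.

```python
import pandas as pd

s = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'])

# Zero DataFrame whose columns are deliberately in a different order (illustration only)
df = pd.DataFrame(0, index=s.index, columns=s.index[::-1], dtype=s.dtype)

# Assign each value to the cell whose row label and column label both equal its key
for label, value in s.items():
    df.loc[label, label] = value

print(df)
```

Each value lands at df.loc[k, k] regardless of where column k sits, at the cost of one .loc assignment per element.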

Another solution is to create a 2D array of zeros, fill its diagonal, and finally pass it to the DataFrame constructor:

arr = np.zeros((len(s), len(s)), dtype=s.dtype)
np.fill_diagonal(arr, s)

print (arr)
[[1 0 0]
 [0 2 0]
 [0 0 3]]

df = pd.DataFrame(arr, index=s.index, columns=s.index)
print (df)
   a  b  c
a  1  0  0
b  0  2  0
c  0  0  3
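
The second approach can be wrapped in a small reusable helper. The function name series_to_diagonal_frame is hypothetical, and this sketch assumes a Series with a unique index and a plain NumPy dtype (extension dtypes such as "Int64" are not handled by np.zeros).

```python
import numpy as np
import pandas as pd

def series_to_diagonal_frame(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: square DataFrame with s on the diagonal, zeros elsewhere."""
    n = len(s)
    # Zero matrix in the Series' dtype, so integer input stays integer
    arr = np.zeros((n, n), dtype=s.dtype)
    # Write the Series values onto the main diagonal, by position
    np.fill_diagonal(arr, s.to_numpy())
    # Label both axes with the Series index
    return pd.DataFrame(arr, index=s.index, columns=s.index)

df = series_to_diagonal_frame(pd.Series([1.5, 2.5], index=['x', 'y']))
print(df)
```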

answered Sep 23 '22 15:09 by jezrael


I'm not sure how to do this directly in pandas, but it is easy enough if you don't mind using numpy.diag() to build the diagonal matrix from your Series and then passing that to the DataFrame constructor:

import numpy as np

# np.diag accepts the Series directly, so s.as_matrix() (removed in pandas 1.0) isn't needed
diag_data = np.diag(s)
df = pd.DataFrame(diag_data, index=s.index, columns=s.index)
print(df)

   a  b  c
a  1  0  0
b  0  2  0
c  0  0  3
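
np.diag works in both directions: given a 1-D input it builds a square matrix with that input on the diagonal, and given a 2-D input it extracts the main diagonal. The following short sketch uses the second direction to check that the result round-trips back to the original Series, assuming the same s as in the question.

```python
import numpy as np
import pandas as pd

s = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'])

# 1-D input -> 2-D matrix with s on the diagonal
df = pd.DataFrame(np.diag(s), index=s.index, columns=s.index)

# 2-D input -> 1-D array of the diagonal entries, re-labelled with the row index
recovered = pd.Series(np.diag(df.to_numpy()), index=df.index)

print(recovered.equals(s))
```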

In one line:

df = pd.DataFrame(np.diag(s),
                  index=s.index,
                  columns=s.index)

Timing comparison with a Series made from a random array of 10000 elements:

s = pd.Series(np.random.rand(10000), index=np.arange(10000))

df = pd.DataFrame(np.diag(s), ...)
173 ms ± 2.91 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)

df = pd.DataFrame(0, ...)
np.fill_diagonal(df.values, s)
212 ms ± 909 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)

mat = np.zeros(...)
np.fill_diagonal(mat, s)
df = pd.DataFrame(mat, ...)
175 ms ± 3.72 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)

The first and third options shown here perform about the same, while the middle option is the slowest.
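
The timings above appear to come from IPython's %timeit. The following is a minimal sketch for re-running the comparison with the standard-library timeit module. It assumes the elided ... arguments are index=s.index, columns=s.index, uses a smaller n to keep the run short, and leaves out the in-place df.values variant because df.values may be read-only when pandas Copy-on-Write is enabled. Absolute numbers will depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

n = 1000  # smaller than the 10000 used above, to keep the run short
s = pd.Series(np.random.rand(n), index=np.arange(n))

def with_diag():
    # Option 1: build the diagonal matrix with np.diag, then wrap it
    # (index/columns arguments are an assumption for the elided "...")
    return pd.DataFrame(np.diag(s), index=s.index, columns=s.index)

def with_zeros_fill():
    # Option 3: zero matrix, fill its diagonal, then wrap it
    mat = np.zeros((len(s), len(s)), dtype=s.dtype)
    np.fill_diagonal(mat, s.to_numpy())
    return pd.DataFrame(mat, index=s.index, columns=s.index)

results = {}
for name, fn in [("np.diag", with_diag), ("zeros + fill_diagonal", with_zeros_fill)]:
    # Best of 5 repeats of 3 calls each, reported per call
    results[name] = min(timeit.repeat(fn, number=3, repeat=5)) / 3
    print(f"{name}: {results[name] * 1e3:.1f} ms per call")
```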

answered Sep 24 '22 15:09 by Engineero