Calculate "energy" of columns with pandas

I am trying to calculate the signal energy of each column of my pandas.DataFrame, following this formula for discrete-time signals (the sum of the squared samples). I tried apply and applymap, and also reduce, as suggested here: How do I columnwise reduce a pandas dataframe? But everything I tried performed the operation on each element rather than on the whole column.

This is not a signal-processing-specific question; it's just an example of how to apply a "summarize" function (I don't know the right term for this) to columns.

My workaround was to get the raw numpy.array data and do my calculations on that. But I am pretty sure there is a pandatic way to do this (and surely a more numpyic way).

import pandas as pd
import numpy as np

d = np.array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
              [0, -1, 2, -3, 4, -5, 6, -7, 8, -9],
              [0, 1, -2, 3, -4, 5, -6, 7, -8, 9]]).transpose()
df = pd.DataFrame(d)

energies = []

# a is the same as d
# (as_matrix() was removed in pandas 1.0; to_numpy() is its replacement)
a = df.to_numpy()
assert(np.array_equal(a, d))

for column in range(a.shape[1]):
    energies.append(sum(a[:,column] ** 2))

print(energies) # [40, 285, 285]

Thanks in advance!

ppasler asked Mar 10 '23 19:03

1 Answer

For pandas output (a Series with one value per column), you could do the following -

(df**2).sum(axis=0) # Or (df**2).sum(0)
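As a minimal sketch on the data from the question, the expression above can be checked against the expected energies. The second variant, using `df.apply` with a lambda, is not part of the original answer; it is included only to show that the same column-wise reduction works with an arbitrary function, since `apply` passes each column to the function as a Series by default.

```python
import numpy as np
import pandas as pd

# Same data as in the question: three signals stored as columns.
d = np.array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
              [0, -1, 2, -3, 4, -5, 6, -7, 8, -9],
              [0, 1, -2, 3, -4, 5, -6, 7, -8, 9]]).transpose()
df = pd.DataFrame(d)

# Vectorized: square every element, then sum down each column (axis=0).
energies = (df ** 2).sum(axis=0)

# Generic column-wise reduction (illustrative alternative):
# apply() calls the function once per column, not once per element.
energies_apply = df.apply(lambda col: (col ** 2).sum())

print(energies.tolist())        # [40, 285, 285]
print(energies_apply.tolist())  # [40, 285, 285]
```

The vectorized form is usually preferable; the `apply` form is mainly useful when the reduction cannot be written as whole-frame arithmetic.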

For performance, we could work with the array extracted from the dataframe -

(df.values**2).sum(axis=0) # Or (df.values**2).sum(0)

For a further performance boost, there's np.einsum -

a = df.values
out = np.einsum('ij,ij->j',a,a)
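To make the subscript string less opaque: in 'ij,ij->j' both operands are indexed by row i and column j, their entries are multiplied pairwise, and the row index i is summed away because it does not appear in the output. A small sketch on a made-up 3x3 array (the array is only an illustration, not data from the question) shows that this matches squaring and summing per column:

```python
import numpy as np

# Small illustrative array: each column is one "signal".
a = np.array([[2, 0, 0],
              [2, -1, 1],
              [2, 2, -2]])

# 'ij,ij->j': compute a[i, j] * a[i, j] and sum over the row index i,
# keeping one value per column index j.
out = np.einsum('ij,ij->j', a, a)

print(out.tolist())  # [12, 5, 5]

# Same result as squaring elementwise and summing down the columns.
assert np.array_equal(out, (a ** 2).sum(axis=0))
```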

Runtime test -

In [31]: df = pd.DataFrame(np.random.randint(0,9,(1000,30)))

In [32]: %timeit (df**2).sum(0)
1000 loops, best of 3: 518 µs per loop

In [33]: %timeit (df.values**2).sum(0)
10000 loops, best of 3: 40.2 µs per loop

In [34]: def einsum_based(df):
    ...:     a = df.values
    ...:     return np.einsum('ij,ij->j',a,a)
    ...: 

In [35]: %timeit einsum_based(df)
10000 loops, best of 3: 32.2 µs per loop
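
For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The seed, the repeat counts, and the dictionary names are arbitrary choices for illustration; absolute numbers will differ from those above depending on hardware and library versions, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Random integer data with the same shape as in the runtime test above.
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame(rng.integers(0, 9, size=(1000, 30)))

# The three approaches from the answer, wrapped as zero-argument callables.
candidates = {
    "pandas": lambda: (df ** 2).sum(0),
    "numpy": lambda: (df.values ** 2).sum(0),
    "einsum": lambda: np.einsum('ij,ij->j', df.values, df.values),
}

# Compute each result once so the outputs can be compared for equality.
results = {name: np.asarray(fn()) for name, fn in candidates.items()}

# Best of 3 runs of 100 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=100, repeat=3)) / 100
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```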
Divakar answered Mar 21 '23 03:03