
Applying an RMS formula over three columns in pandas

I am trying to apply an RMS function to accelerometer data, which is three-dimensional. I also have a timestamp column at the beginning, which I keep as a day count. The dataframe looks like this:

       0        1       2      3
0   1.963   -12.0   -71.0   -2.0
1   1.963   -11.0   -71.0   -3.0
2   1.963   -14.0   -67.0   -6.0
3   1.963   -16.0   -63.0   -7.0
4   1.963   -18.0   -60.0   -8.0

Column 0 is Days, and the other columns hold the 3-axis accelerometer data. Right now I use this approach to compute the RMS value into a new column and drop the existing 3-axis data:

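import numpy as np
import pandas as pd

def rms_detrend(x):
    # x is a single row; columns 1-3 are the accelerometer axes
    return np.sqrt(np.mean(x[1]**2 + x[2]**2 + x[3]**2))

accdf = pd.read_csv(ACC_files[1], header=None)
accdf['ACC_RMS'] = accdf.apply(rms_detrend, axis=1)  # called once per row
accdf = accdf.drop([1, 2, 3], axis=1)
accdf.columns = ['Days', 'ACC_RMS']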

However, I have 70 such accelerometer files, each with 4000+ rows. Is there a better, quicker (more Pythonic) way to do this? The code above handles just one file and is very slow. Thanks.

lamo_738 asked Oct 15 '25 22:10


2 Answers

Use:

accdf['ACC_RMS'] = np.sqrt(accdf.pop(1)**2 + accdf.pop(2)**2 + accdf.pop(3)**2)
print(accdf)
       0    ACC_RMS
0  1.963  72.034714
1  1.963  71.909666
2  1.963  68.709534
3  1.963  65.375837
4  1.963  63.150614
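
Note that pop removes columns 1, 2 and 3 from accdf in place, so the line above can only be run once per frame. As a minimal, self-contained sketch of the same whole-column arithmetic that leaves the input untouched, the following rebuilds the sample rows from the question; the helper name add_acc_rms is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample rows from the question: column 0 is days, columns 1-3 are the axes.
accdf = pd.DataFrame(
    [
        [1.963, -12.0, -71.0, -2.0],
        [1.963, -11.0, -71.0, -3.0],
        [1.963, -14.0, -67.0, -6.0],
        [1.963, -16.0, -63.0, -7.0],
        [1.963, -18.0, -60.0, -8.0],
    ]
)


def add_acc_rms(df):
    """Hypothetical helper: return a new two-column frame (Days, ACC_RMS)."""
    out = pd.DataFrame({"Days": df[0]})
    # Whole-column arithmetic, so there is no Python-level loop over rows.
    out["ACC_RMS"] = np.sqrt(df[1] ** 2 + df[2] ** 2 + df[3] ** 2)
    return out


result = add_acc_rms(accdf)
print(result)
```

Because the helper builds a new frame instead of popping columns, it can be called repeatedly on the same input, which also makes it straightforward to time.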

A NumPy solution for better performance:

#[50000 rows x 4 columns]
accdf = pd.concat([accdf] * 10000, ignore_index=True)

In [27]: %timeit (accdf.iloc[:,1:]**2).sum(1).pow(1/2)
1.97 ms ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [28]: %timeit np.sqrt(np.sum(accdf.to_numpy()[:,1:]**2, axis=1))
202 µs ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Unfortunately, my first solution raises an error under %timeit (pop removes the columns on the first run, so later runs fail), but I expect it to be slower than the NumPy-only solution.
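
Since the question involves about 70 files, the NumPy expression timed above could be wrapped in a small per-file function and applied in a loop. This is only a sketch under assumptions: the helper name load_acc_rms is hypothetical, the files are assumed to be headerless CSVs laid out like the sample, and in-memory buffers holding rows from the question stand in for real paths so that the example runs as written.

```python
import io

import numpy as np
import pandas as pd


def load_acc_rms(source):
    """Hypothetical helper: read one headerless CSV, return Days and ACC_RMS."""
    raw = pd.read_csv(source, header=None)
    values = raw.to_numpy()
    # Columns 1..3 hold the three axes: square, sum per row, take the root.
    rms = np.sqrt(np.sum(values[:, 1:4] ** 2, axis=1))
    return pd.DataFrame({"Days": values[:, 0], "ACC_RMS": rms})


# Stand-ins for real file paths, so the sketch needs no files on disk.
sources = [
    io.StringIO("1.963,-12.0,-71.0,-2.0\n1.963,-11.0,-71.0,-3.0\n"),
    io.StringIO("1.963,-14.0,-67.0,-6.0\n"),
]

frames = [load_acc_rms(src) for src in sources]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

With real data, sources would simply be the list of file paths (ACC_files in the question), since read_csv accepts both paths and file-like objects.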

jezrael answered Oct 18 '25 16:10


A pandas-only method:

(df.iloc[:,1:]**2).sum(1).pow(1/2)
Out[26]: 
0    72.034714
1    71.909666
2    68.709534
3    65.375837
4    63.150614
dtype: float64
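
The square, sum, root chain is the Euclidean norm of each row, so np.linalg.norm with axis=1 should give the same numbers. This is a swapped-in alternative rather than something proposed in the answers; the sketch below uses two sample rows from the question and checks the two forms against each other.

```python
import numpy as np
import pandas as pd

# Two sample rows from the question: days, then the three axes.
df = pd.DataFrame(
    [
        [1.963, -12.0, -71.0, -2.0],
        [1.963, -11.0, -71.0, -3.0],
    ]
)

# Euclidean norm of each row across the three axis columns.
norm = np.linalg.norm(df.iloc[:, 1:].to_numpy(), axis=1)

# The pandas expression from this answer, for comparison.
pandas_way = (df.iloc[:, 1:] ** 2).sum(axis=1).pow(1 / 2)

print(np.allclose(norm, pandas_way))
```

Which form is faster on a given machine is not something this sketch measures; it only shows that the results agree.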
BENY answered Oct 18 '25 14:10