Calculate average of every n rows from a csv file

Tags: python, pandas

I have a CSV file with 25000 rows. I want to write the average of every 30 rows to another CSV file.

Below is an example with 9 rows, where the new CSV file has 3 rows (3, 1, 2):

|   H    |
 ========
|   1    |---\
|   3    |   |--->| 3 |
|   5    |---/
|  -1    |---\
|   3    |   |--->| 1 |
|   1    |---/
|   0    |---\
|   5    |   |--->| 2 |
|   1    |---/

What I did:

import numpy as np
import pandas as pd

m_path = "file.csv"

m_df = pd.read_csv(m_path, usecols=['Col-01'])
m_arr = np.array([])
temp = m_df.to_numpy()
step = 30
# Start at 0 so the first row is included, and stop at the real row count.
for i in range(0, len(temp), step):
    # np.append returns a new array, so assign the result back to m_arr.
    m_arr = np.append(m_arr, np.average(temp[i:i + step]))

m_df = pd.DataFrame({'Column1': m_arr})
m_df.to_csv('AVG.csv')

This works well, but is there another way to do this?

Saeed asked Mar 25 '20 14:03


2 Answers

You can integer-divide the index by step to label consecutive groups of rows, then pass those labels to groupby and aggregate with mean:

step = 30
m_df = pd.read_csv(m_path, usecols=['Col-01']) 
df = m_df.groupby(m_df.index // step).mean()

Or:

df = m_df.groupby(np.arange(len(m_df)) // step).mean()

Sample data:

step = 3
df = m_df.groupby(m_df.index // step).mean()
print(df)
     H
0  3.0
1  1.0
2  2.0
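To check the grouping idea without a file on disk, here is a minimal sketch that rebuilds the nine-row example from the question in memory. The column name H is taken from the question's diagram, and the helper name block_means is hypothetical (it is not part of pandas). It labels rows by position with np.arange, so it does not assume the DataFrame has a default integer index.

```python
import numpy as np
import pandas as pd


def block_means(df: pd.DataFrame, step: int) -> pd.DataFrame:
    """Average every `step` consecutive rows (hypothetical helper)."""
    # Positional group labels 0,0,0,1,1,1,... independent of df.index.
    groups = np.arange(len(df)) // step
    return df.groupby(groups).mean()


# The nine values from the question's example.
sample = pd.DataFrame({"H": [1, 3, 5, -1, 3, 1, 0, 5, 1]})
result = block_means(sample, step=3)
print(result["H"].tolist())  # [3.0, 1.0, 2.0]
```

If the row count is not a multiple of step (25000 rows with step 30, for instance), the last group is averaged over the remaining rows only. Calling result.to_csv('AVG.csv', index=False) would write the averages without the extra index column.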
jezrael answered Sep 17 '22 01:09


You can compute a rolling mean with DataFrame.rolling and then keep only the last row of each block by slicing:

df.rolling(3).mean()[2::3].reset_index(drop=True)
     a
0  3.0
1  1.0
2  2.0
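The same idea can be written for an arbitrary step. The following is a sketch only, using an in-memory frame with the column name a from the output above, plus one extra value (7, an assumption added for illustration) so that the row count is not a multiple of step.

```python
import pandas as pd

step = 3
# The question's nine values plus one extra row, leaving an incomplete block.
df = pd.DataFrame({"a": [1, 3, 5, -1, 3, 1, 0, 5, 1, 7]})

# Row i of the rolling mean covers rows i-step+1 .. i, so rows
# step-1, 2*step-1, ... hold the means of the non-overlapping blocks.
rolled = df.rolling(step).mean().iloc[step - 1::step].reset_index(drop=True)
print(rolled)
```

Note the difference from the groupby approach: the slice only selects rows that end a complete block, so the trailing incomplete block (the single 7 here) is dropped rather than averaged on its own. The rolling mean is also computed for every row before most of the results are discarded, which does more work than grouping.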
Dishin H Goyani answered Sep 18 '22 01:09