I have a CSV file with 25,000 rows. I want to write the average of every 30 rows to another CSV file.
Here is an example with 9 rows averaged in groups of 3, so the new CSV file has 3 rows (3, 1, 2):
| H |
========
| 1 |---\
| 3 | |--->| 3 |
| 5 |---/
| -1 |---\
| 3 | |--->| 1 |
| 1 |---/
| 0 |---\
| 5 | |--->| 2 |
| 1 |---/
What I did:
import numpy as np
import pandas as pd

m_path = "file.csv"
m_df = pd.read_csv(m_path, usecols=['Col-01'])
m_arr = np.array([])
temp = m_df.to_numpy()
step = 30
# Average each consecutive block of `step` rows, starting at row 0
for i in range(0, len(temp), step):
    # np.append returns a new array, so the result must be assigned back to m_arr
    m_arr = np.append(m_arr, np.average(temp[i:i + step]))
m_df = pd.DataFrame({'Column1': m_arr})
m_df.to_csv('AVG.csv')
This works, but is there another way to do this?
To get the average (mean) of a column in a pandas DataFrame, use either the mean() or the describe() method. DataFrame.mean() returns the mean of the values along the requested axis.
To get the mean of each row instead, pass axis=1. By default, mean() computes the mean of each column (axis=0), not of each row.
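As a small illustration of the axis behaviour described above, here is a sketch on a made-up two-column frame. The column names a and b are hypothetical and are not taken from the question's file.

```python
import pandas as pd

# Hypothetical sample frame; the column names "a" and "b" are illustrative only.
df = pd.DataFrame({"a": [1, 3, 5], "b": [2, 4, 6]})

col_means = df.mean()        # default axis=0: one mean per column
row_means = df.mean(axis=1)  # axis=1: one mean per row

print(col_means)
print(row_means)
```

Neither call averages blocks of rows on its own, which is why the answers below build group labels first.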
You can use integer division of the index by step to label consecutive groups, then pass those labels to groupby and aggregate with mean:
step = 30
m_df = pd.read_csv(m_path, usecols=['Col-01'])
df = m_df.groupby(m_df.index // step).mean()
Or, if the index is not the default RangeIndex:
df = m_df.groupby(np.arange(len(m_df)) // step).mean()
With the sample data from the question:
step = 3
df = m_df.groupby(m_df.index // step).mean()
print(df)
     H
0  3.0
1  1.0
2  2.0
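The grouping idea can be checked end to end with a self-contained sketch. It assumes the nine sample values from the question sit in a column named H. The second frame, with a tenth value of 7, is made up here to show what happens when the row count is not a multiple of step.

```python
import numpy as np
import pandas as pd

# The nine sample values from the question, in a column named "H".
m_df = pd.DataFrame({"H": [1, 3, 5, -1, 3, 1, 0, 5, 1]})
step = 3

# Integer-dividing positions 0..8 by 3 gives labels 0,0,0,1,1,1,2,2,2.
labels = np.arange(len(m_df)) // step
block_means = m_df.groupby(labels).mean()
print(block_means)

# Hypothetical extra row: the trailing group then holds a single value,
# and groupby still reports its mean as a fourth row.
longer = pd.DataFrame({"H": [1, 3, 5, -1, 3, 1, 0, 5, 1, 7]})
longer_means = longer.groupby(np.arange(len(longer)) // step).mean()
print(longer_means)
```

For the 25,000-row file with step = 30, this would yield 834 rows, the last one averaging only the final 10 rows (25,000 = 833 × 30 + 10).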
You can also compute a rolling mean with DataFrame.rolling and then keep every third row of the result by slicing:
df.rolling(3).mean()[2::3].reset_index(drop=True)
a
0 3.0
1 1.0
2 2.0
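A self-contained sketch of the rolling approach, generalised from 3 to a step variable, may make the slicing clearer. The column name a follows the output above. The ten-row frame is a made-up case showing one difference from groupby: a trailing partial block is dropped rather than averaged.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 5, -1, 3, 1, 0, 5, 1]})
step = 3

# rolling(step).mean() gives, at each row, the mean of that row and the
# step-1 rows before it (NaN for the first step-1 rows).
rolled = df.rolling(step).mean()

# Rows step-1, 2*step-1, ... are the windows that end on a block boundary.
block_means = rolled[step - 1::step].reset_index(drop=True)
print(block_means)

# Hypothetical tenth row: no selected window ends on it, so it is ignored.
longer = pd.DataFrame({"a": [1, 3, 5, -1, 3, 1, 0, 5, 1, 7]})
longer_means = longer.rolling(step).mean()[step - 1::step].reset_index(drop=True)
print(len(longer_means))
```

Note that rolling computes a mean at every row and then discards most of them, so the groupby version does less work for the same result when the row count is a multiple of step.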