I am reading in a csv file like so:
with open('data.csv', 'rb') as f:
reader = csv.reader(f)
data_list = list(reader)
Here data_list is a list of each line in the csv file. So data_list[0] is the first line in the csv file (i.e the Headers), data_list[1] onwards is the actual lines containing data in csv file and data_line[1:][1] is the time.
So basically
data_list=
[['','Header1','Header2','Header3'],
['12:02:11', '2.3', '6.2', '11.8'],
['12:05:25', '1.5', '7.5', '13.2'],
['12:10:48', '4.1', '6.8', '12.6'],
['12:13:17', '1.6', '7.1', '12.1']]
I want to find the average of each column but excluding the Headers and the time as part of the calculations but keeping the Headers for the output and only taking one decimal place. Overall I want to produce something like this:
average_data_list=
[['','Header1','Header2','Header3'],
['', 2.3', '6.9', '12.4']]
I have been using Python - Calculate average for every column in a csv file as a guide but my code keeps throwing errors as I can't get it to skip the headers and time correctly.
Any help would be much appreciated
The following should work:
import csv
with open('data.csv', 'rb') as f:
reader = csv.reader(f)
header = next(reader)
data_list = list(reader)
rows = [''] + ['{:.1f}'.format(sum(float(x) for x in y) / len(data_list)) for y in zip(*data_list)[1:]]
average_data_list = [header] + [rows]
print average_data_list
This would display:
[['', 'Header1', 'Header2', 'Header3'], ['', '2.4', '6.9', '12.4']]
The trick here is to read the header row first so that it does not get in the way. The zip(*data_list) is used to convert your list of rows to a list of columns so that the average can be easily calculated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With