Python

Question

I am reading in a csv file like so:

with open('data.csv', 'rb') as f:
    reader = csv.reader(f)
    data_list = list(reader)

Here data_list is a list of each line in the csv file. So data_list[0] is the first line in the csv file (i.e the Headers), data_list[1] onwards is the actual lines containing data in csv file and data_line[1:][1] is the time.

So basically

data_list=
[['','Header1','Header2','Header3'],
['12:02:11', '2.3', '6.2', '11.8'],
['12:05:25', '1.5', '7.5', '13.2'],
['12:10:48', '4.1', '6.8', '12.6'],
['12:13:17', '1.6', '7.1', '12.1']]

I want to find the average of each column but excluding the Headers and the time as part of the calculations but keeping the Headers for the output and only taking one decimal place. Overall I want to produce something like this:

average_data_list=
[['','Header1','Header2','Header3'],
['', 2.3', '6.9', '12.4']]

I have been using Python - Calculate average for every column in a csv file as a guide but my code keeps throwing errors as I can't get it to skip the headers and time correctly.

Any help would be much appreciated

Martin Evans · Accepted Answer

The following should work:

import csv

with open('data.csv', 'rb') as f:
    reader = csv.reader(f)
    header = next(reader)
    data_list = list(reader)
    rows = [''] + ['{:.1f}'.format(sum(float(x) for x in y) / len(data_list)) for y in zip(*data_list)[1:]]
    average_data_list = [header] + [rows]

    print average_data_list

This would display:

[['', 'Header1', 'Header2', 'Header3'], ['', '2.4', '6.9', '12.4']]

The trick here is to read the header row first so that it does not get in the way. The zip(*data_list) is used to convert your list of rows to a list of columns so that the average can be easily calculated.

Python - Find the average for each column in a csv file excluding headers and time

Tags:

csv

Catherine

1 Answers

Martin Evans

Recent Activity

Donate For Us