Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Find the average for each column in a csv file excluding headers and time

Tags:

python

csv

I am reading in a csv file like so:

with open('data.csv', 'rb') as f:
    reader = csv.reader(f)
    data_list = list(reader)

Here data_list is a list of each line in the csv file. So data_list[0] is the first line in the csv file (i.e the Headers), data_list[1] onwards is the actual lines containing data in csv file and data_line[1:][1] is the time.

So basically

data_list=
[['','Header1','Header2','Header3'],
['12:02:11', '2.3', '6.2', '11.8'],
['12:05:25', '1.5', '7.5', '13.2'],
['12:10:48', '4.1', '6.8', '12.6'],
['12:13:17', '1.6', '7.1', '12.1']]

I want to find the average of each column but excluding the Headers and the time as part of the calculations but keeping the Headers for the output and only taking one decimal place. Overall I want to produce something like this:

average_data_list=
[['','Header1','Header2','Header3'],
['', 2.3', '6.9', '12.4']]

I have been using Python - Calculate average for every column in a csv file as a guide but my code keeps throwing errors as I can't get it to skip the headers and time correctly.

Any help would be much appreciated

like image 613
Catherine Avatar asked Apr 17 '26 09:04

Catherine


1 Answers

The following should work:

import csv

with open('data.csv', 'rb') as f:
    reader = csv.reader(f)
    header = next(reader)
    data_list = list(reader)
    rows = [''] + ['{:.1f}'.format(sum(float(x) for x in y) / len(data_list)) for y in zip(*data_list)[1:]]
    average_data_list = [header] + [rows]

    print average_data_list

This would display:

[['', 'Header1', 'Header2', 'Header3'], ['', '2.4', '6.9', '12.4']]

The trick here is to read the header row first so that it does not get in the way. The zip(*data_list) is used to convert your list of rows to a list of columns so that the average can be easily calculated.

like image 109
Martin Evans Avatar answered Apr 19 '26 23:04

Martin Evans