Python

Question

I'm new to Python and I'm trying to get the average of every column (or row) of a CSV file, and then select the values that are higher than double the average of their column (or row). My file has hundreds of columns and contains float values like these:

845.123,452.234,653.23,...
432.123,213.452,421.532,...
743.234,532,432.423,...

I've tried several changes to my code to get the average for every column (separately), but at the moment my code is like this one:

def AverageColumn (c):
    f=open(csv,"r")
    average=0
    Sum=0
    column=len(f)
    for i in range(0,column):
        for n in i.split(','):
            n=float(n)
            Sum += n
        average = Sum / len(column)
    return 'The average is:', average

    f.close()


csv="MDT25.csv"
print AverageColumn(csv)

But I always get an error like "f has no len()" or "'int' object is not iterable"...

I'd really appreciate it if someone could show me how to get the average for every column (or row, whichever you prefer), and then select the values that are higher than double the average of their column (or row). I'd rather do it without importing modules such as csv, but whatever you prefer. Thanks!

monkut · Accepted Answer

Here's a clean up of your function, but it probably doesn't do what you want it to do. Currently, it is getting the average of all values in all columns:

def average_column(csv):
    f = open(csv, "r")
    total = 0.0
    value_count = 0
    for row in f:
        for column in row.split(','):
            total += float(column)
            value_count += 1
    f.close()
    # one average over every value in the file (not per column)
    average = total / value_count
    return 'The average is:', average
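
If you'd rather avoid imports, as you mentioned, a per-column version could look roughly like the sketch below. The name `column_averages` is made up for illustration, and it assumes every cell is a plain number separated by commas. It takes any iterable of lines, so an open file works.

```python
def column_averages(lines):
    """Return a list with the mean of each column (hypothetical helper).

    Assumes `lines` is an iterable of comma-separated strings of numbers,
    for example an open file object.
    """
    totals = []  # running sum per column
    counts = []  # number of values seen per column
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for idx, cell in enumerate(line.split(",")):
            if idx == len(totals):
                # first time this column index shows up
                totals.append(0.0)
                counts.append(0)
            totals[idx] += float(cell)
            counts[idx] += 1
    return [total / count for total, count in zip(totals, counts)]
```

With your file that would be something like `with open("MDT25.csv") as f: averages = column_averages(f)`.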

I would use the csv module (which makes csv parsing easier), with a Counter object to manage the column totals and a context manager to open the file (no need for a close()):

import csv
from collections import Counter

def average_column(csv_filepath):
    column_totals = Counter()
    # "rb" is the Python 2 idiom for csv; on Python 3 use open(csv_filepath, newline="")
    with open(csv_filepath, "rb") as f:
        reader = csv.reader(f)
        row_count = 0.0
        for row in reader:
            for column_idx, column_value in enumerate(row):
                try:
                    n = float(column_value)
                    column_totals[column_idx] += n
                except ValueError:
                    print("Error -- ({}) Column({}) could not be converted to float!".format(column_value, column_idx))
            row_count += 1.0

    # row_count now equals the number of rows read (the sample data has no header row)

    # make sure column index keys are in order
    column_indexes = sorted(column_totals.keys())

    # calculate per column averages using a list comprehension
    averages = [column_totals[idx] / row_count for idx in column_indexes]
    return averages
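
For the second part of the question (picking the values that are higher than double the average of their column), something along these lines might work once you have the `averages` list. This is only a sketch: `values_above_double_average` is a hypothetical name, and it assumes `rows` has already been parsed into lists of floats.

```python
def values_above_double_average(rows, averages):
    """Return (row_idx, column_idx, value) for every value that is
    greater than twice the average of its column.

    Assumes `rows` is a list of lists of floats and `averages` holds
    one average per column, in column order.
    """
    selected = []
    for row_idx, row in enumerate(rows):
        for column_idx, value in enumerate(row):
            if value > 2 * averages[column_idx]:
                selected.append((row_idx, column_idx, value))
    return selected
```

Returning the row and column index along with each value lets you locate the selected entries in the original file afterwards.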

Python - Calculate average for every column in a csv file

Tags:

csv

multiple-columns

Pabloo LR
