I'm new in Python and I'm trying to get the average of every (column or row) of a csv file for then select the values that are higher than the double of the average of its column (o row). My file have hundreds of columns, and have float values like these:
845.123,452.234,653.23,...
432.123,213.452.421.532,...
743.234,532,432.423,...
I've tried several changes to my code to get the average for every column (separately), but at the moment my code is like this one:
def AverageColumn (c):
f=open(csv,"r")
average=0
Sum=0
column=len(f)
for i in range(0,column):
for n in i.split(','):
n=float(n)
Sum += n
average = Sum / len(column)
return 'The average is:', average
f.close()
csv="MDT25.csv"
print AverageColumn(csv)
But I always get a error like " f has no len()" or "'int' object is not iterable"...
I'd really appreciate if someone show me how to get the average for every column (or row, as you want), and then select the values that are higher than the double of the average of its column (or row). I'd rather without importing modules as csv, but as you prefer. Thanks!
Here's a clean up of your function, but it probably doesn't do what you want it to do. Currently, it is getting the average of all values in all columns:
def average_column (csv):
f = open(csv,"r")
average = 0
Sum = 0
row_count = 0
for row in f:
for column in row.split(','):
n=float(column)
Sum += n
row_count += 1
average = Sum / len(column)
f.close()
return 'The average is:', average
I would use the csv
module (which makes csv parsing easier), with a Counter
object to manage the column totals and a context manager to open the file (no need for a close()
):
import csv
from collections import Counter
def average_column (csv_filepath):
column_totals = Counter()
with open(csv_filepath,"rb") as f:
reader = csv.reader(f)
row_count = 0.0
for row in reader:
for column_idx, column_value in enumerate(row):
try:
n = float(column_value)
column_totals[column_idx] += n
except ValueError:
print "Error -- ({}) Column({}) could not be converted to float!".format(column_value, column_idx)
row_count += 1.0
# row_count is now 1 too many so decrement it back down
row_count -= 1.0
# make sure column index keys are in order
column_indexes = column_totals.keys()
column_indexes.sort()
# calculate per column averages using a list comprehension
averages = [column_totals[idx]/row_count for idx in column_indexes]
return averages
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With