I try to have the mean of a row in my Python DataFrame, but I get a NaN return for every row. Why I get this result and how can I solve that ?
Goog key ratios : http://www.gogofile.com/Default.aspx?p=sc&ID=635118193040317500_6234
path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path, skiprows = 2, names = ['Y0','Y1','Y2','Y3','Y4','Y5','Y6','Y7','Y8','Y9','Y10'], index_col = 0)
noTTM = data.iloc[:,0:10]
print(data.mean(1))
grossMargin = noTTM[2:3]
print(grossMargin.mean(1))
Return:
Gross Margin % NaN
dtype: float64
Regards,
Mean Parameters axis = You can choose if you want to take the average across columns (axis='index' or 0) or rows (axis='columns' or 1). skipna (Default: True) = Exclude the NA/null values when computing the result. If you set skipna=False and there is an NA in your data, pandas will return “NaN” for your average.
pandas mean() Key PointsBy default ignore NaN values and performs mean on index axis.
You can use the fillna() function to replace NaN values in a pandas DataFrame.
The reason you have a bunch of nan
values is because you don't have homogeneous column types. So, for example when you try to average across the columns it doesn't make sense because pandas.read_csv
will only convert into a numeric column if it makes sense, e.g., you don't have string dates or other text in the same column as numbers.
I recommend also that you do a simple df.head()
to check your data before doing even simple analyses. It will save you a lot of time in the future when you're wondering why your output is "weird".
That said, you can do the following to convert things to numeric values, but this isn't necessarily guaranteed to make sense:
In [35]: df = read_csv('GOOG Key Ratios.csv', skiprows=2, index_col=0, names=['Y%d' % i for i in range(11)])
In [36]: df.head() # not homogeneously typed columns
Out[36]:
Y0 Y1 Y2 Y3 Y4 \
NaN 2003-12 2004-12 2005-12 2006-12 2007-12
Revenue USD Mil 1,466 3,189 6,139 10,605 16,594
Gross Margin % 57.3 54.3 58.1 60.2 59.9
Operating Income USD Mil 342 640 2,017 3,550 5,084
Operating Margin % 23.4 20.1 32.9 33.5 30.6
Y5 Y6 Y7 Y8 Y9 Y10
NaN 2008-12 2009-12 2010-12 2011-12 2012-12 TTM
Revenue USD Mil 21,796 23,651 29,321 37,905 50,175 55,797
Gross Margin % 60.4 62.6 64.5 65.2 58.9 56.7
Operating Income USD Mil 6,632 8,312 10,381 11,742 12,760 12,734
Operating Margin % 30.4 35.1 35.4 31.0 25.4 22.8
In [37]: df.convert_objects(convert_numeric=True).head()
Out[37]:
Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Revenue USD Mil NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Gross Margin % 57.3 54.3 58.1 60.2 59.9 60.4 62.6 64.5 65.2 58.9 56.7
Operating Income USD Mil 342.0 640.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
Operating Margin % 23.4 20.1 32.9 33.5 30.6 30.4 35.1 35.4 31.0 25.4 22.8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With