Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NaN results with pandas mean function

Tags:

python

pandas

I try to have the mean of a row in my Python DataFrame, but I get a NaN return for every row. Why I get this result and how can I solve that ?

Goog key ratios : http://www.gogofile.com/Default.aspx?p=sc&ID=635118193040317500_6234

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path, skiprows = 2, names = ['Y0','Y1','Y2','Y3','Y4','Y5','Y6','Y7','Y8','Y9','Y10'], index_col = 0)
noTTM = data.iloc[:,0:10]
print(data.mean(1))
grossMargin = noTTM[2:3]
print(grossMargin.mean(1))

Return:

Gross Margin %   NaN
dtype: float64

Regards,

like image 618
abuteau Avatar asked Aug 11 '13 16:08

abuteau


People also ask

Why is pandas mean returning NaN?

Mean Parameters axis = You can choose if you want to take the average across columns (axis='index' or 0) or rows (axis='columns' or 1). skipna (Default: True) = Exclude the NA/null values when computing the result. If you set skipna=False and there is an NA in your data, pandas will return “NaN” for your average.

Does mean in pandas include NaN?

pandas mean() Key PointsBy default ignore NaN values and performs mean on index axis.

How do you replace all NaN values with mean in pandas?

You can use the fillna() function to replace NaN values in a pandas DataFrame.


1 Answers

The reason you have a bunch of nan values is because you don't have homogeneous column types. So, for example when you try to average across the columns it doesn't make sense because pandas.read_csv will only convert into a numeric column if it makes sense, e.g., you don't have string dates or other text in the same column as numbers.

I recommend also that you do a simple df.head() to check your data before doing even simple analyses. It will save you a lot of time in the future when you're wondering why your output is "weird".

That said, you can do the following to convert things to numeric values, but this isn't necessarily guaranteed to make sense:

In [35]: df = read_csv('GOOG Key Ratios.csv', skiprows=2, index_col=0, names=['Y%d' % i for i in range(11)])

In [36]: df.head() # not homogeneously typed columns
Out[36]:
                               Y0       Y1       Y2       Y3       Y4  \
NaN                       2003-12  2004-12  2005-12  2006-12  2007-12
Revenue USD Mil             1,466    3,189    6,139   10,605   16,594
Gross Margin %               57.3     54.3     58.1     60.2     59.9
Operating Income USD Mil      342      640    2,017    3,550    5,084
Operating Margin %           23.4     20.1     32.9     33.5     30.6

                               Y5       Y6       Y7       Y8       Y9     Y10
NaN                       2008-12  2009-12  2010-12  2011-12  2012-12     TTM
Revenue USD Mil            21,796   23,651   29,321   37,905   50,175  55,797
Gross Margin %               60.4     62.6     64.5     65.2     58.9    56.7
Operating Income USD Mil    6,632    8,312   10,381   11,742   12,760  12,734
Operating Margin %           30.4     35.1     35.4     31.0     25.4    22.8

In [37]: df.convert_objects(convert_numeric=True).head()
Out[37]:
                             Y0     Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8    Y9   Y10
NaN                         NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Revenue USD Mil             NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Gross Margin %             57.3   54.3  58.1  60.2  59.9  60.4  62.6  64.5  65.2  58.9  56.7
Operating Income USD Mil  342.0  640.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Operating Margin %         23.4   20.1  32.9  33.5  30.6  30.4  35.1  35.4  31.0  25.4  22.8
like image 67
Phillip Cloud Avatar answered Oct 19 '22 23:10

Phillip Cloud