Specifying "skip NA" when calculating the mean of a column in a pandas DataFrame

I am learning the pandas package by replicating the output of some of the R vignettes. Right now I am using the dplyr package from R as an example:

http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

R script

planes <- group_by(hflights_df, TailNum)
delay <- summarise(planes,
  count = n(),
  dist = mean(Distance, na.rm = TRUE))
delay <- filter(delay, count > 20, dist < 2000)

Python script

planes = hflights.groupby('TailNum')
planes['Distance'].agg({'count' : 'count',
                        'dist' : 'mean'})

How can I state explicitly in python that NA needs to be skipped?

That's a trick question, since you don't need to do that. Pandas automatically excludes NaN values from aggregation functions. Consider my df:

    b   c   d  e
a               
2   2   6   1  3
2   4   8 NaN  7
2   4   4   6  3
3   5 NaN   2  6
4 NaN NaN   4  1
5   6   2   1  8
7   3   2   4  7
9   6   1 NaN  1
9 NaN NaN   9  3
9   3   4   6  1

The built-in count() function ignores NaN values, and so does mean(). The only case where we get NaN is when every value in the group is NaN. Then we are taking the mean of an empty set, which turns out to be NaN:

In[335]: df.groupby('a').mean()
Out[333]: 
          b    c    d         e
a                              
2  3.333333  6.0  3.5  4.333333
3  5.000000  NaN  2.0  6.000000
4       NaN  NaN  4.0  1.000000
5  6.000000  2.0  1.0  8.000000
7  3.000000  2.0  4.0  7.000000
9  4.500000  2.5  7.5  1.666667
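To check this yourself, here is a minimal sketch that rebuilds the frame above and inspects a few cells. The variable names `grouped`, `counts` and `means` are mine and are not part of the original answer.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Rebuild the example frame shown above; `a` is the index used as grouping key.
df = pd.DataFrame(
    {
        "a": [2, 2, 2, 3, 4, 5, 7, 9, 9, 9],
        "b": [2, 4, 4, 5, nan, 6, 3, 6, nan, 3],
        "c": [6, 8, 4, nan, nan, 2, 2, 1, nan, 4],
        "d": [1, nan, 6, 2, 4, 1, 4, nan, 9, 6],
        "e": [3, 7, 3, 6, 1, 8, 7, 1, 3, 1],
    }
).set_index("a")

grouped = df.groupby("a")
counts = grouped.count()  # non-NaN entries per group and column
means = grouped.mean()    # NaN entries are left out of each mean

# Group 2, column d holds 1, NaN, 6: two valid values with mean 3.5.
print(counts.loc[2, "d"], means.loc[2, "d"])
# Group 4, column b holds only NaN: zero valid values, so the mean is NaN.
print(counts.loc[4, "b"], means.loc[4, "b"])
```

Comparing `counts` with `means` makes it visible that a NaN in the result always lines up with a count of zero.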

Aggregate functions work in the same way:

In[340]: df.groupby('a')['b'].agg({'foo': np.mean})
Out[338]: 
        foo
a          
2  3.333333
3  5.000000
4       NaN
5  6.000000
7  3.000000
9  4.500000
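A note for readers on newer pandas: as far as I know, the dict-renaming form of agg on a single column was deprecated in 0.20 and raises a SpecificationError from 1.0 onward. A sketch of the same idea with named aggregation (available since 0.25) follows; the small frame and the extra `n` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: group 4 holds only NaN in column b.
df = pd.DataFrame(
    {"a": [2, 2, 4, 9, 9], "b": [2.0, 4.0, np.nan, 6.0, np.nan]}
)

# Named aggregation: keyword = output column name, value = aggregation.
result = df.groupby("a")["b"].agg(foo="mean", n="count")
print(result)
```

NaN handling is unchanged: `foo` is NaN only for the group whose `n` is zero.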

Addendum: Note that the standard DataFrame.mean API has a skipna parameter that lets you control whether NaN values are excluded. The default, skipna=True, excludes them.
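A quick sketch of that parameter outside of groupby, on a made-up two-column frame (the names `frame`, `default_mean` and `strict_mean` are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column b has one missing value, column e has none.
frame = pd.DataFrame({"b": [6.0, np.nan, 3.0], "e": [1.0, 3.0, 1.0]})

default_mean = frame.mean()             # skipna=True: NaN is left out
strict_mean = frame.mean(skipna=False)  # any NaN makes the column mean NaN

print(default_mean)
print(strict_mean)
```

With the default, b averages the two valid values; with skipna=False, b becomes NaN while e is unaffected.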

What FooBar said is true with regard to the default behavior, but there is a very easy way to specify skipna yourself. Here is an example that speaks for itself:

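def custom_mean(series):
    # agg passes each group's column as a Series; skipna=False propagates NaN
    return series.mean(skipna=False)

# `group` is a GroupBy object, e.g. the result of df.groupby(...)
group.agg({"your_col_name_to_be_aggregated": custom_mean})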

That's it! You can customize your own aggregation the way you want, and I'd expect this to be fairly efficient, but I did not dig into it.
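For a concrete picture, here is a small sketch that puts the custom function next to the built-in mean. The `flights` frame and its values are invented; only the column names `TailNum` and `Distance` come from the question.

```python
import numpy as np
import pandas as pd


def custom_mean(series):
    # skipna=False: a single NaN in the group makes the result NaN
    return series.mean(skipna=False)


# Hypothetical data: tail number N1 has one missing distance.
flights = pd.DataFrame(
    {
        "TailNum": ["N1", "N1", "N2", "N2"],
        "Distance": [100.0, np.nan, 200.0, 400.0],
    }
)

strict = flights.groupby("TailNum").agg({"Distance": custom_mean})
lenient = flights.groupby("TailNum").agg({"Distance": "mean"})

print(strict)   # N1 is NaN, N2 is 300.0
print(lenient)  # N1 is 100.0, N2 is 300.0
```

On efficiency: a Python-level function like this generally does not go through pandas' optimized built-in aggregations, so it may be noticeably slower than the string "mean" on large frames.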

It was also discussed here, but I thought I'd help spread the good news! Answer was found in the official doc.
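Putting both answers together, the dplyr pipeline from the question could be sketched roughly as below. This is only an illustration: the `hflights` frame is a tiny invented stand-in, and the count threshold is lowered from 20 to 2 so the toy data has something to filter.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the hflights data set.
hflights = pd.DataFrame(
    {
        "TailNum": ["N1", "N1", "N1", "N2", "N2"],
        "Distance": [100.0, np.nan, 300.0, 2500.0, 2500.0],
    }
)

delay = hflights.groupby("TailNum").agg(
    count=("Distance", "size"),  # like n(): counts rows, NaN included
    dist=("Distance", "mean"),   # NaN skipped, like mean(..., na.rm = TRUE)
)

# Equivalent of filter(delay, count > 20, dist < 2000), with a toy threshold.
# Bracket access is needed because DataFrame already has a .count method.
delay = delay[(delay["count"] > 2) & (delay["dist"] < 2000)]
print(delay)
```

Note the difference between "size" and "count": "size" counts every row in the group, which matches n(), while "count" would count only the non-NaN distances.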

Tags: python, pandas, r, na

Question by lokheart; 2 answers, by FooBar and c-a.
