What does axis in pandas mean?

Tags:

python

pandas

dataframe

numpy

Here is my code to generate a dataframe:

import pandas as pd
import numpy as np

dff = pd.DataFrame(np.random.randn(1, 2), columns=list('AB'))

then I got the dataframe:

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|
+------------+---------+--------+

When I type the command:

dff.mean(axis=1)

I got:

0    1.074821
dtype: float64

According to the pandas reference, axis=1 stands for columns, so I expected the result of the command to be

A    0.626386
B    1.523255
dtype: float64

So here is my question: what does axis in pandas mean?

664

asked Mar 03 '14 14:03

jerry_sjtu

2 Answers

It specifies the axis along which the means are computed. By default, axis=0. This is consistent with numpy.mean when axis is specified explicitly (in numpy.mean, axis=None by default, which computes the mean over the flattened array): axis=0 runs along the rows (namely, the index in pandas), so it yields one mean per column, and axis=1 runs along the columns, so it yields one mean per row. For added clarity, one may specify axis='index' (instead of axis=0) or axis='columns' (instead of axis=1).

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
             |         |
             | axis=0  |
             ↓         ↓
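To make this concrete, here is a minimal sketch that rebuilds the question's one-row frame with fixed values (an assumption, since the original used np.random.randn and its exact numbers cannot be regenerated) and compares both axes. The variable names other than dff are only illustrative.

```python
import numpy as np
import pandas as pd

# Fixed values taken from the printed output in the question, so the
# results are reproducible (the original frame was random).
dff = pd.DataFrame([[0.626386, 1.523255]], columns=list("AB"))

# axis=0 (the pandas default): move down the rows, one mean per column.
col_means = dff.mean(axis=0)

# axis=1: move across the columns, one mean per row.
row_means = dff.mean(axis=1)

print(col_means)  # indexed by the column labels A and B
print(row_means)  # indexed by the row label 0

# The string aliases select the same axes as the integers.
assert dff.mean(axis="index").equals(col_means)
assert dff.mean(axis="columns").equals(row_means)

# NumPy uses the same numbering, but its default is axis=None,
# which averages over the flattened array.
arr = dff.to_numpy()
flat_mean = np.mean(arr)             # a single scalar
np_col_means = np.mean(arr, axis=0)  # shape (2,): one value per column
np_row_means = np.mean(arr, axis=1)  # shape (1,): one value per row
```

With only one row, axis=0 simply returns each column's single value, which is the output the question expected, while axis=1 averages A and B into the single number the question actually got.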

154

answered Oct 01 '22 08:10

zhangxaochen

These answers do help explain this, but it still isn't perfectly intuitive for a non-programmer (i.e. someone like me who is learning Python for the first time in the context of data science coursework). I still find the terms "along" or "for each" with respect to rows and columns confusing.

What makes more sense to me is to say it this way:

Axis 0 will act on all the ROWS in each COLUMN
Axis 1 will act on all the COLUMNS in each ROW

So a mean on axis 0 will be the mean of all the rows in each column, and a mean on axis 1 will be a mean of all the columns in each row.
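A small sketch of that wording, using a made-up three-row frame (the values and the names df, down_rows and across_cols are purely illustrative) so that both directions produce visibly different results:

```python
import pandas as pd

# Hypothetical data: three rows, two columns.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Axis 0: act on all the ROWS in each COLUMN.
down_rows = df.mean(axis=0)
print(down_rows)    # A -> 2.0, B -> 20.0

# Axis 1: act on all the COLUMNS in each ROW.
across_cols = df.mean(axis=1)
print(across_cols)  # 0 -> 5.5, 1 -> 11.0, 2 -> 16.5
```

The result of axis=0 has one entry per column, and the result of axis=1 has one entry per row, which is a quick way to check which direction was collapsed.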

Ultimately this is saying the same thing as @zhangxaochen and @Michael, but in a way that is easier for me to internalize.

answered Oct 01 '22 08:10

Ken Wallace
