Here is my code to generate a dataframe:
import pandas as pd
import numpy as np

dff = pd.DataFrame(np.random.randn(1, 2), columns=list('AB'))
Then I got the dataframe:
+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|
+------------+---------+--------+
When I type the command:
dff.mean(axis=1)
I got:
0    1.074821
dtype: float64
According to the pandas reference, axis=1 stands for columns, so I expected the result of the command to be:
A    0.626386
B    1.523255
dtype: float64
So here is my question: what does axis in pandas mean?
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1).
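As a rough illustration of the two axes described above, here is a minimal NumPy sketch. The array values and variable names are made up for the example and do not come from the question.

```python
import numpy as np

# Illustrative 2 x 3 array: 2 rows (axis 0), 3 columns (axis 1).
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# shape lists the length of each axis in order: (axis 0, axis 1).
shape = arr.shape

# Summing along axis 0 moves down the rows, leaving one value per column.
down_rows = arr.sum(axis=0)

# Summing along axis 1 moves across the columns, leaving one value per row.
across_cols = arr.sum(axis=1)

print(shape)        # (2, 3)
print(down_rows)    # [5 7 9]
print(across_cols)  # [ 6 15]
```

The axis you name is the one that gets collapsed, which is why axis=0 leaves as many results as there are columns.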
Use of axis Parameter in Pandas Methods
The axis parameter specifies the direction along which a particular method or function is applied in a DataFrame. For a reduction such as mean(), axis=0 means the function is applied column-wise (down the rows, giving one result per column), and axis=1 means the function is applied row-wise (across the columns, giving one result per row).
The drop() method removes the specified row or column. By specifying the column axis ( axis='columns' ), the drop() method removes the specified column. By specifying the row axis ( axis='index' ), the drop() method removes the specified row.
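A short sketch of the two drop() calls described above may help. The frame and its labels are hypothetical; only drop() and its axis argument come from the text.

```python
import pandas as pd

# Hypothetical frame: two rows labelled 0 and 1, columns A and B.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# axis='columns' (same as axis=1): the label "B" is looked up among the columns.
without_b = df.drop("B", axis="columns")

# axis='index' (same as axis=0): the label 0 is looked up among the row labels.
without_row0 = df.drop(0, axis="index")

print(list(without_b.columns))   # ['A']
print(list(without_row0.index))  # [1]
```

For drop(), then, axis says where to look for the label rather than which direction to aggregate in. Both calls return a new frame and leave df unchanged.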
The parameter axis=1 refers to columns, while axis=0 refers to rows. In this case you are sorting by columns, specifically index 1, which is col2 (indexing in Python starts at 0).
It specifies the axis along which the means are computed. By default axis=0. This is consistent with the numpy.mean usage when axis is specified explicitly (in numpy.mean, axis=None by default, which computes the mean value over the flattened array), in which axis=0 is along the rows (namely, index in pandas), and axis=1 is along the columns. For added clarity, one may choose to specify axis='index' (instead of axis=0) or axis='columns' (instead of axis=1).
+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
             |         |
             | axis=0  |
             ↓         ↓
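The behaviour described in this answer can be checked with a small sketch. The 2 x 2 frame below is a made-up example chosen so the means are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 2 frame with easy-to-check numbers.
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=list("AB"))

col_means = df.mean()        # default axis=0: one mean per column
row_means = df.mean(axis=1)  # one mean per row

# The string aliases select the same axes.
same_cols = df.mean(axis="index")
same_rows = df.mean(axis="columns")

# numpy.mean with an explicit axis agrees with pandas ...
np_cols = np.mean(df.to_numpy(), axis=0)
np_rows = np.mean(df.to_numpy(), axis=1)
# ... but with no axis it averages the flattened array.
np_all = np.mean(df.to_numpy())

print(col_means.tolist())  # [2.0, 3.0]
print(row_means.tolist())  # [1.5, 3.5]
print(np_all)              # 2.5
```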
These answers do help explain this, but it still isn't perfectly intuitive for a non-programmer (i.e. someone like me who is learning Python for the first time in the context of data science coursework). I still find using the terms "along" or "for each" with respect to rows and columns to be confusing.
What makes more sense to me is to say it this way:
So a mean on axis 0 will be the mean of all the rows in each column, and a mean on axis 1 will be a mean of all the columns in each row.
Ultimately this is saying the same thing as @zhangxaochen and @Michael, but in a way that is easier for me to internalize.
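Applied to the one-row frame from the question, this reading can be sketched as follows. The random values are replaced with the rounded numbers shown in the question's output, so the frame here is an approximation of the original, not a reproduction.

```python
import pandas as pd

# Values copied (rounded) from the question's output in place of random ones.
dff = pd.DataFrame([[0.626386, 1.523255]], columns=list("AB"))

# axis=0: average all the rows within each column. With a single row,
# each column's mean is just its only value, labelled by column name.
per_column = dff.mean(axis=0)

# axis=1: average all the columns within each row, labelled by row index.
per_row = dff.mean(axis=1)

print(per_column.index.tolist())  # ['A', 'B']
print(per_row.index.tolist())     # [0]
```

So the output the asker expected, labelled A and B, is what axis=0 produces, while axis=1 yields the single value labelled 0.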