What does axis in pandas mean?

Here is my code to generate a dataframe:

import pandas as pd
import numpy as np

dff = pd.DataFrame(np.random.randn(1, 2), columns=list('AB'))

then I got the dataframe:

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|
+------------+---------+--------+

When I type the command:

dff.mean(axis=1) 

I get:

0    1.074821
dtype: float64

According to the pandas reference, axis=1 stands for columns, so I expected the result of the command to be:

A    0.626386
B    1.523255
dtype: float64

So here is my question: what does axis in pandas mean?

jerry_sjtu asked Mar 03 '14 14:03



2 Answers

It specifies the axis along which the means are computed. By default axis=0. This is consistent with numpy.mean when axis is specified explicitly (in numpy.mean, axis=None by default, which computes the mean over the flattened array): axis=0 runs along the rows (namely, the index in pandas), and axis=1 runs along the columns. So dff.mean(axis=1) averages across columns A and B within each row, which is why it returns a single value for row 0. For added clarity, one may choose to specify axis='index' (instead of axis=0) or axis='columns' (instead of axis=1).

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
             |         |
             | axis=0  |
             ↓         ↓
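
To make the two directions concrete, here is a minimal sketch that rebuilds the question's one-row frame. It assumes the printed values (0.626386 and 1.523255) can be hard-coded in place of np.random.randn so the result is reproducible; the names col_means, row_means, flat_mean and np_axis0 are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: reuse the values printed in the question instead of random ones.
dff = pd.DataFrame([[0.626386, 1.523255]], columns=list("AB"))

# axis=0 (same as axis='index'): move down the rows, one result per column.
col_means = dff.mean(axis=0)

# axis=1 (same as axis='columns'): move across the columns, one result per row.
row_means = dff.mean(axis=1)

print(col_means)  # indexed by the column labels A and B
print(row_means)  # indexed by the row label 0

# The same convention in NumPy on the underlying array.
arr = dff.to_numpy()
flat_mean = np.mean(arr)         # axis=None: mean over the flattened array
np_axis0 = np.mean(arr, axis=0)  # matches dff.mean(axis=0)
```

With a single row, the axis=0 result simply repeats the row's values per column, which is the output the question expected, while axis=1 gives the one per-row average the question actually got.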
zhangxaochen answered Oct 01 '22 08:10


These answers do help explain this, but it still isn't perfectly intuitive for a non-programmer (i.e. someone like me who is learning Python for the first time in the context of data science coursework). I still find the terms "along" and "for each" confusing when applied to rows and columns.

What makes more sense to me is to say it this way:

  • Axis 0 will act on all the ROWS in each COLUMN
  • Axis 1 will act on all the COLUMNS in each ROW

So a mean on axis 0 will be the mean of all the rows in each column, and a mean on axis 1 will be a mean of all the columns in each row.

Ultimately this is saying the same thing as @zhangxaochen and @Michael, but in a way that is easier for me to internalize.
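
A small sketch of that reading, using a hypothetical 3x2 frame (the values are made up for easy arithmetic and are not from the question). With more than one row, the difference between the two axes is easier to see than in the original one-row example.

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Axis 0: act on all the ROWS in each COLUMN, giving one value per column.
per_column = df.mean(axis=0)  # A -> 2.0, B -> 20.0

# Axis 1: act on all the COLUMNS in each ROW, giving one value per row.
per_row = df.mean(axis=1)     # 0 -> 5.5, 1 -> 11.0, 2 -> 16.5

print(per_column)
print(per_row)
```

One way to check which axis was used is to look at the length of the result: axis=0 returns as many values as there are columns, and axis=1 returns as many values as there are rows.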

Ken Wallace answered Oct 01 '22 08:10