Ambiguity in Pandas Dataframe / Numpy Array "axis" definition

Tags:

python

arrays

pandas

dataframe

numpy

I've been very confused about how axes are defined in pandas and NumPy, and whether they refer to a DataFrame's rows or columns. Consider the code below:

    >>> import pandas as pd
    >>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
    >>> df
       col1  col2  col3  col4
    0     1     1     1     1
    1     2     2     2     2
    2     3     3     3     3

So if we call df.mean(axis=1), we'll get a mean across the rows:

    >>> df.mean(axis=1)
    0    1.0
    1    2.0
    2    3.0
    dtype: float64

However, if we call df.drop(name, axis=1), we actually drop a column, not a row:

    >>> df.drop("col4", axis=1)
       col1  col2  col3
    0     1     1     1
    1     2     2     2
    2     3     3     3

Can someone help me understand what is meant by an "axis" in pandas/numpy/scipy?

As a side note, DataFrame.mean just might be defined wrong: its documentation says that axis=1 is supposed to mean a mean over the columns, not the rows...


asked Sep 10 '14 19:09

hlin117

2 Answers

It's perhaps simplest to remember it as 0=down and 1=across.

This means:

- Use axis=0 to apply a method down each column, or to the row labels (the index).
- Use axis=1 to apply a method across each row, or to the column labels.
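A minimal sketch of the 0=down / 1=across rule, reusing the DataFrame from the question (assuming a reasonably recent pandas version). The variable names down and across are just illustrative:

```python
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=0: go down each column, giving one value per column label
down = df.mean(axis=0)   # col1, col2, col3, col4 are all 2.0

# axis=1: go across each row, giving one value per row label (index)
across = df.mean(axis=1)  # rows 0, 1, 2 give 1.0, 2.0, 3.0

print(down)
print(across)
```

Note how the result of axis=0 is labelled by the columns and the result of axis=1 is labelled by the index: the axis you name is the one that gets used up.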

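Here's a picture to show the parts of a DataFrame that each axis refers to:

[Image: diagram of the DataFrame parts that axis=0 and axis=1 refer to (https://i.stack.imgur.com/DL0iQ.jpg)]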

It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:

Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
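The glossary definition can be checked directly on a plain NumPy array holding the same numbers as the question's DataFrame. This is a small sketch; the names col_sums and row_sums are only illustrative:

```python
import numpy as np

# Same numbers as the question's DataFrame, as a plain 2-D array
a = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

# shape is (3, 4): axis 0 has length 3 (rows), axis 1 has length 4 (columns)
print(a.shape)

# axis 0 runs vertically downwards across rows: one sum per column
col_sums = a.sum(axis=0)  # [6, 6, 6, 6]

# axis 1 runs horizontally across columns: one sum per row
row_sums = a.sum(axis=1)  # [4, 8, 12]
```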

So the method in the question, df.mean(axis=1), seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.
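pandas reductions such as mean also accept the axis names instead of numbers: axis="index" is the same as axis=0 and axis="columns" is the same as axis=1. The sketch below (assuming a recent pandas) checks this, and shows why the documentation can describe axis=1 as a mean "over the columns": the column axis is the one collapsed, leaving one value per row.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# The string names are aliases for the integer axes
assert df.mean(axis="columns").equals(df.mean(axis=1))
assert df.mean(axis="index").equals(df.mean(axis=0))

# Averaging "over the columns" collapses the column axis,
# so the result is labelled by the row index
per_row = df.mean(axis="columns")
print(per_row.index.equals(df.index))  # True
```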

Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
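A short sketch of drop acting on each set of labels, again with the question's DataFrame. The index= and columns= keywords of DataFrame.drop (available in modern pandas) express the same thing without an axis number; no_col4 and no_row0 are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=1: "col4" is looked up among the column labels
no_col4 = df.drop("col4", axis=1)

# axis=0 (the default): 0 is looked up among the row labels (the index)
no_row0 = df.drop(0, axis=0)

# Keyword forms that avoid the axis number entirely
assert no_col4.equals(df.drop(columns="col4"))
assert no_row0.equals(df.drop(index=0))

print(list(no_col4.columns))  # ['col1', 'col2', 'col3']
print(list(no_row0.index))    # [1, 2]
```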


answered Oct 10 '22 19:10

Alex Riley

There are already good answers, but here is another example with more than two dimensions.

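The parameter axis means the axis to be changed.
For example, consider a hypothetical dataframe with dimensions a x b x c (a pandas DataFrame itself is always 2-D, so treat this as a conceptual example).

- df.mean(axis=1) returns a result with dimensions a x c: the b axis is collapsed (it stays as a x 1 x c only if the axis is kept, e.g. with NumPy's keepdims=True).
- df.drop("col4", axis=1) returns a result with dimensions a x (b-1) x c.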

Here, axis=1 refers to the second axis, b, so the b dimension is what changes in both examples.
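Since a pandas DataFrame is 2-D, a 3-D NumPy array is the easiest way to check the shapes in this answer. In this sketch the sizes a=2, b=3, c=4 are arbitrary assumptions, and np.delete stands in for drop because NumPy arrays have no drop method:

```python
import numpy as np

# Hypothetical sizes for the a x b x c example
a, b, c = 2, 3, 4
arr = np.arange(a * b * c).reshape(a, b, c)

# Reducing along axis=1 removes the b dimension...
print(arr.mean(axis=1).shape)                 # (2, 4)
# ...or keeps it with length 1 when keepdims=True
print(arr.mean(axis=1, keepdims=True).shape)  # (2, 1, 4)

# Deleting one position along axis=1 shrinks b by one,
# much like dropping a single column label
print(np.delete(arr, 2, axis=1).shape)        # (2, 2, 4)
```

In every case only the size of axis 1 changes; axes 0 and 2 keep their lengths a and c.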

answered Oct 10 '22 18:10

jeongmin.cha
