How do numpy functions operate on pandas objects internally?

Tags: python, pandas, numpy

NumPy functions such as np.mean() and np.var() accept an array-like argument, for example an np.ndarray or a list.

But passing in a pandas DataFrame also works. This means that a DataFrame can pass itself off as a numpy array, which I find a little strange (even though I know that the underlying values of a df are numpy arrays).

For an object to be array-like, I thought it should be sliceable with integer indexing in the same way a numpy array is. So, for instance, df[1:3, 2:3] should work, but it raises an error.
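
For what it's worth, "array-like" in NumPy generally means "something np.asarray can convert into an array", not "something that supports NumPy-style slicing". A minimal sketch with a plain nested list (the variable names are only for illustration) shows the difference: the list is accepted by np.mean even though it cannot be sliced with a tuple of slices.

```python
import numpy as np

# A nested list is array-like: NumPy can convert it to a 2-D array.
nested = [[1.0, 2.0], [3.0, 4.0]]
list_mean = np.mean(nested)  # mean over all four elements

# The list itself does not support NumPy-style tuple slicing.
try:
    nested[0:1, 0:1]
    tuple_slicing_ok = True
except TypeError:
    tuple_slicing_ok = False

# After explicit conversion, tuple slicing works as usual.
as_array = np.asarray(nested)
sliced = as_array[0:1, 0:1]

print(list_mean, tuple_slicing_ok, sliced.shape)
```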

So, possibly a dataframe gets converted into a numpy array when it goes inside the function. But if that is the case then why does np.mean(numpy_array) lead to a different result than that of np.mean(df)?

import numpy as np
import pandas as pd

a = np.random.rand(4,2)
a
Out[13]: 
array([[ 0.86688862,  0.09682919],
       [ 0.49629578,  0.78263523],
       [ 0.83552411,  0.71907931],
       [ 0.95039642,  0.71795655]])

np.mean(a)
Out[14]: 0.68320065182041034

This gives a different result from the following:

df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
                  columns=range(np.shape(a)[1]))

df
Out[18]: 
          0         1
0  0.866889  0.096829
1  0.496296  0.782635
2  0.835524  0.719079
3  0.950396  0.717957

np.mean(df)
Out[21]: 
0    0.787276
1    0.579125
dtype: float64

The former output is a single number, whereas the latter is a column-wise mean. How does a numpy function know about the structure of a DataFrame?

asked May 09 '17 09:05

a-a

1 Answer

If you step through this:

--Call--
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2796)mean()
-> def mean(a, axis=None, dtype=None, out=None, keepdims=False):
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2877)mean()
-> if type(a) is not mu.ndarray:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2878)mean()
-> try:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2879)mean()
-> mean = a.mean

You can see that the type is not an ndarray, so np.mean tries to call a.mean instead, which in this case is df.mean():

In [6]:

df.mean()
Out[6]:
0    0.572999
1    0.468268
dtype: float64

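This is why the output is different: np.mean(df) returns whatever the DataFrame's own mean method returns, rather than converting df to an array first.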
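
The same delegation can be reproduced without pandas. Below is a minimal sketch, assuming np.mean still looks up a .mean attribute on non-ndarray inputs as in the trace above. HasMean is a hypothetical stand-in class invented for this illustration, not part of NumPy or pandas.

```python
import numpy as np


class HasMean:
    """Hypothetical stand-in for a DataFrame: not an ndarray, but has .mean()."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def mean(self, axis=None, dtype=None, out=None, **kwargs):
        # Deliberately ignore the forwarded arguments and always reduce
        # column-wise, so the delegation is easy to spot in the result.
        return self.data.mean(axis=0)


a = np.array([[1.0, 2.0], [3.0, 4.0]])
wrapped = HasMean(a)

scalar_mean = np.mean(a)      # plain ndarray: mean over all elements
delegated = np.mean(wrapped)  # non-ndarray: np.mean calls wrapped.mean(...)

print(scalar_mean)
print(delegated)
```

Because np.mean forwards its own keyword arguments to the object's method, the stand-in accepts axis, dtype and out even though it ignores them here.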

Code to reproduce above:

In [3]:
import numpy as np
import pandas as pd

a = np.random.rand(4,2)
a

Out[3]:
array([[ 0.96750329,  0.67623187],
       [ 0.44025179,  0.97312747],
       [ 0.07330062,  0.18341157],
       [ 0.81094166,  0.04030253]])

In [4]:    
np.mean(a)

Out[4]:
0.52063384885403818

In [5]:    
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
                  columns=range(np.shape(a)[1]))

df

Out[5]:
          0         1
0  0.967503  0.676232
1  0.440252  0.973127
2  0.073301  0.183412
3  0.810942  0.040303

numpy output:

In [7]:
np.mean(df)

Out[7]:
0    0.572999
1    0.468268
dtype: float64

If you call .values to get a NumPy array first, then the output is the same as for the original array:

In [8]:
np.mean(df.values)

Out[8]:
0.52063384885403818
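
If you want results that do not depend on which mean method ends up being called, one option is to convert explicitly and pass axis yourself. The sketch below assumes a pandas version that provides DataFrame.to_numpy() (a newer spelling of .values) and uses small made-up data. Note that the result of np.mean(df) itself is whatever df.mean returns for the forwarded axis=None, so it may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Small illustrative data, not the random array from the question.
a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
df = pd.DataFrame(a)

# Explicit conversion gives a plain ndarray, so np.mean reduces over everything.
as_array = df.to_numpy()
overall = np.mean(as_array)

# Column-wise means, requested explicitly on either side.
per_column = np.mean(as_array, axis=0)
frame_cols = df.mean(axis=0)

print(overall)
print(per_column)
print(frame_cols)
```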

answered Sep 21 '22 05:09

EdChum
