Numpy functions, eg np.mean(), np.var(), etc, accept an array-like argument, like np.array, or list, etc.
But passing in a pandas dataframe also works. This means that a pandas dataframe can indeed disguise itself as a numpy array, which I find a little strange (despite knowing the fact that the underlying values of a df are indeed numpy arrays).
For an object to be an array-like, I thought that it should be slicable using integer indexing in the way a numpy array is sliced. So for instance df[1:3, 2:3] should work, but it would lead to an error.
So, possibly a dataframe gets converted into a numpy array when it goes inside the function. But if that is the case then why does np.mean(numpy_array) lead to a different result than that of np.mean(df)?
a = np.random.rand(4,2)
a
Out[13]:
array([[ 0.86688862, 0.09682919],
[ 0.49629578, 0.78263523],
[ 0.83552411, 0.71907931],
[ 0.95039642, 0.71795655]])
np.mean(a)
Out[14]: 0.68320065182041034
gives a different result than what the below gives...
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
columns=range(np.shape(a)[1]))
df
Out[18]:
0 1
0 0.866889 0.096829
1 0.496296 0.782635
2 0.835524 0.719079
3 0.950396 0.717957
np.mean(df)
Out[21]:
0 0.787276
1 0.579125
dtype: float64
The former output is a single number, whereas the latter is a column-wise mean. How does a numpy function know about the make of a dataframe?
The answer is NO, numpy and pandas are not strictly bound. Sometimes you need the help of numpy to do some special works, like computations, that's why you may need to import and use. But to work with pandas, numpy is not mandatory. Actually numpy is a pandas dependency and pandas called it internally.
Pandas is defined as an open-source library that provides high-performance data manipulation in Python. It is built on top of the NumPy package, which means Numpy is required for operating the Pandas. The name of Pandas is derived from the word Panel Data, which means an Econometrics from Multidimensional data.
A DataFrame object relies on underlying data structures to improve performance of row-oriented and column-oriented operations. One of these data structures includes the BlockManager. The BlockManager is a core architectural component that is an internal storage object in Pandas.
NumPy is a library for Python that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Pandas is a high-level data manipulation tool that is built on the NumPy package.
If you step through this:
--Call--
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2796)mean()
-> def mean(a, axis=None, dtype=None, out=None, keepdims=False):
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2877)mean()
-> if type(a) is not mu.ndarray:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2878)mean()
-> try:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2879)mean()
-> mean = a.mean
You can see that the type
is not a ndarray
so it tries to call a.mean
which in this case would be df.mean()
:
In [6]:
df.mean()
Out[6]:
0 0.572999
1 0.468268
dtype: float64
This is why the output is different
Code to reproduce above:
In [3]:
a = np.random.rand(4,2)
a
Out[3]:
array([[ 0.96750329, 0.67623187],
[ 0.44025179, 0.97312747],
[ 0.07330062, 0.18341157],
[ 0.81094166, 0.04030253]])
In [4]:
np.mean(a)
Out[4]:
0.52063384885403818
In [5]:
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
columns=range(np.shape(a)[1]))
df
Out[5]:
0 1
0 0.967503 0.676232
1 0.440252 0.973127
2 0.073301 0.183412
3 0.810942 0.040303
numpy output:
In [7]:
np.mean(df)
Out[7]:
0 0.572999
1 0.468268
dtype: float64
If you'd called .values
to return a np
array then the output is the same:
In [8]:
np.mean(df.values)
Out[8]:
0.52063384885403818
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With