I have been using numpy/scipy for data analysis. I recently started to learn Pandas.
I have gone through a few tutorials and I am trying to understand what are the major improvement of Pandas over Numpy/Scipy.
It seems to me that the key idea of Pandas is to wrap up different numpy arrays in a Data Frame, with some utility functions around it.
Is there something revolutionary about Pandas that I just stupidly missed?
The Pandas provides some sets of powerful tools like DataFrame and Series that mainly used for analyzing the data, whereas in NumPy module offers a powerful object called Array. Instacart, SendGrid, and Sighten are some of the famous companies that work on the Pandas module, whereas NumPy is used by SweepSouth.
Pandas is mostly used for data analysis tasks in Python. NumPy is mostly used for working with Numerical values as it makes it easy to apply mathematical functions. Pandas library works well for numeric, alphabets, and heterogeneous types of data simultaneously.
The essential difference is the presence of the index: while the Numpy Array has an implicitly defined integer index used to access the values, the Pandas Series has an explicitly defined index associated with the values.
Pandas is not particularly revolutionary and does use the NumPy and SciPy ecosystem to accomplish it's goals along with some key Cython code. It can be seen as a simpler API to the functionality with the addition of key utilities like joins and simpler group-by capability that are particularly useful for people with Table-like data or time-series. But, while not revolutionary, Pandas does have key benefits.
For a while I had also perceived Pandas as just utilities on top of NumPy for those who liked the DataFrame interface. However, I now see Pandas as providing these key features (this is not comprehensive):
However, there are downsides to Pandas:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With