I have been using NumPy/SciPy for data analysis and recently started to learn Pandas. I have gone through a few tutorials, and I am trying to understand what the major improvements of Pandas over NumPy/SciPy are. It seems to me that the key idea of Pandas is to wrap several NumPy arrays in a DataFrame, with some utility functions around it. Is there something revolutionary about Pandas that I just stupidly missed?


Python - What are the major improvements of Pandas over NumPy/SciPy

1 Answer

Pandas is not particularly revolutionary; it uses the NumPy and SciPy ecosystem, along with some key Cython code, to accomplish its goals. It can be seen as a simpler API to that functionality, with the addition of key utilities like joins and simpler group-by capability that are particularly useful for people with table-like data or time series. But, while not revolutionary, Pandas does have key benefits.
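As a small illustration of the join utility mentioned above, here is a minimal sketch. The `orders` and `customers` tables and their column names are made up for this example; only `DataFrame.merge` itself is the Pandas API.

```python
import pandas as pd

# Hypothetical example tables (not from the original answer).
orders = pd.DataFrame(
    {"customer_id": [1, 2, 2, 3], "amount": [25.0, 10.0, 5.0, 40.0]}
)
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ada", "Grace"]})

# A SQL-style left join on a shared key column. Rows in `orders` without a
# matching customer are kept, and their `name` is filled with a missing value.
joined = orders.merge(customers, on="customer_id", how="left")

print(joined)
```

Doing the same with plain NumPy arrays would mean writing the key lookup and the handling of unmatched rows by hand.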

For a while I also perceived Pandas as just utilities on top of NumPy for those who liked the DataFrame interface. However, I now see Pandas as providing these key features (this is not comprehensive):

1. Structure of arrays (independent storage of each column, with its own type, instead of the contiguous record storage of structured arrays in NumPy), which allows faster processing in many cases.
2. Simpler interfaces to common operations (file loading, plotting, selection, and joining/aligning data), which make it easy to do a lot of work in little code.
3. Index arrays, which mean that operations are aligned by label automatically instead of you having to keep track of alignment yourself.
4. Split-apply-combine, a powerful way of thinking about and implementing data processing.
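A rough sketch of three of the features listed above: per-column storage, index alignment, and split-apply-combine. The data and variable names are invented for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Per-column storage: every column keeps its own dtype.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Lima"],
        "temp_c": [3.5, 19.0, 5.5, 21.0],
        "visits": [10, 20, 30, 40],
    }
)
column_dtypes = df.dtypes.astype(str).to_dict()

# Index alignment: arithmetic matches values by label, not by position.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["z", "y", "x"])
aligned_sum = a + b                            # "x" pairs 1.0 with 30.0
positional_sum = a.to_numpy() + b.to_numpy()   # plain NumPy adds by position

# Split-apply-combine: split rows by city, average each group, combine results.
mean_temp = df.groupby("city")["temp_c"].mean()

print(column_dtypes)
print(aligned_sum)
print(positional_sum)
print(mean_temp)
```

Note how `aligned_sum` and `positional_sum` differ for the labels `x` and `z`: with raw NumPy arrays, keeping the two sequences in the same order is the caller's job.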

However, there are downsides to Pandas:

1. Pandas is basically a user-interface library and is not particularly suited to writing library code. The "automatic" features can lull you into using them repeatedly even when you don't need to, slowing down code that gets called over and over again.
2. Pandas typically takes up more memory, as it is generous with the creation of object arrays to solve otherwise sticky problems such as string handling.
3. If your use case is outside the realm of what Pandas was designed to do, it gets clunky quickly. But within the realm of what it was designed to do, Pandas is powerful and easy to use for quick data analysis.
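The first two downsides can be sketched as follows. This is only an illustration under assumed data: the word list, the small frame, and the two helper functions are hypothetical, and the exact byte counts for object storage depend on the Python and Pandas versions, so none are quoted for it.

```python
import numpy as np
import pandas as pd

words = ["alpha", "beta", "gamma"] * 1000

# NumPy stores these as fixed-width unicode ('<U5'): 5 chars * 4 bytes each.
fixed = np.array(words)
fixed_bytes = fixed.nbytes

# An object-dtype Series stores a pointer per element plus a full Python
# string object for each value, which is usually much larger.
obj_series = pd.Series(words, dtype=object)
object_bytes = obj_series.memory_usage(index=False, deep=True)

df = pd.DataFrame({"x": np.arange(5, dtype=float), "y": np.ones(5)})


def row_sums_slow(frame):
    # Label-based scalar access for every element: convenient, but each
    # lookup goes through Pandas indexing machinery.
    return [frame.loc[i, "x"] + frame.loc[i, "y"] for i in frame.index]


def row_sums_fast(frame):
    # Pull out the underlying NumPy arrays once, then do vectorized work.
    return frame["x"].to_numpy() + frame["y"].to_numpy()


print(fixed_bytes, object_bytes)
print(row_sums_slow(df), row_sums_fast(df))
```

Both helpers return the same values; in code that is called over and over, the second form typically avoids most of the per-call Pandas overhead.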

184

answered Nov 15 '22 20:11

Travis Oliphant


Tags:

python

pandas

numpy

scipy

data-analysis

CuriousMind
