numpy.ndarray vs pandas.DataFrame

Tags: python, python-3.x, pandas, numpy

I need to make a strategic decision about which data structure to use as the basis for holding statistical data frames in my program.

I store hundreds of thousands of records in one big table. Each field can be of a different type, including short strings. I perform multiple regression analysis and other manipulations on the data, and these need to run quickly, in real time. I also need something that is relatively popular and well supported.

I know about the following candidates:

list of `array.array`

That is the most basic option. Unfortunately, it doesn't support strings. And I need numpy anyway for its statistical functions, so this one is out of the question.
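A minimal sketch of that limitation, using only the standard library. The variable names are illustrative: a numeric typecode such as `'d'` accepts floats but rejects a string element.

```python
from array import array

# Numeric typecodes work: 'd' stores C doubles.
values = array("d", [1.5, 2.5, 3.5])

# A string element is rejected when the typecode is numeric.
try:
    array("d", ["abc"])
    accepted_string = True
except TypeError:
    accepted_string = False
```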

`numpy.ndarray`

An ndarray can hold fields of different types by using a structured dtype (e.g. `np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])`). It seems a natural winner, but...
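As a rough sketch of what that dtype looks like in use (the sample records are made up for illustration), each field can be pulled out by name and processed as an ordinary array:

```python
import numpy as np

# Structured dtype from the text: a 16-character string plus two float grades.
student_dtype = np.dtype([("name", np.str_, 16), ("grades", np.float64, (2,))])

# Hypothetical sample records, for illustration only.
records = np.array(
    [("Alice", (8.0, 7.5)), ("Bob", (6.0, 9.0))],
    dtype=student_dtype,
)

# Access a field by name; the result is a regular ndarray.
names = records["name"]

# The 'grades' field has shape (n_records, 2); average per record.
mean_grades = records["grades"].mean(axis=1)
```

Note that the string field is fixed-width here, so longer names would be truncated to 16 characters.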

`pandas.DataFrame`

This one is built with statistical use in mind, but is it efficient enough?

I read that pandas.DataFrame is no longer based on numpy.ndarray (although it shares the same interface). Can anyone shed some light on this? Or maybe there is an even better data structure out there?

230

asked Aug 08 '14 10:08

Adam Ryczkowski

1 Answer

pandas.DataFrame is awesome, and interacts very well with much of numpy. Much of the DataFrame machinery is written in Cython and is quite optimized. I suspect the ease of use and the richness of the pandas API will greatly outweigh any potential benefit you could obtain by rolling your own interfaces around numpy.
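To illustrate the interaction with numpy, here is a minimal sketch, not a benchmark. The column names and data are hypothetical, and the response is constructed so the expected coefficients are known. A mixed-type DataFrame keeps the string column alongside the numeric ones, and the numeric columns are handed to numpy for an ordinary least squares fit:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type table: one short-string column, two numeric columns.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "x1": [1.0, 2.0, 3.0, 4.0],
        "x2": [0.0, 1.0, 0.0, 1.0],
    }
)

# Response built so the exact coefficients are known: y = 1 + 2*x1 + 3*x2.
df["y"] = 1.0 + 2.0 * df["x1"] + 3.0 * df["x2"]

# Hand the numeric columns to numpy as a plain float array.
X = df[["x1", "x2"]].to_numpy()
X = np.column_stack([np.ones(len(df)), X])  # prepend an intercept column
y = df["y"].to_numpy()

# Multiple regression via ordinary least squares.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here `coef` holds the intercept followed by the slopes for `x1` and `x2`. For real workloads, a dedicated statistics library may be a better fit than calling `lstsq` directly.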

185

answered Sep 20 '22 13:09

daniel
