
Is there a performance difference between Numpy and Pandas?

I've written a bunch of code on the assumption that I was going to use Numpy arrays. Turns out the data I am getting is loaded through Pandas. I remember now that I loaded it in Pandas because I was having some problems loading it in Numpy. I believe the data was just too large.

So I was wondering: is there a difference in computational performance between Numpy and Pandas?

If Pandas is more efficient, I would rather rewrite all my code for Pandas; if it isn't, I'll just use a Numpy array.

Terence Chow asked Feb 05 '14 03:02
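If the main goal is to keep code that was written for NumPy arrays, one option is to load with pandas and convert once with `DataFrame.to_numpy()`, then pass the array to the existing functions. This is a minimal sketch under that assumption: `column_means` and the columns `a`/`b` are hypothetical stand-ins, not the asker's code.

```python
import numpy as np
import pandas as pd


# Hypothetical stand-in for existing code written against NumPy arrays.
def column_means(arr: np.ndarray) -> np.ndarray:
    """Return the mean of each column of a 2-D array."""
    return arr.mean(axis=0)


# Hypothetical data that was loaded through pandas (e.g. with pd.read_csv).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Convert once; for a frame whose columns share one dtype this is a plain 2-D array.
arr = df.to_numpy()
means = column_means(arr)
print(means)  # [2. 5.]
```

Because `to_numpy()` may need to copy (for example with mixed column dtypes), it is best called once, outside any loop that runs many times.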


People also ask

Is Numpy faster than Pandas?

The indexing of NumPy arrays is faster than that of the Pandas Series.
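One rough way to check this claim on your own machine is to time a single-element lookup with the standard-library `timeit` module. The array size, random data, and repetition count below are arbitrary choices for illustration; absolute timings depend on hardware and library versions, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000
arr = np.random.default_rng(0).random(n)
s = pd.Series(arr)

i = n // 2
# Both lookups return the same value; only the access path differs.
assert arr[i] == s.iloc[i]

reps = 20_000
t_numpy = timeit.timeit(lambda: arr[i], number=reps)
t_iloc = timeit.timeit(lambda: s.iloc[i], number=reps)
t_iat = timeit.timeit(lambda: s.iat[i], number=reps)  # scalar-only accessor

print(f"ndarray[i]:     {t_numpy / reps * 1e9:.0f} ns per lookup")
print(f"Series.iloc[i]: {t_iloc / reps * 1e9:.0f} ns per lookup")
print(f"Series.iat[i]:  {t_iat / reps * 1e9:.0f} ns per lookup")
```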

Is Pandas better than Numpy?

NumPy is more memory-efficient. Pandas performs better when the number of rows is 500K or more, while NumPy performs better with 50K rows or fewer. Indexing a pandas Series is much slower than indexing a NumPy array.
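The row-count thresholds above can be probed with a small sweep. This sketch times only one operation (elementwise multiplication by a scalar) on a 1-D array and a Series; whether any crossover appears depends on the operation, the data, and the library versions, so treat it as a starting point rather than a verdict. The 5K size is an extra small case added for comparison.

```python
import timeit

import numpy as np
import pandas as pd


def time_multiply(n_rows: int, number: int = 20):
    """Return mean seconds per elementwise multiply as (ndarray, Series)."""
    arr = np.random.default_rng(0).random(n_rows)
    s = pd.Series(arr)
    t_arr = timeit.timeit(lambda: arr * 2.0, number=number) / number
    t_ser = timeit.timeit(lambda: s * 2.0, number=number) / number
    return t_arr, t_ser


results = {}
# 50K and 500K echo the thresholds quoted above.
for n_rows in (5_000, 50_000, 500_000):
    t_arr, t_ser = time_multiply(n_rows)
    results[n_rows] = (t_arr, t_ser)
    print(f"{n_rows:>8} rows: ndarray {t_arr * 1e6:9.1f} us, "
          f"Series {t_ser * 1e6:9.1f} us, ratio {t_ser / t_arr:5.1f}x")
```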

What is the advantage of Pandas over Numpy?

Pandas provides high-performance, easy-to-use data structures and data analysis tools. Whereas the NumPy library provides objects for multi-dimensional arrays, pandas provides an in-memory 2D table object called a DataFrame, which is like a spreadsheet with column names and row labels.
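The difference between a bare array and a labeled table can be shown with the same numbers held both ways. The row labels `row_a`/`row_b` and column names `x`/`y`/`z` are arbitrary examples.

```python
import numpy as np
import pandas as pd

# The same 2x3 block of numbers, as a bare array and as a labeled table.
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
df = pd.DataFrame(arr, index=["row_a", "row_b"], columns=["x", "y", "z"])

# NumPy: positional access only.
print(arr[1, 2])             # 6.0
# pandas: access by row label and column name, like a spreadsheet.
print(df.loc["row_b", "z"])  # 6.0

# Selecting a column by name gives a Series that keeps the row labels.
col = df["y"]
print(list(col.index))       # ['row_a', 'row_b']
print(col.tolist())          # [2.0, 5.0]
```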

Does Numpy use less memory than Pandas?

NumPy consumes less memory than pandas, and it generally performs better than pandas for 50K rows or fewer.
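The memory claim can be checked with `ndarray.nbytes` and `DataFrame.memory_usage`. In this sketch (50,000 rows of four float64 columns, an arbitrary choice), the array buffer is 50,000 × 4 × 8 = 1,600,000 bytes; for purely numeric columns the column buffers reported by pandas are the same size, so the extra reported bytes come mostly from the index. Note that `memory_usage` does not capture every bit of pandas' internal bookkeeping.

```python
import numpy as np
import pandas as pd

n_rows = 50_000
arr = np.random.default_rng(0).random((n_rows, 4))   # four float64 columns
df = pd.DataFrame(arr, columns=["a", "b", "c", "d"])

arr_bytes = arr.nbytes                                # data buffer only
# Column buffers plus the index; deep=True matters only for object columns.
df_bytes = int(df.memory_usage(index=True, deep=True).sum())

print(f"ndarray:   {arr_bytes:,} bytes")
print(f"DataFrame: {df_bytes:,} bytes")
```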


1 Answer

There can be a significant performance difference in favor of NumPy arrays: about an order of magnitude for multiplications, and multiple orders of magnitude for indexing a few random values.

I was actually wondering about the same thing and came across this interesting comparison: http://penandpants.com/2014/09/05/performance-of-pandas-series-vs-numpy-arrays/

Mark answered Sep 20 '22 17:09
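The "indexing a few random values" case from the answer can be approximated by fetching a handful of random positions at once and reporting the Series/ndarray time ratio, whose base-10 logarithm reads directly as "orders of magnitude". The array size, the five positions, and the repetition count are assumptions for illustration; the ratio will vary by machine and version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000_000
arr = rng.random(n)
s = pd.Series(arr)

# A handful of random integer positions.
idx = rng.integers(0, n, size=5)
assert np.array_equal(arr[idx], s.iloc[idx].to_numpy())

reps = 5_000
t_arr = timeit.timeit(lambda: arr[idx], number=reps)
t_ser = timeit.timeit(lambda: s.iloc[idx], number=reps)

ratio = t_ser / t_arr
print(f"ndarray: {t_arr / reps * 1e6:.2f} us, Series.iloc: {t_ser / reps * 1e6:.2f} us")
print(f"Series/ndarray time ratio: {ratio:.1f} (log10 = {np.log10(ratio):.1f})")
```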