If I use python pandas, is there any need for structured arrays?

Now that pandas provides a data frame structure, is there any need for structured/record arrays in numpy? I need to make some modifications to existing code that relies on structured arrays, but I am considering using pandas in their place from this point forward. Will I at any point find that I need some functionality of structured/record arrays that pandas does not provide?

asked Aug 21 '12 09:08 by hatmatrix


1 Answer

pandas's DataFrame is a high-level tool, while structured arrays are a very low-level tool that lets you interpret a binary blob of data as a table-like structure. One thing that is hard to do in pandas is nested data types with the same semantics as structured arrays, though this can be imitated with hierarchical indexing (conversely, structured arrays can't do most of the things you can do with hierarchical indexing).
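To illustrate the "binary blob as a table" point, here is a minimal sketch using only numpy. The record layout (an `id`, a nested `pos` record with `x` and `y`, and a `value`) and all variable names are made up for the example; they are not from the question.

```python
import numpy as np

# Hypothetical record layout with a nested 'pos' field (assumed for illustration).
record_dtype = np.dtype([
    ("id", "<i4"),
    ("pos", [("x", "<f4"), ("y", "<f4")]),
    ("value", "<f8"),
])

# Build some raw bytes to stand in for a binary blob read from disk or a socket.
source = np.zeros(3, dtype=record_dtype)
source["id"] = [1, 2, 3]
source["pos"]["x"] = [0.5, 1.5, 2.5]
source["pos"]["y"] = [10.0, 20.0, 30.0]
source["value"] = [0.1, 0.2, 0.3]
blob = source.tobytes()

# Reinterpret the bytes as a table without copying or parsing them.
table = np.frombuffer(blob, dtype=record_dtype)

# Columns, including nested ones, are addressed by field name.
xs = table["pos"]["x"]
second_id = int(table[1]["id"])
```

Each row occupies `record_dtype.itemsize` bytes, so the dtype acts as a description of the on-disk layout rather than as a container that owns the data.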

Structured arrays are also well suited to working with massive tabular data sets loaded via memory maps (np.memmap). pandas does not cover this use case yet, though that limitation will be addressed eventually.
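As a rough sketch of the memory-map case, the snippet below writes a small structured array to a temporary file and then opens it with `np.memmap`. The file name, the `ts`/`price` fields, and the row count are assumptions chosen for the example; a real data set would be far larger than memory.

```python
import os
import tempfile

import numpy as np

# Hypothetical fixed-width row layout (assumed for illustration).
row_dtype = np.dtype([("ts", "<i8"), ("price", "<f8")])

# Write a small file to stand in for a large on-disk table.
path = os.path.join(tempfile.mkdtemp(), "rows.bin")
rows = np.zeros(1000, dtype=row_dtype)
rows["ts"] = np.arange(1000)
rows["price"] = np.arange(1000) * 0.5
rows.tofile(path)

# Map the file read-only; the row count is inferred from the file size.
mm = np.memmap(path, dtype=row_dtype, mode="r")

# Slicing only touches the pages backing the requested rows.
window = mm[100:110]
mean_price = float(window["price"].mean())
```

Because the dtype fixes the size of every row, numpy can compute the byte offset of any slice directly, which is what makes this access pattern cheap.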

answered Sep 20 '22 15:09 by Wes McKinney