
Recommended way to represent my data with numpy

Tags: python, numpy

I am working on processing geologic data. The data is a 2D (x-y) map of vertical (z) columns of boxes, each box having more than one numerical parameter associated with it. I need the freedom to add and remove box parameters as the code evolves (I have no idea yet how many I will actually need). The number of boxes varies across the map, so the resulting 3D array is jagged in the z direction. The algorithms applied to the data work on one vertical column of boxes at a time.

What would be a reasonable way to represent such a data structure using the numpy/scipy facilities? I've thought about a 3D structured array with a custom dtype. But it will potentially have lots of zeros because of the inherently jagged nature of the data.
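
To make the concern concrete, here is a minimal sketch of the padded structured-array idea. The field names (`porosity`, `density`), the map size, and the box counts are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical per-box parameters; the real set is not known yet.
box_dtype = np.dtype([("porosity", "f8"), ("density", "f8")])

# Hypothetical 2 x 3 map; n_boxes[i, j] is the number of boxes in column (i, j).
n_boxes = np.array([[1, 4, 2],
                    [3, 0, 5]])

# Pad every column to the height of the tallest one; unused slots stay zero.
padded = np.zeros(n_boxes.shape + (n_boxes.max(),), dtype=box_dtype)

# Fraction of allocated slots that are only padding.
wasted = 1.0 - n_boxes.sum() / (n_boxes.size * n_boxes.max())

# A single column is recovered by slicing off the padding.
column = padded[0, 1, :n_boxes[0, 1]]
```

With these made-up counts, half of the allocated slots are padding, and adding or removing a parameter means rebuilding the array with a new dtype.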

Asked by MindV0rtex on Feb 23 '16 at 15:02



1 Answer

If the structure of your data changes while your code runs (boxes or parameters being added and removed), a single numpy array is not recommended, because a numpy array has a fixed shape and dtype.

A possible solution is to create a dictionary whose keys are the parameters. For example, for two boxes with coordinates [x1, y1] and [x2, y2], heights h1 and h2, and some other general parameter, you can define:

data = {
    'boxes': [[x1, y1], [x2, y2]],
    'height': [h1, h2],
    'general_parameter': [par1, par2]
}

This way you can add parameters and boxes whenever you need to:

data['new_parameter'] = [new_par1, new_par2]
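
A runnable version of the same idea might look like the sketch below. The numbers are made up and simply stand in for x1, y1, h1, par1 and so on. It also shows that adding a box means appending to every list so the parameters stay aligned.

```python
# Made-up values standing in for x1, y1, h1, par1, ...
data = {
    "boxes": [[0.0, 0.0], [1.0, 0.0]],
    "height": [2.5, 3.0],
    "general_parameter": [0.1, 0.2],
}

# Adding a parameter: one new key, one value per existing box.
data["new_parameter"] = [10, 20]

# Adding a box: every list must grow by one entry to stay aligned.
new_box = {
    "boxes": [1.0, 1.0],
    "height": 1.5,
    "general_parameter": 0.3,
    "new_parameter": 30,
}
for key, value in new_box.items():
    data[key].append(value)

# Sanity check: all parameter lists have the same length.
lengths = {key: len(values) for key, values in data.items()}
```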

If you want to use numpy, you can replace the lists with numpy arrays:

import numpy as np

data = {
    'boxes': np.array([[x1, y1], [x2, y2]]),
    'height': np.array([h1, h2]),
    'general_parameter': np.array([par1, par2])
}
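
Since the algorithms in the question run on one vertical column at a time, the same dictionary idea can also be applied per column. The sketch below is one possible layout under that assumption, not something the code above does: `columns`, the `(ix, iy)` keys, the `add_parameter` helper, and the parameter names are all hypothetical.

```python
import numpy as np

# One entry per map cell; each column holds 1D arrays of its own length,
# so no padding is needed for the jagged z direction.
columns = {
    (0, 0): {"height": np.array([1.0, 2.0, 0.5]),
             "porosity": np.array([0.30, 0.25, 0.10])},
    (0, 1): {"height": np.array([4.0]),
             "porosity": np.array([0.20])},
}

def add_parameter(columns, name, fill=0.0):
    """Add a new per-box parameter to every column, filled with `fill`."""
    for params in columns.values():
        n = len(next(iter(params.values())))  # number of boxes in this column
        params[name] = np.full(n, fill)

add_parameter(columns, "density", fill=2.7)

# Process one column at a time, e.g. total thickness per column.
thickness = {key: params["height"].sum() for key, params in columns.items()}
```

Each column stays a set of plain numpy arrays, so vectorized operations still apply within a column, while the number of boxes and the set of parameters remain free to change.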
Answered by Francesco Nazzaro on Oct 14 '22 at 01:10