 

Optimal data structure to store millions of pixels in Python?

I have several images, and after some basic processing and contour detection I want to store the detected pixel locations and the values of their adjacent neighbours in a Python data structure. I settled on numpy.array.

The pixel locations from each image are retrieved using:

locationsPx = cv2.findNonZero(SomeBWImage)

which returns an array of shape (NumberOfPixels, 1, 2), with:

print(locationsPx[0])  # array([[1649,    4]])

for example.
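
To make the layout concrete, here is a minimal sketch that mimics this shape with plain NumPy. The sample values are made up for illustration, and the array is built by hand rather than by calling cv2.findNonZero, so treat it as an assumption about what the real output looks like:

```python
import numpy as np

# Hand-built stand-in for the cv2.findNonZero result: N points,
# each wrapped in an extra singleton axis. Values are illustrative only.
locationsPx = np.array([[[1649, 4]], [[1650, 4]], [[1651, 5]]], dtype=np.int32)

print(locationsPx.shape)  # three points, one singleton axis, two coordinates

# Drop the singleton axis so that each row holds one (x, y) pair.
points = locationsPx.reshape(-1, 2)

# The coordinates can then be read as two ordinary columns.
xs, ys = points[:, 0], points[:, 1]
```

After the reshape, each pixel occupies one row with two columns, which is usually easier to store or index than the nested form.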

My question is: is it possible to store this two-element array in a single column of another array? Or should I use a list and drop the array altogether?

Note: the set of images might grow, so my chosen data structure will be not only huge but also variable in size.

EDIT: or maybe numpy.array is not a good idea and a Pandas DataFrame is better suited? I am open to suggestions from those who have more experience with this.

RMS asked Oct 11 '16 13:10



1 Answer

Numpy arrays are great for computation. They are not great for storing data if the size of the data keeps changing. As ali_m pointed out, all forms of array concatenation in numpy are inherently slow, because each one copies the data into a newly allocated array. It is better to store the arrays in a plain old Python list:

coordlist = []
coordlist.append(locationsPx)  # the whole coordinate array for one image

Alternatively, if your images have names, it might be attractive to use a dict with the image names as keys:

coorddict = {}
coorddict[image_name] = locationsPx  # the whole coordinate array for one image
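
As a rough sketch of how this might look across several images: collect one array per image in a list, then concatenate once at the end if a single array is needed for computation. The helper `fake_locations` is hypothetical and only stands in for `cv2.findNonZero`; the pixel counts are arbitrary.

```python
import numpy as np

def fake_locations(n, seed):
    """Hypothetical stand-in for cv2.findNonZero: an (n, 1, 2) int32 array."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2000, size=(n, 1, 2), dtype=np.int32)

coordlist = []
for image_index, n_pixels in enumerate([5, 3, 7]):
    locationsPx = fake_locations(n_pixels, seed=image_index)
    # Appending to a list never copies the arrays stored earlier.
    coordlist.append(locationsPx.reshape(-1, 2))

# A single concatenation at the end, instead of one per image.
all_points = np.concatenate(coordlist, axis=0)

# Parallel array recording which image each row came from.
image_ids = np.repeat(np.arange(len(coordlist)), [len(c) for c in coordlist])
```

The list stays cheap to grow while images are still being processed, and the one-time concatenation gives a flat array for vectorised work afterwards.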

Either way, you can readily iterate over the contents of the list or dict:

for coords in coordlist:
    print(coords.shape)  # process one image's coordinates here

or

for image_name, coords in coorddict.items():
    print(image_name, coords.shape)  # process one image's coordinates here

And pickle is a convenient way to store your results in a file:

import pickle
with open("filename.pkl", "wb") as f:
    pickle.dump(coordlist, f, pickle.HIGHEST_PROTOCOL)

(or the same with coorddict instead of coordlist). Reloading is just as easy:

with open("filename.pkl", "rb") as f:
    coordlist = pickle.load(f)

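There are some security concerns with pickle: loading a pickle file can execute arbitrary code, so you should never unpickle data from an untrusted source. If you only load files you have created yourself, those concerns don't apply.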

If you find yourself frequently adding to a previously pickled file, you might be better off with an alternative back end, such as SQLite (available in Python through the standard sqlite3 module).
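
A minimal sketch of what such a back end could look like, using the standard library's sqlite3 module. The table layout, the column names, and the `add_image` helper are assumptions made for illustration, not part of the original answer:

```python
import sqlite3
import numpy as np

# In-memory database for the example; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pixels (image_name TEXT, x INTEGER, y INTEGER)")

def add_image(conn, image_name, coords):
    """Append one image's (N, 2) coordinate array as N rows (hypothetical helper)."""
    # Convert numpy integers to plain Python ints, which sqlite3 can store.
    rows = [(image_name, int(x), int(y)) for x, y in coords]
    conn.executemany("INSERT INTO pixels VALUES (?, ?, ?)", rows)
    conn.commit()

add_image(conn, "img_a", np.array([[1649, 4], [1650, 4]]))
add_image(conn, "img_b", np.array([[10, 20]]))

# Read back the coordinates of a single image as an (N, 2) array.
cur = conn.execute(
    "SELECT x, y FROM pixels WHERE image_name = ? ORDER BY rowid", ("img_a",)
)
coords_a = np.array(cur.fetchall())
```

New images can be appended at any time without rewriting what is already stored, which is the main advantage over re-pickling a growing list.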

Daniel Wagenaar answered Oct 12 '22 15:10
