
How to get around this memoryview error in numpy?

In this code snippet train_dataset, test_dataset and valid_dataset are of the type numpy.ndarray.

def check_overlaps(images1, images2):
    images1.flags.writeable=False
    images2.flags.writeable=False
    print(type(images1))
    print(type(images2))
    start = time.clock()
    hash1 = set([hash(image1.data) for image1 in images1])
    hash2 = set([hash(image2.data) for image2 in images2])
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.clock()-start

r, execTime = check_overlaps(train_dataset, test_dataset)    
print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(train_dataset, valid_dataset)   
print("# overlaps between training and validation sets:", len(r), "execution time:", execTime) 
r, execTime = check_overlaps(valid_dataset, test_dataset) 
print("# overlaps between validation and test sets:", len(r), "execution time:", execTime)

But this gives the following error: (formatting as code to make it readable!)

ValueError                                Traceback (most recent call last)
<ipython-input-14-337e73a1cb14> in <module>()
     12     return all_overlaps, time.clock()-start
     13 
---> 14 r, execTime = check_overlaps(train_dataset, test_dataset)
     15 print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
     16 r, execTime = check_overlaps(train_dataset, valid_dataset)

<ipython-input-14-337e73a1cb14> in check_overlaps(images1, images2)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

<ipython-input-14-337e73a1cb14> in <listcomp>(.0)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'

Now the problem is I don't even know what the error means let alone think about correcting it. Any help please?

user6692576 asked Aug 08 '16 18:08


1 Answer

The problem is that your method of hashing arrays only works in Python 2. There, image1.data is a buffer object that can be hashed once the array is read-only; in Python 3 it is a memoryview, so your code fails as soon as it tries to compute hash(image1.data). The error message tells you that only memoryviews whose format is unsigned bytes ('B'), signed bytes ('b') or single characters ('c') can be hashed, and I have not found a way to get such a view out of an np.ndarray without copying. The only way I came up with copies each array, which might not be feasible in your application, depending on how much data you have. That being said, you can try changing your function to:

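import time  # time.clock() was removed in Python 3.8; use time.perf_counter()

def check_overlaps(images1, images2):
    start = time.perf_counter()
    # tobytes() copies each image into an immutable (hashable) bytes object
    hash1 = set(hash(image1.tobytes()) for image1 in images1)
    hash2 = set(hash(image2.tobytes()) for image2 in images2)
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.perf_counter() - start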
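
If you want to see both the failure and the workaround in isolation, here is a minimal sketch on made-up data. The array shapes, the float32 dtype and all variable names are my assumptions for illustration, not your actual datasets. It shows that the buffer of a float32 image has format 'f', which is why hashing the memoryview is rejected, and that hashing the bytes copy finds a planted duplicate.

```python
import numpy as np

# Hypothetical toy data: 5 "images" of 4x4 float32 pixels in each set.
rng = np.random.default_rng(0)
images_a = rng.random((5, 4, 4)).astype(np.float32)
images_b = rng.random((5, 4, 4)).astype(np.float32)
images_b[2] = images_a[3]  # plant one duplicate image

# Reproduce the original approach on a single read-only image.
images_a.flags.writeable = False
first = images_a[0]
buffer_format = first.data.format  # struct-style format code of the buffer

try:
    hash(first.data)  # memoryview of float32 data is not hashable
    buffer_hash_error = None
except ValueError as exc:
    buffer_hash_error = str(exc)

# Workaround: hash an immutable bytes copy of each image instead.
hashes_a = {hash(img.tobytes()) for img in images_a}
hashes_b = {hash(img.tobytes()) for img in images_b}
overlaps = hashes_a & hashes_b

print("buffer format:", buffer_format)
print("hashing the buffer failed with:", buffer_hash_error)
print("number of overlapping hashes:", len(overlaps))
```

Keep in mind that this compares hashes, not pixel data, so two different images could in principle collide, and tobytes() makes a temporary copy of every image while it is being hashed.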
jotasi answered Oct 19 '22 21:10