In this code snippet, train_dataset, test_dataset and valid_dataset are of type numpy.ndarray.
def check_overlaps(images1, images2):
    images1.flags.writeable=False
    images2.flags.writeable=False
    print(type(images1))
    print(type(images2))
    start = time.clock()
    hash1 = set([hash(image1.data) for image1 in images1])
    hash2 = set([hash(image2.data) for image2 in images2])
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.clock()-start
r, execTime = check_overlaps(train_dataset, test_dataset)
print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(train_dataset, valid_dataset)
print("# overlaps between training and validation sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(valid_dataset, test_dataset)
print("# overlaps between validation and test sets:", len(r), "execution time:", execTime)
But this gives the following error: (formatting as code to make it readable!)
ValueError Traceback (most recent call last)
<ipython-input-14-337e73a1cb14> in <module>()
12 return all_overlaps, time.clock()-start
13
---> 14 r, execTime = check_overlaps(train_dataset, test_dataset)
15 print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
16 r, execTime = check_overlaps(train_dataset, valid_dataset)
<ipython-input-14-337e73a1cb14> in check_overlaps(images1, images2)
7 print(type(images2))
8 start = time.clock()
----> 9 hash1 = set([hash(image1.data) for image1 in images1])
10 hash2 = set([hash(image2.data) for image2 in images2])
11 all_overlaps = set.intersection(hash1, hash2)
<ipython-input-14-337e73a1cb14> in <listcomp>(.0)
7 print(type(images2))
8 start = time.clock()
----> 9 hash1 = set([hash(image1.data) for image1 in images1])
10 hash2 = set([hash(image2.data) for image2 in images2])
11 all_overlaps = set.intersection(hash1, hash2)
ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'
The problem is that I don't even know what the error means, let alone how to correct it. Any help, please?
The problem is that your method of hashing arrays only works in Python 2. In Python 3, image1.data is a memoryview, so your code fails as soon as it tries to compute hash(image1.data). The error message tells you that only memoryviews with the formats unsigned bytes ('B'), signed bytes ('b') or single bytes ('c') can be hashed, and I have not found a way to get such a view out of a np.ndarray without copying. The only approach I came up with involves copying the array, which might not be feasible in your application, depending on the amount of data. That being said, you can try changing your function to:
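import time

def check_overlaps(images1, images2):
    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it
    start = time.perf_counter()
    # tobytes() copies each image into an immutable, hashable bytes object
    hash1 = set(hash(image1.tobytes()) for image1 in images1)
    hash2 = set(hash(image2.tobytes()) for image2 in images2)
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.perf_counter() - start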
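To see both the cause of the error and the effect of the tobytes() workaround, here is a minimal, self-contained sketch. The arrays a and b are hypothetical stand-ins for your datasets (tiny random float32 "images" with one planted duplicate), and time.perf_counter() is used because time.clock() no longer exists in recent Python versions.

```python
import time
import numpy as np


def check_overlaps(images1, images2):
    """Return the hashes common to both image stacks and the elapsed time."""
    start = time.perf_counter()
    # tobytes() copies each image into a hashable bytes object
    hash1 = {hash(image.tobytes()) for image in images1}
    hash2 = {hash(image.tobytes()) for image in images2}
    return hash1 & hash2, time.perf_counter() - start


# Hypothetical stand-in data: 4 and 3 random float32 "images" of shape 2x2.
rng = np.random.default_rng(0)
a = rng.random((4, 2, 2), dtype=np.float32)
b = rng.random((3, 2, 2), dtype=np.float32)
b[1] = a[2]  # plant exactly one duplicate image
a.flags.writeable = False  # same setting as in the question

# A float32 array exposes a memoryview with format 'f',
# which is not one of the hashable formats 'B', 'b' or 'c'.
fmt = a[0].data.format
try:
    hash(a[0].data)
    view_hashable = True
except ValueError:
    view_hashable = False

overlaps, elapsed = check_overlaps(a, b)
n_overlaps = len(overlaps)
print("format:", fmt, "| view hashable:", view_hashable,
      "| overlaps:", n_overlaps)
```

Note that hash() on bytes can in principle collide, so two different images could be counted as an overlap. If that matters for your data, a digest such as hashlib.sha1(image.tobytes()).hexdigest() makes this far less likely, at the cost of some speed.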