So I am using OpenCV on Raspbian (Raspberry Pi 2 Model B). I am doing vision/image processing, obviously, and the Raspberry Pi is what I was given (I would use a desktop computer for this if I could).
I need to run a skeleton function. I found the following implementation:
import cv2
import numpy as np
img = cv2.imread('img.png',0)
size = np.size(img)
skeleton = np.zeros(img.shape,np.uint8)
ret,img = cv2.threshold(img,127,255,0)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
finished = False
while(not finished):
    eroded = cv2.erode(img,kernel)
    temp = cv2.dilate(eroded,kernel)
    temp = cv2.subtract(img,temp)
    skeleton = cv2.bitwise_or(skeleton,temp)
    img = eroded.copy()
    zeros = size - cv2.countNonZero(img)
    if zeros==size:
        finished = True
cv2.imshow("skeleton",skeleton)
cv2.waitKey(0)
cv2.destroyAllWindows()
While it runs, it's very, very slow, unsurprisingly (I am doing an FFT and a bandpass filtering operation on the image before this, then running the skeleton operation). That other code is slow, but it does complete its operations.
The images are big - I could crop them some, but I don't think it would be enough. I was trying to find an optimized version of this, but so far haven't come up with anything. Any ideas or solutions?
In this answer, I'll focus on improving your implementation, rather than the algorithm. While this won't gain us a significant amount, I think it's still useful to be aware of.
Let's begin with some boilerplate -- the necessary imports, a test image, and a few functions to let us compare the variants easily:
from timeit import default_timer as timer
import numpy as np
import cv2
# Create a decent size test image...
img = cv2.imread('cage.png',0)
img = cv2.resize(img, (2048, 2048))
cv2.normalize(img, img, 0, 255, cv2.NORM_MINMAX)
def time_fn(fn, img, iters=1):
    start = timer()
    result = None
    for i in range(iters):
        result = fn(img)
    end = timer()
    return (result, ((end - start) / iters) * 1000)
def run_test(fn, img, i):
    res, t = time_fn(fn, img, 4)
    cv2.imwrite("skeleton_%d.png" % i, res[0])
    print "Variant %d" % i
    print "Input size = (%d, %d)" % img.shape[:2]
    print "Ran %d iterations to find skeleton." % res[1]
    print "Avg. find_skeleton time = %0.4f s." % (t/1000)
Let's turn your implementation into a function, and remove a few unnecessary bits. Out of curiosity, let's track the number of iterations needed for the skeletonization.
def find_skeleton1(img):
    skeleton = np.zeros(img.shape,np.uint8)
    _,thresh = cv2.threshold(img,127,255,0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
    iters = 0
    while(True):
        eroded = cv2.erode(thresh, kernel)
        temp = cv2.dilate(eroded, kernel)
        temp = cv2.subtract(thresh, temp)
        skeleton = cv2.bitwise_or(skeleton, temp)
        thresh = eroded.copy()
        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton,iters)
And let's see how it performs to set our baseline.
>>> run_test(find_skeleton1, img, 1)
Variant 1
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 2.7969 s.
The first improvement we can make is to minimize the number of allocations of new array objects and to reuse existing arrays as much as possible. We can create a few more temporary arrays (like skeleton), and use the dst parameter of the OpenCV functions in the loop, ignoring the return value. Since we provide a destination of the correct shape and data type, the existing array gets reused.
def find_skeleton2(img):
    skeleton = np.zeros(img.shape,np.uint8)
    eroded = np.zeros(img.shape,np.uint8)
    temp = np.zeros(img.shape,np.uint8)
    _,thresh = cv2.threshold(img,127,255,0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
    iters = 0
    while(True):
        cv2.erode(thresh, kernel, eroded)
        cv2.dilate(eroded, kernel, temp)
        cv2.subtract(thresh, temp, temp)
        cv2.bitwise_or(skeleton, temp, skeleton)
        thresh = eroded.copy()
        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton,iters)
Let's try this out, and check that the results are the same:
>>> print np.array_equal(find_skeleton1(img)[0], find_skeleton2(img)[0])
True
>>> run_test(find_skeleton2, img, 2)
Variant 2
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 1.4356 s.
The next step is to get rid of unnecessary copies -- there's one that's very obvious: thresh = eroded.copy(). Notice that in the following iteration we immediately overwrite the contents of eroded, so we don't really care what it contains, only that it has the correct shape and data type. It does, which means that rather than performing a copy, we can just swap the two objects.
def find_skeleton3(img):
    skeleton = np.zeros(img.shape,np.uint8)
    eroded = np.zeros(img.shape,np.uint8)
    temp = np.zeros(img.shape,np.uint8)
    _,thresh = cv2.threshold(img,127,255,0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
    iters = 0
    while(True):
        cv2.erode(thresh, kernel, eroded)
        cv2.dilate(eroded, kernel, temp)
        cv2.subtract(thresh, temp, temp)
        cv2.bitwise_or(skeleton, temp, skeleton)
        thresh, eroded = eroded, thresh # Swap instead of copy
        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton,iters)
Again, let's verify the results match and do some timing.
>>> print np.array_equal(find_skeleton1(img)[0], find_skeleton3(img)[0])
True
>>> run_test(find_skeleton3, img, 3)
Variant 3
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 0.9839 s.
A few simple changes got the timing down to ~35% of the original. Of course, it still runs hundreds of iterations over the entire image. The next step would be to look into ways to reduce the amount of work -- in the later iterations, significant areas of the working image are black and don't contribute anything to the skeleton.
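As a rough sketch of that idea (not included in the timings above; the name find_skeleton4 and the exact bookkeeping are just illustrative), each pass could be restricted to the bounding box of the remaining foreground, padded by the kernel radius so the morphology behaves the same near the crop edge as it does on the full image. Reusing the imports from above:
def find_skeleton4(img):
    skeleton = np.zeros(img.shape, np.uint8)
    _, thresh = cv2.threshold(img, 127, 255, 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
    iters = 0
    while cv2.countNonZero(thresh) > 0:
        # Bounding box of the remaining foreground, padded by the kernel
        # radius (1 for a 3x3 kernel) and clamped to the image borders.
        x, y, w, h = cv2.boundingRect(cv2.findNonZero(thresh))
        y0, y1 = max(y - 1, 0), min(y + h + 1, thresh.shape[0])
        x0, x1 = max(x - 1, 0), min(x + w + 1, thresh.shape[1])
        roi = thresh[y0:y1, x0:x1]
        eroded = cv2.erode(roi, kernel)
        opened = cv2.dilate(eroded, kernel)
        # Everything outside the ROI is already black, so only the ROI
        # views of skeleton and thresh need updating.
        skeleton[y0:y1, x0:x1] = cv2.bitwise_or(skeleton[y0:y1, x0:x1],
                                                cv2.subtract(roi, opened))
        thresh[y0:y1, x0:x1] = eroded
        iters += 1
    return (skeleton, iters)
This reintroduces a copy when writing eroded back into the ROI, but that copy shrinks along with the foreground, and the erode/dilate/subtract calls touch far fewer pixels in the later iterations; how much it actually helps depends on how quickly the foreground collapses for your images.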
NB: Measurements were done on an i7-4930K. I don't have a Raspberry Pi, so feel free to add timings from yours, so we can see what sort of effect it has.