I want to do some image analysis on a large number of images (thousands) and I want to try using Spark to speed this up. For testing purposes I am using docker compose to set up a standalone cluster locally.
I want to do some basic analysis such as computing gradients, edge detection, etc. I can successfully load my images into a dataframe using:
images = spark.read.format("image").option("dropInvalid", True).load("/opt/spark-data/")
I tried to call OpenCV functions such as Sobel from a UDF, but I am unable to get the image data into a format that OpenCV can work with.
Is there any way I can convert the image data in a way such that I can use OpenCV functions? Or are there better ways to do this than using OpenCV?
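For context, the core difficulty is that Spark's image source stores pixel data as a flat byte string, while OpenCV expects an H×W×C NumPy array. Below is a minimal sketch of that conversion step, assuming the default 8-bit BGR layout that Spark's `image` format produces; `to_cv2_array` and the 2×2 sample bytes are hypothetical names for illustration:

```python
import numpy as np

def to_cv2_array(data, height, width, n_channels):
    """Reinterpret the raw bytes from a Spark image row as an
    OpenCV-style NumPy array (H x W x C, dtype uint8)."""
    return np.frombuffer(bytes(data), dtype=np.uint8).reshape(
        height, width, n_channels
    )

# Hypothetical example: a 2x2 BGR image packed as 12 bytes
raw = bytes(range(12))
arr = to_cv2_array(raw, 2, 2, 3)
print(arr.shape)  # (2, 2, 3)
```

The resulting array can be passed straight to OpenCV functions such as `cv2.cvtColor` or `cv2.Sobel`, since OpenCV operates on plain NumPy arrays.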
I was able to make this work with help from this post.
def convertImageGeneric(image, down_width=500, down_height=500):
    import numpy as np
    import cv2
    from pyspark.ml.linalg import Vectors

    fa = cv2.SIFT_create(400)
    # Reinterpret the raw bytes from the Spark image row as an
    # OpenCV array; Spark stores pixels in BGR order, like OpenCV.
    cv2_image = cv2.cvtColor(
        np.reshape(image.data, (image.height, image.width, image.nChannels)),
        cv2.COLOR_BGR2GRAY,
    )
    # Downscale to keep feature detection cheap and consistent
    cv2_image = cv2.resize(cv2_image, (down_width, down_height))
    # Detect on the converted array, not on the original Row object
    preds = fa.detect(cv2_image, None)
    # Flatten keypoint coordinates into plain floats for Spark
    coords = [c for kp in preds for c in kp.pt]
    return (image.origin, Vectors.dense(coords))