I'm new to Hadoop and I'm going to develop an application that processes multiple images using Hadoop and shows users the results live, while the computation is in progress. The basic approach is to distribute an executable and a bunch of images, then gather the results.
Can I get results interactively while the computation is in progress?
Are there any alternatives to Hadoop Streaming for such a use case?
How can I feed the executable with images? I can't find any examples other than feeding it via stdin.
1. HIPI is an image processing library designed to be used with the Apache Hadoop MapReduce parallel programming framework. HIPI facilitates efficient, high-throughput image processing with MapReduce-style parallel programs typically executed on a cluster (University of Virginia Computer Graphics Lab, 2016).
MapReduce itself is a Java-based programming model used on top of the Hadoop framework for faster processing of huge quantities of data. It processes this data in a distributed environment across many DataNodes, which enables parallel processing and faster execution of operations in a fault-tolerant way.
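For illustration, here is a minimal sketch of what such a job can look like. The class and method names (HibInputFormat, HipiImageHeader, FloatImage, getData()) follow the HIPI 2.x sample code and may differ in other versions, and the mean-brightness computation is just a stand-in for whatever processing you actually need:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.hipi.image.FloatImage;
import org.hipi.image.HipiImageHeader;
import org.hipi.imagebundle.mapreduce.HibInputFormat;

public class MeanBrightness extends Configured implements Tool {

  // Each map() call receives one decoded image from a HIB (HIPI Image Bundle).
  public static class BrightnessMapper
      extends Mapper<HipiImageHeader, FloatImage, IntWritable, FloatWritable> {
    @Override
    public void map(HipiImageHeader header, FloatImage image, Context context)
        throws IOException, InterruptedException {
      if (image == null) return;
      float[] pixels = image.getData();   // interleaved pixel values, per the HIPI samples
      float sum = 0;
      for (float p : pixels) sum += p;
      // Emit this image's mean pixel value under a single key.
      context.write(new IntWritable(0), new FloatWritable(sum / pixels.length));
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "mean brightness");
    job.setJarByClass(MeanBrightness.class);
    job.setInputFormatClass(HibInputFormat.class);   // reads images out of a .hib bundle
    job.setMapperClass(BrightnessMapper.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(FloatWritable.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));   // path to images.hib
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MeanBrightness(), args));
  }
}
```

HIPI expects the input images to be packed into a HIPI Image Bundle (.hib) beforehand, e.g. with the hibImport tool that ships with the library.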
To store images or video frames in HDFS, first convert them into a stream of bytes and then write those bytes to HDFS. Hadoop provides facilities for reading and writing binary files, so practically anything that can be converted into bytes can be stored in HDFS.
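As a rough illustration (not the only way to do it), the sketch below packs a local folder of images into a single Hadoop SequenceFile of (filename, raw bytes) pairs on HDFS. The paths are hypothetical and the Writer options assume the Hadoop 2.x API:

```java
import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagesToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path out = new Path(args[1]);   // e.g. hdfs:///data/images.seq
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(conf,
          SequenceFile.Writer.file(out),
          SequenceFile.Writer.keyClass(Text.class),
          SequenceFile.Writer.valueClass(BytesWritable.class));
      // Assumes args[0] is an existing local directory containing the images.
      for (File f : new File(args[0]).listFiles()) {
        byte[] bytes = Files.readAllBytes(f.toPath());
        // Key = file name, value = the image's raw bytes.
        writer.append(new Text(f.getName()), new BytesWritable(bytes));
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
```

Packing many small images into one SequenceFile (or into a HIPI .hib bundle) also sidesteps HDFS's small-files problem, since HDFS is tuned for a modest number of large files rather than millions of tiny ones.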
Apache Hadoop is an open-source framework used to efficiently store and process large datasets ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers so that massive datasets can be analyzed in parallel more quickly.
For processing images on Hadoop, this is the best way to organize the computation.
I don't know your complete use case, but one possible solution is to use Kafka + Spark Streaming. Your application would put the images, in a binary format, onto a Kafka topic, while Spark consumes and processes them in micro-batches on the cluster, updating the users through some third component (at the very least by putting the image-processing status onto Kafka for another application to pick up).
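As a very rough sketch of the consuming side (the topic name, broker address, and processImage step are all hypothetical, and it assumes the spark-streaming-kafka-0-10 integration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ImageStreamJob {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("image-stream");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "kafka:9092");               // hypothetical broker
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", ByteArrayDeserializer.class); // images arrive as raw bytes
    kafkaParams.put("group.id", "image-processors");

    JavaInputDStream<ConsumerRecord<String, byte[]>> stream =
        KafkaUtils.createDirectStream(jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, byte[]>Subscribe(Arrays.asList("images"), kafkaParams));

    // Process each micro-batch; replace processImage with your actual algorithm and
    // publish the result/status to another topic or datastore that the UI reads.
    stream.foreachRDD(rdd ->
        rdd.foreach(record -> {
          byte[] imageBytes = record.value();
          String status = processImage(record.key(), imageBytes);
          System.out.println(status);   // stand-in for publishing a status update
        }));

    jssc.start();
    jssc.awaitTermination();
  }

  // Hypothetical placeholder for the real image-processing step.
  private static String processImage(String key, byte[] bytes) {
    return "processed " + key + " (" + bytes.length + " bytes)";
  }
}
```

On the producer side, a plain KafkaProducer<String, byte[]> publishing each image's bytes to the same topic would be enough to feed this job.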
In general, though, the information you provided is not enough to recommend a good architecture for your specific case.