 

Hadoop and different input formats like images, audio and video

I am learning Hadoop and the MapReduce framework. Until now I have played around with text files and processed them using the MapReduce framework.

When I started learning MapReduce, the first popular example I found was WordCount, which is a text-file processing scenario. Then I wrote my own logic to process some text files and display the results, and that worked.

But I need to move on to other input formats, because in the real world we are not going to process only text files. I need to explore processing formats like images, audio and video with the MapReduce framework, but I am struggling to find suitable examples. I need some examples and tutorials on MapReduce with different input formats, ranging from text to video.

Edit:

I mean handling images, video and audio, not only text files.

Edit 2:

An example: say I have 10 years' worth of .bmp images (where no compression or decompression is involved) totalling 450 GB. I need to analyse every image in the folder and display the images that are similar (by comparing the similarity pattern of their pixels). I should also list the images that were created/modified between a "From" and a "To" date, say between January 2013 and February 2013. How can I accomplish this?

I would be happy if anyone could point me in the right direction!

asked Mar 18 '13 at 06:03 by BinaryMee



2 Answers

HIPI (Hadoop Image Processing Interface) is a framework for processing image files with MapReduce.

Here is a paper on high-performance video processing in the cloud. It's not exactly MapReduce, but it is very similar.

Note that I haven't tried them; I did a bit of Googling, and these are the closest resources I could find.

answered Oct 23 '22 at 10:10 by Praveen Sripati



When you set up your mapper and reducer, you can specify the input/output key and value datatypes. That is where I think you would handle the different datatypes in the way you want.

Here is an example (albeit poorly formatted) that uses the int datatype to calculate a mean:

http://souravgulati.webs.com/apps/forums/topics/show/8539120-hadoop-map-reduce-example-calculate-mean-in-map-reduce
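The linked example is Hadoop-specific. As a dependency-free illustration of the same idea, here is a plain-Java sketch of what the map and reduce steps compute for a mean over int values. In a real job the values would be Hadoop Writable types such as `IntWritable` and `LongWritable`; the array "partitions" below are an assumption standing in for separate map tasks. The key design point is that each map step emits a (sum, count) pair rather than a local mean, so partial results combine correctly.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Hadoop dependency) of computing a mean map/reduce style.
class MeanSketch {

    // Partial result from one "map task": a sum and a count rather than a local mean,
    // because averaging per-partition means is wrong when partitions differ in size.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // "Map" step: summarise one partition (input split) of int values.
    static Partial map(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new Partial(sum, values.length);
    }

    // "Reduce" step: add up all partials and divide once at the end.
    static double reduce(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no values to average");
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        List<Partial> partials = new ArrayList<>();
        partials.add(map(new int[] {1, 2, 3})); // mean 2 on its own
        partials.add(map(new int[] {10}));      // mean 10 on its own
        System.out.println(reduce(partials));   // prints 4.0, not (2 + 10) / 2 = 6.0
    }
}
```

Because (sum, count) pairs are associative, the same reduce logic could also run as a combiner.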

Edit:

When dealing with those types of files, it again helps to have an example of what specifically you are trying to accomplish. For example, if you are using audio, are you using .wav files? That would be good to know, because you can do your processing using the byte datatype. If you are using .mp3 files instead, you have compression to deal with.
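To make the .wav case concrete: once a mapper has a file's raw bytes (for example, one whole file per record), uncompressed PCM audio can be read straight from the byte array. This plain-Java sketch assumes 16-bit signed little-endian PCM and a canonical 44-byte WAV header with no extra chunks; real files can carry additional chunks, so a robust version would walk the RIFF chunk list instead. It computes the RMS amplitude, which is one possible (assumed) measure of "volume".

```java
// Sketch: RMS amplitude of 16-bit little-endian PCM audio held in a byte array.
// Assumes a canonical 44-byte WAV header followed directly by sample data.
class WavRms {

    static final int CANONICAL_HEADER_BYTES = 44;

    // Root-mean-square of the samples starting at `offset`, normalised to [0, 1].
    static double rms(byte[] pcm, int offset) {
        int sampleCount = (pcm.length - offset) / 2; // 2 bytes per 16-bit sample
        if (sampleCount <= 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < sampleCount; i++) {
            int lo = pcm[offset + 2 * i] & 0xFF; // low byte, treated as unsigned
            int hi = pcm[offset + 2 * i + 1];    // high byte keeps the sign
            short sample = (short) ((hi << 8) | lo);
            double normalised = sample / 32768.0;
            sumSquares += normalised * normalised;
        }
        return Math.sqrt(sumSquares / sampleCount);
    }

    // Convenience wrapper for a whole .wav file with the canonical header.
    static double rmsOfWav(byte[] wavFile) {
        return rms(wavFile, CANONICAL_HEADER_BYTES);
    }

    public static void main(String[] args) {
        // Two samples: +16384 (0x4000) and -16384 (0xC000), little-endian.
        byte[] pcm = {0x00, 0x40, 0x00, (byte) 0xC0};
        System.out.println(rms(pcm, 0)); // prints 0.5
    }
}
```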

The same goes for images: .bmp files, I believe, are usually not compressed and would be straightforward to manipulate in MapReduce using the int or byte datatypes. Files that use any type of compression would most likely require some sort of preprocessing before you run your job.
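As an illustration of why uncompressed .bmp data is easy to work with as bytes, here is a plain-Java sketch that computes a 256-bin grayscale histogram directly from a BMP file's bytes (which a job could receive one whole, unsplit file per record). It assumes the common BITMAPINFOHEADER layout (little-endian fields: pixel-data offset at byte 10, width at 18, height at 22, bits per pixel at 28, compression at 30) and 24 bits per pixel; palettes, other bit depths and RLE-compressed BMPs are rejected. The gray weights 0.299/0.587/0.114 are the common BT.601 luma weights, chosen here as an assumption, and `solidBmp` is a hypothetical helper that exists only to exercise the parser.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: grayscale histogram of an uncompressed 24-bit BMP held in a byte array.
class BmpHistogram {

    static int[] grayHistogram(byte[] bmp) {
        if (bmp.length < 54 || bmp[0] != 'B' || bmp[1] != 'M') {
            throw new IllegalArgumentException("not a BMP file");
        }
        ByteBuffer buf = ByteBuffer.wrap(bmp).order(ByteOrder.LITTLE_ENDIAN);
        int pixelOffset = buf.getInt(10);
        int width = buf.getInt(18);
        int height = Math.abs(buf.getInt(22)); // negative height means top-down rows
        int bitsPerPixel = buf.getShort(28);
        int compression = buf.getInt(30);
        if (bitsPerPixel != 24 || compression != 0) {
            throw new IllegalArgumentException("only uncompressed 24-bit BMP supported");
        }
        int rowSize = ((bitsPerPixel * width + 31) / 32) * 4; // rows padded to 4 bytes
        int[] histogram = new int[256];
        for (int y = 0; y < height; y++) {
            int rowStart = pixelOffset + y * rowSize;
            for (int x = 0; x < width; x++) {
                int p = rowStart + 3 * x;
                int b = bmp[p] & 0xFF; // pixels are stored as B, G, R
                int g = bmp[p + 1] & 0xFF;
                int r = bmp[p + 2] & 0xFF;
                histogram[(299 * r + 587 * g + 114 * b) / 1000]++;
            }
        }
        return histogram;
    }

    // Hypothetical helper: builds a minimal solid-colour 24-bit BMP for testing.
    static byte[] solidBmp(int width, int height, int r, int g, int b) {
        int rowSize = ((24 * width + 31) / 32) * 4;
        int imageSize = rowSize * height;
        ByteBuffer buf = ByteBuffer.allocate(54 + imageSize).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 'B').put((byte) 'M');
        buf.putInt(54 + imageSize).putInt(0).putInt(54); // file size, reserved, pixel offset
        buf.putInt(40).putInt(width).putInt(height);     // header size, width, height
        buf.putShort((short) 1).putShort((short) 24);    // planes, bits per pixel
        buf.putInt(0).putInt(imageSize);                 // BI_RGB (no compression), image size
        buf.putInt(2835).putInt(2835).putInt(0).putInt(0); // resolution, palette counts
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                buf.put((byte) b).put((byte) g).put((byte) r);
            }
            for (int pad = 3 * width; pad < rowSize; pad++) {
                buf.put((byte) 0);
            }
        }
        return buf.array();
    }

    public static void main(String[] args) {
        int[] h = grayHistogram(solidBmp(3, 2, 255, 255, 255));
        System.out.println(h[255]); // prints 6 (all six pixels are white)
    }
}
```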

Most tutorials out there deal with word count or something equally simple. It would be better to have a specific problem to solve in order to get better advice.

So, what are you trying to do with your MapReduce job? Count the number of pixels in an image? Emboss an image? Calculate the mean volume of an audio file?

Edit:

What you've described are two different MapReduce tasks (unless you only want to perform the analysis on the images between your From and To dates).
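The date-range listing needs only file metadata, not pixel data, so it can run as a cheap separate step (for example, to choose the input files before the similarity job). Below is a minimal plain-Java sketch over a local directory, assuming a half-open [from, to) range in UTC and matching on the .bmp extension. On HDFS the same check could be applied to `FileStatus.getModificationTime()`. Note that HDFS tracks modification and access times but no separate creation time, and a copied file's modification time may reflect when it was written to HDFS rather than its original date.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: list .bmp files whose last-modified time falls in [from, toExclusive).
class DateRangeFilter {

    static boolean inRange(Instant t, Instant from, Instant toExclusive) {
        return !t.isBefore(from) && t.isBefore(toExclusive);
    }

    static List<Path> bmpFilesModifiedBetween(Path dir, Instant from, Instant toExclusive)
            throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".bmp"))
                    .filter(p -> {
                        try {
                            Instant modified = Files.getLastModifiedTime(p).toInstant();
                            return inRange(modified, from, toExclusive);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // January and February 2013: [2013-01-01, 2013-03-01) in UTC.
        Instant from = Instant.parse("2013-01-01T00:00:00Z");
        Instant to = Instant.parse("2013-03-01T00:00:00Z");
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        bmpFilesModifiedBetween(dir, from, to).forEach(System.out::println);
    }
}
```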

What you can try is the following. This is a high-level description without any code, and it's off the top of my head, since I haven't used MapReduce in this way:

Because your job requires comparing two image files at a time, you need to run one comparison job per pair of files to cover all the possible file comparisons. For n files that is n(n-1)/2 jobs, which grows quadratically with the number of files, so this could take a while!
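To get a feel for the numbers and to generate the work list, here is a small plain-Java sketch that enumerates each unordered pair of files exactly once. For instance, 10,000 images would already give 10,000 × 9,999 / 2 = 49,995,000 pairs. Writing one tab-separated pair per line is an assumed alternative, not something from this answer: a single job could read such a file with ordinary line-based text input so that each map call handles one pair, instead of launching a separate job per pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: enumerate every unordered pair of files exactly once (i < j).
// For n files this yields n * (n - 1) / 2 pairs.
class PairEnumerator {

    static List<String> pairLines(List<String> files) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            for (int j = i + 1; j < files.size(); j++) {
                // One tab-separated pair per line, e.g. for a text input file.
                lines.add(files.get(i) + "\t" + files.get(j));
            }
        }
        return lines;
    }

    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.bmp", "b.bmp", "c.bmp", "d.bmp");
        pairLines(files).forEach(System.out::println);
        System.out.println(pairCount(files.size())); // prints 6
    }
}
```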

Your mapper needs to take two files at a time as input and perform the comparison. You run this job as many times as required to process every combination of your source image files, and you can coordinate these jobs with a workflow scheduler such as Apache Oozie.

Now you might ask: how do you compare two image files in MapReduce? Again, I haven't done it, but this may point you in the right direction: look into MapReduce jobs with multiple input sources, e.g. the question "Hadoop mapper reading from 2 different source input files".
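For the comparison itself, one simple option among many is to compare the two images' grayscale histograms (for example, 256-bin counts of gray levels) rather than raw pixels. The sketch below uses histogram intersection as an assumed similarity measure; it ignores where pixels are located, so it is a coarse pre-filter rather than a full "pixel pattern" match. The threshold in `similar` is a hypothetical parameter you would tune.

```java
// Sketch: histogram intersection similarity between two images' histograms.
// Returns a value in [0, 1]; 1 means identical normalised histograms.
class HistogramSimilarity {

    static double intersection(int[] h1, int[] h2) {
        if (h1.length != h2.length) {
            throw new IllegalArgumentException("histograms must have the same number of bins");
        }
        long total1 = 0;
        long total2 = 0;
        for (int i = 0; i < h1.length; i++) {
            total1 += h1[i];
            total2 += h2[i];
        }
        if (total1 == 0 || total2 == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < h1.length; i++) {
            // Normalise by pixel count so images of different sizes can be compared.
            sum += Math.min((double) h1[i] / total1, (double) h2[i] / total2);
        }
        return sum;
    }

    // Hypothetical decision rule: "similar" if the score reaches a chosen threshold.
    static boolean similar(int[] h1, int[] h2, double threshold) {
        return intersection(h1, h2) >= threshold;
    }

    public static void main(String[] args) {
        int[] a = {4, 0, 0, 0};
        int[] b = {2, 2, 0, 0};
        System.out.println(intersection(a, b)); // prints 0.5
        System.out.println(similar(a, a, 0.9)); // prints true
    }
}
```

Because a histogram is much smaller than the image, one design option is to compute each histogram once per image and compare the small histograms per pair, instead of re-reading both full images for every comparison.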

answered Oct 23 '22 at 09:10 by Tucker
