I would like to store some videos/images in Hadoop HDFS, but I have heard that HDFS only accepts files such as text.
To be sure: can we store videos/images in HDFS? If yes, what is the way, or what steps should I follow, to do that?
It is absolutely possible without doing anything extra. Hadoop can read and write binary files, so practically anything that can be converted into bytes can be stored in HDFS (images, videos, etc.). For this, Hadoop provides SequenceFiles. A SequenceFile is a flat file consisting of binary key/value pairs, and it provides Writer, Reader and Sorter classes for writing, reading and sorting respectively. So you could convert your image/video file into a SequenceFile and store it in HDFS. Here is a small piece of code that takes an image file and converts it into a SequenceFile, where the name of the file is the key and the image content is the value:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImageToSeq {
    public static void main(String[] args) throws IOException {
        Configuration confHadoop = new Configuration();
        confHadoop.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        confHadoop.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(confHadoop);
        Path inPath = new Path("/mapin/1.png");
        Path outPath = new Path("/mapin/11.png");
        FSDataInputStream in = null;
        SequenceFile.Writer writer = null;
        try {
            // Read the whole image into memory. in.available() is not a
            // reliable file length, so ask the FileSystem for it instead,
            // and use readFully() so a short read cannot truncate the data.
            byte[] buffer = new byte[(int) fs.getFileStatus(inPath).getLen()];
            in = fs.open(inPath);
            IOUtils.readFully(in, buffer, 0, buffer.length);

            // File name as the key, raw image bytes as the value.
            writer = SequenceFile.createWriter(fs, confHadoop, outPath,
                    Text.class, BytesWritable.class);
            writer.append(new Text(inPath.getName()), new BytesWritable(buffer));
        } catch (IOException e) {
            System.out.println("Exception: " + e.getMessage());
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(writer);
        }
    }
}
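To get the data back out, SequenceFile also provides a Reader. Here is a hedged sketch of the reverse direction, assuming the SequenceFile was written with a Text key and a BytesWritable value as above (the class name SeqToImage and the path are illustrative; this uses the same Hadoop 1.x-era API as the writer example):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqToImage {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path seqPath = new Path("/mapin/11.png"); // the SequenceFile written above

        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(fs, seqPath, conf);
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            // Iterate over all key/value pairs in the SequenceFile.
            while (reader.next(key, value)) {
                // key holds the original file name, value the raw image bytes
                System.out.println(key + " : " + value.getLength() + " bytes");
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}
```

From here you could write value's bytes back out to a local file, or feed them to whatever image library you use.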
And if your intention is just to dump the files as they are, you could simply do this:
bin/hadoop fs -put /src_image_file /dst_image_file
And if your intent is more than just storing the files, you might find HIPI useful. HIPI is a library for Hadoop's MapReduce framework that provides an API for performing image processing tasks in a distributed computing environment.
HTH
It is entirely possible to store images and video on HDFS, but you will likely need to use or write your own custom InputFormat, OutputFormat and RecordReader in order to split them properly.
I imagine others have undertaken similar projects, however, so if you scour the net you might find that someone has already written custom classes to do exactly what you need.
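As a starting point, a common pattern (described, for example, in "Hadoop: The Definitive Guide") is a "whole file" InputFormat that refuses to split binary files, so each mapper receives one complete image or video as a single record. This is a sketch under that assumption, not a drop-in implementation; the class names WholeFileInputFormat and WholeFileRecordReader are conventional, not part of Hadoop itself:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split a binary file across mappers
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }
}

class WholeFileRecordReader
        extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit fileSplit;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false; // exactly one record per file
        }
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        processed = true;
        return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { /* nothing to close */ }
}
```

Note that this reads each file entirely into memory, so it only suits files that fit comfortably in a mapper's heap; for large videos you would need a splitting strategy aware of the container format.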