Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert .txt file to Hadoop's sequence file format

To effectively utilise map-reduce jobs in Hadoop, i need data to be stored in hadoop's sequence file format. However,currently the data is only in flat .txt format.Can anyone suggest a way i can convert a .txt file to a sequence file?

like image 260
Abhishek Pathak Avatar asked Mar 21 '11 11:03

Abhishek Pathak


People also ask

What is sequence file example?

The concept of SequenceFile is to put each small file to a larger single file. For example, suppose there are 10,000 100KB files, then we can write a program to put them into a single SequenceFile like below, where you can use filename to be the key and content to be the value.

What is the sequence of a file?

In computing, as well as in non-computing contexts, a file sequence is a well-ordered, (finite) collection of files, usually related to each other in some way.


1 Answers

So the way more simplest answer is just an "identity" job that has a SequenceFile output.

Looks like this in java:

    public static void main(String[] args) throws IOException,         InterruptedException, ClassNotFoundException {      Configuration conf = new Configuration();     Job job = new Job(conf);     job.setJobName("Convert Text");     job.setJarByClass(Mapper.class);      job.setMapperClass(Mapper.class);     job.setReducerClass(Reducer.class);      // increase if you need sorting or a special number of files     job.setNumReduceTasks(0);      job.setOutputKeyClass(LongWritable.class);     job.setOutputValueClass(Text.class);      job.setOutputFormatClass(SequenceFileOutputFormat.class);     job.setInputFormatClass(TextInputFormat.class);      TextInputFormat.addInputPath(job, new Path("/lol"));     SequenceFileOutputFormat.setOutputPath(job, new Path("/lolz"));      // submit and wait for completion     job.waitForCompletion(true);    } 
like image 149
Thomas Jungblut Avatar answered Sep 16 '22 15:09

Thomas Jungblut