To effectively use map-reduce jobs in Hadoop, I need my data to be stored in Hadoop's SequenceFile format. However, the data is currently only in flat .txt format. Can anyone suggest a way to convert a .txt file to a SequenceFile?
The idea behind SequenceFile is to pack many small files into one larger file. For example, suppose there are 10,000 files of 100 KB each; we can write a program that puts them all into a single SequenceFile, as sketched below, using the filename as the key and the file contents as the value.
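Here is a minimal sketch of such a packing program. The /small-files input directory and /packed.seq output path are made-up examples, and it uses the standard SequenceFile.Writer API with Text keys and BytesWritable values:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilesToSequenceFile {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path inputDir = new Path("/small-files"); // hypothetical input directory
    Path output = new Path("/packed.seq");    // hypothetical output file

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, output, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : fs.listStatus(inputDir)) {
        if (!status.isFile()) {
          continue; // skip subdirectories
        }
        // read the whole small file into memory
        byte[] contents = new byte[(int) status.getLen()];
        FSDataInputStream in = fs.open(status.getPath());
        try {
          in.readFully(0, contents);
        } finally {
          in.close();
        }
        // append one record: filename as key, raw contents as value
        writer.append(new Text(status.getPath().getName()),
                      new BytesWritable(contents));
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}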
In computing, as in non-computing contexts, a file sequence is a well-ordered, finite collection of files, usually related to each other in some way.
So the simplest answer is just an "identity" job that writes SequenceFile output.
It looks like this in Java:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TextToSequenceFile {

  public static void main(String[] args)
      throws IOException, InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(TextToSequenceFile.class);

    // identity mapper and reducer: records pass through unchanged
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    // map-only job; increase if you need sorting or a special number of files
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    TextInputFormat.addInputPath(job, new Path("/lol"));
    SequenceFileOutputFormat.setOutputPath(job, new Path("/lolz"));

    // submit and wait for completion
    job.waitForCompletion(true);
  }
}

Because the identity Mapper just passes its input records through, the resulting SequenceFile holds the records TextInputFormat produced: a LongWritable byte offset as the key and a Text line as the value.
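To sanity-check the result, you can read the SequenceFile back outside of MapReduce. Here is a minimal sketch; the part-m-00000 filename assumes the map-only job above, with /lolz as its output path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ReadSequenceFile {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // one of the output files written by the conversion job
    Path path = new Path("/lolz/part-m-00000");

    // open the SequenceFile and iterate over its (key, value) records
    SequenceFile.Reader reader =
        new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
    try {
      LongWritable key = new LongWritable();
      Text value = new Text();
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}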