
Hadoop MapReduce: How to store only keys in HDFS

I am using this for removing duplicate lines:

public class DLines
{
    public static class TokenCounterMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException
        {
            // Emit the whole line as the key so identical lines are grouped in the shuffle
            context.write(value, one);
        }
    }

    public static class TokenCounterReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable value : values)
            {
                sum += value.get();
            }
            // Keep only lines that occur exactly once
            if (sum < 2)
            {
                context.write(key, new IntWritable(sum));
            }
        }
    }
}

I have to store only the key in HDFS.
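For reference, the keep-lines-seen-once logic of the job above can be sketched outside Hadoop in plain Java (the class and method names here are made up for illustration; a `Map` stands in for the shuffle/group step):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // Count each line, then keep only lines seen fewer than twice (i.e. unique lines)
    static List<String> uniqueLines(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);   // plays the role of map + shuffle
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < 2) {                // plays the role of the reducer's sum < 2 check
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(uniqueLines(List.of("a", "b", "a", "c")));  // prints [b, c]
    }
}
```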

user3610736 asked Oct 01 '22 19:10


1 Answer

If you do not require a value from your reducer, just use NullWritable.

You could simply say context.write(key, NullWritable.get());

In your driver, you could also set

 job.setMapOutputKeyClass(Text.class);
 job.setMapOutputValueClass(IntWritable.class);

&

 job.setOutputKeyClass(Text.class);
 job.setOutputValueClass(NullWritable.class);
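Putting it together, the reducer's output value type parameter must also change to NullWritable to match the driver settings above. A sketch of the re-typed reducer (assuming the imports from the question):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TokenCounterReducer extends Reducer<Text, IntWritable, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        if (sum < 2) {
            // Emit the key with a NullWritable value: only the line itself lands in HDFS
            context.write(key, NullWritable.get());
        }
    }
}
```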
Arun A K answered Oct 27 '22 01:10