
How to set hadoop input format to NLineInputFormat?

I am trying to limit the number of lines each of the Mappers gets. My code goes like this:

    package com.iathao.mapreduce;

    import java.io.IOException;
    import java.net.MalformedURLException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.regexp.RESyntaxException;

    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;

    public class Main {


    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException, RESyntaxException {

    try {
        if (args.length != 2) {
            System.err.println("Usage: NewMaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(Main.class);
        job.getConfiguration().set("mapred.max.map.failures.percent", "100");
        // job.getConfiguration().set("mapred.map.max.attempts", "10");
        //NLineInputFormat. .setNumLinesPerSplit(job, 1);
        job.setInputFormatClass(NLineInputFormat.class);

On the last line of the sample (job.setInputFormatClass(NLineInputFormat.class);) I get the following error:

    The method setInputFormatClass(Class<? extends InputFormat>) in the type Job is not applicable for the arguments (Class<NLineInputFormat>)

Did I somehow get the wrong NLineInputFormat class?

asked Jan 18 '23 17:01 by Arsen Zahray


1 Answer

You are mixing the old and the new API:

    import org.apache.hadoop.mapred.lib.NLineInputFormat;   // old API
    import org.apache.hadoop.mapreduce.Job;                 // new API

According to "Hadoop: The Definitive Guide":

The new API is in the org.apache.hadoop.mapreduce package (and subpackages). The old API can still be found in org.apache.hadoop.mapred.

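If you plan to use the new API, import the NLineInputFormat from the new package instead, i.e. org.apache.hadoop.mapreduce.lib.input.NLineInputFormat, so that it matches the new-API Job and FileInputFormat classes you already use.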
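For reference, with the new-API class the input format is set with job.setInputFormatClass(NLineInputFormat.class), and the number of lines each mapper receives with NLineInputFormat.setNumLinesPerSplit(job, 1). The plain-Java sketch below does not use Hadoop at all; it only models the idea behind NLineInputFormat by grouping input lines into consecutive chunks of N lines, one chunk per mapper. The class NLineSplitSketch and its splitByLines method are hypothetical names for illustration. The real class works per input file, computes byte-offset splits, and hands each line to the mapper as a byte-offset/line key-value pair.

```java
import java.util.ArrayList;
import java.util.List;

public class NLineSplitSketch {

    // Hypothetical helper (not part of Hadoop): groups lines into consecutive
    // chunks of at most linesPerSplit lines, modelling how NLineInputFormat
    // gives each mapper a fixed number of input lines.
    static List<List<String>> splitByLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            // Copy the sublist so each split does not depend on the input list.
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> input = List.of("url1", "url2", "url3", "url4", "url5");
        List<List<String>> splits = splitByLines(input, 2);
        // One line of output per simulated mapper.
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("mapper " + i + " gets " + splits.get(i));
        }
    }
}
```

With linesPerSplit = 2, the five sample lines yield three splits, the last one holding a single line; with 1, every line would go to its own mapper, which is what the commented-out setNumLinesPerSplit(job, 1) call in the question is aiming for.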

answered Jan 30 '23 09:01 by Praveen Sripati