Skipping the first line of a .csv in MapReduce (Java)

As the mapper function runs for every line, how can I skip the first line? Some files contain a column header that I want to ignore.

Kunal asked May 31 '16 08:05



2 Answers

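When the mapper reads the file (with the default TextInputFormat), the data arrives as key-value pairs. The key is the byte offset at which the current line starts in the file, so for line 1 it is always zero. Every input file starts again at offset 0, so with several files this skips the first line of each one. So in the mapper function do the following: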

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key = byte offset of this line in the file; 0 means it is the first line.
        // The content check is a placeholder: use whatever identifies your header row.
        if (key.get() == 0 && value.toString().contains("header")) {
            return; // skip the header line
        }
        // For the rest of the data, process and emit here, e.g. context.write(...)
    }
ViKiG answered Oct 13 '22 12:10
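
The byte-offset rule can be checked without a cluster. This is a minimal plain-Java sketch, not Hadoop code: it assumes `\n` line endings and UTF-8 text, and the names `OffsetHeaderSkip`, `lineOffsets` and `dataLines` are hypothetical. It computes the offset each line would receive as its key and keeps only the lines whose offset is not 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetHeaderSkip {

    // Byte offset at which each line starts, mimicking the LongWritable key
    // that TextInputFormat passes to map() (assumes "\n" line endings, UTF-8).
    public static List<Long> lineOffsets(String fileContents) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            offsets.add(offset);
            // The next line starts after this line's bytes plus the 1-byte "\n".
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    // Keeps every line except the one whose key (offset) is 0, i.e. the first line.
    public static List<String> dataLines(String fileContents) {
        String[] lines = fileContents.split("\n");
        List<Long> offsets = lineOffsets(fileContents);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (offsets.get(i) != 0L) {
                kept.add(lines[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,alice\n2,bob\n";
        System.out.println(lineOffsets(csv)); // prints [0, 8, 16]
        System.out.println(dataLines(csv));   // prints [1,alice, 2,bob]
    }
}
```

For this sample the keys are 0, 8 and 16: each offset is where that line begins, so only the header line gets key 0.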



As the file can be stored across multiple nodes, we can't say which machine holds the header or which mapper is processing that part of the file. Instead, we can filter out the header in the mapper itself. For this you have to know what the header looks like. For example:

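    String line = value.toString();
    String[] cols = line.split(",");   // split the CSV line into columns
    if (cols[0].equals("header")) {    // "header" = the name of your first column
        return;                        // skip the header line
    } else {
        // emit, e.g. context.write(...)
    }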
HITANSU answered Oct 13 '22 14:10
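
The content-based check can be sketched the same way. This is a plain-Java simulation, not Hadoop code: each inner list stands in for a hypothetical split handed to a separate mapper, and the header column name `id` and the names `ContentHeaderSkip`, `isHeader` and `filterSplits` are assumptions. Because the test looks only at a line's content, it does not matter which split (or which input file) the header lands in.

```java
import java.util.ArrayList;
import java.util.List;

public class ContentHeaderSkip {

    // Hypothetical header: replace with the actual first column name of your CSV.
    static final String FIRST_HEADER_COLUMN = "id";

    // True if the line looks like the header row, judged only by its content,
    // so every mapper can apply it independently of where its split starts.
    public static boolean isHeader(String line) {
        String[] cols = line.split(",");
        return cols[0].trim().equals(FIRST_HEADER_COLUMN);
    }

    // Simulates several mappers, each processing one split of lines.
    public static List<String> filterSplits(List<List<String>> splits) {
        List<String> emitted = new ArrayList<>();
        for (List<String> split : splits) {
            for (String line : split) {
                if (!isHeader(line)) {
                    emitted.add(line); // stands in for context.write(...)
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("id,name", "1,alice"),
                List.of("2,bob"),
                List.of("id,name", "3,carol")); // e.g. the start of a second input file
        System.out.println(filterSplits(splits)); // prints [1,alice, 2,bob, 3,carol]
    }
}
```

The trade-off is that a data row whose first column happens to equal the header name would also be dropped, so choose a check that real data cannot match.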