Since the mapper function runs for every line, how can I skip the first line? Some files contain a column header, which I want to ignore.
When the mapper reads a text file, each line arrives as a key-value pair. The key is the byte offset at which that line starts in the file, and the value is the line itself. For the first line the key is always zero, so in the mapper function do the following:
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // The key is the line's byte offset in the file, so 0 means the first line.
    if (key.get() == 0 && value.toString().contains("header") /* some condition identifying the header */) {
        return; // skip the header line
    } else {
        // The rest of the data is processed here
    }
}
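To see why the offset check works, here is a small stand-alone sketch in plain Java (no Hadoop dependency) that mimics how the keys are assigned: each line's key is the byte offset at which it starts, so only the first line of a file has key 0. The class and method names are made up for illustration, and the sketch assumes "\n" line endings and UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: simulates offset keys for one file.
public class OffsetKeySketch {

    // Returns every line except the one that starts at byte offset 0.
    public static List<String> skipFirstLine(String fileContents) {
        List<String> kept = new ArrayList<>();
        long offset = 0; // byte offset at which the current line starts
        for (String line : fileContents.split("\n")) {
            if (offset != 0) {
                kept.add(line); // a data line
            }
            // Advance past this line plus its "\n" terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Prints [1,alice, 2,bob]
        System.out.println(skipFirstLine("id,name\n1,alice\n2,bob\n"));
    }
}
```

Because the offset is relative to the start of each file, the same check should also hold when a job reads several files, or when one file is split across mappers: only the mapper whose split begins at the start of a file sees key 0.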
Because the file can be stored across multiple nodes, we can't say which machine holds the header or which mapper processes that part of the file. We can instead filter out the header in the mapper itself by checking the line's content. For this you have to know the header names. For example:
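// line is value.toString(); assumes comma-separated fields, adjust the delimiter to your data
String[] cols = line.split(",");
if (cols[0].equals("header")) {
    // skip: this is the header line
} else {
    // emit
}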
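A content-based check like this can be packaged as a small predicate and tried out outside Hadoop. The sketch below is plain Java; the class name, the comma delimiter, and the first header column name "id" are all assumptions for illustration, not something taken from the question.

```java
// Hypothetical helper, not part of Hadoop: recognises a header row by its first column.
public class HeaderFilterSketch {

    // Assumed name of the first column in the header row.
    static final String FIRST_HEADER_COLUMN = "id";

    // True if the line's first comma-separated field equals the header name.
    public static boolean isHeader(String line) {
        // limit -1 keeps trailing empty fields, so the array always has at least one element
        String[] cols = line.split(",", -1);
        return cols[0].trim().equals(FIRST_HEADER_COLUMN);
    }

    public static void main(String[] args) {
        String[] lines = {"id,name", "1,alice", "2,bob"};
        for (String line : lines) {
            if (!isHeader(line)) {
                System.out.println(line); // emit only data lines
            }
        }
    }
}
```

This works no matter which mapper receives the header, but a data row whose first field happens to equal the header name would be dropped as well, so pick a header value that cannot occur in the data.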