
Processing files with headers in Hadoop

Tags:

hadoop

I want to process many files in Hadoop -- each file has some header information, followed by many records, each stored in a fixed number of bytes. Any suggestions?


2 Answers

I think the best solution is to write a custom InputFormat.
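To illustrate what that custom InputFormat's RecordReader would do, here is a minimal sketch of the read loop in plain Java, without the Hadoop dependencies: consume the header once, then emit fixed-length records. The class name, header length, and record length are all assumptions for the example, not part of any Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the read loop a custom RecordReader would perform:
 * skip (or parse) the header exactly once, then return
 * fixed-size records until the stream is exhausted.
 */
public class FixedRecordReaderSketch {
    private final DataInputStream in;
    private final int recordLength;

    public FixedRecordReaderSketch(DataInputStream in, int headerLength,
                                   int recordLength) throws IOException {
        this.in = in;
        this.recordLength = recordLength;
        in.skipBytes(headerLength); // header is consumed exactly once
    }

    /** Returns the next fixed-length record, or null at end of stream. */
    public byte[] nextRecord() throws IOException {
        byte[] buf = new byte[recordLength];
        try {
            in.readFully(buf);
        } catch (EOFException e) {
            return null; // no complete record left
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Example input: a 4-byte header "HDR!" followed by three 3-byte records.
        byte[] data = "HDR!aaabbbccc".getBytes("US-ASCII");
        FixedRecordReaderSketch reader = new FixedRecordReaderSketch(
                new DataInputStream(new ByteArrayInputStream(data)), 4, 3);
        List<String> records = new ArrayList<>();
        byte[] rec;
        while ((rec = reader.nextRecord()) != null) {
            records.add(new String(rec, "US-ASCII"));
        }
        System.out.println(records); // [aaa, bbb, ccc]
    }
}
```

In an actual Hadoop InputFormat you would wrap logic like this in a RecordReader, and typically override isSplitable to return false so each file (and therefore its header) is read by a single mapper.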


Paolo Capriotti


Another solution: you can check the byte offset of the line the mapper reads (the key passed to map). It will be zero for the first line of the file, so you can guard your map logic as follows:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // key is the byte offset of this line within the file;
    // offset 0 means the first (header) line, which we skip
    if (key.get() > 0) {
        // your mapper code
    }
}

So, it will skip the first line of the file.

However, this is not ideal: the condition is checked for every line in the file, and it assumes the header occupies exactly the first line.

The best way is to use a custom InputFormat.


Sourav Gulati


