I need to implement a custom (service-backed) input source for a Hadoop MapReduce app. I searched Google and Stack Overflow and found that one way to proceed is to implement a custom InputFormat. Is that correct?
According to http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/InputFormat.html, InputFormat's methods getRecordReader() and getSplits() are deprecated. What's the replacement?
Hadoop's WordCount example still uses the same...
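Although Hadoop still uses things from the mapred package internally, from the user's perspective they should pretty much all be considered deprecated. The replacement is the new API in the org.apache.hadoop.mapreduce package, where InputFormat is an abstract class with getSplits(JobContext) and createRecordReader(InputSplit, TaskAttemptContext). Hadoop's documentation is sparse and its examples tend to be outdated. Luckily, when you're really stuck there's always Stack Overflow.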
What happened is that in 0.20 they deprecated the mapred classes and introduced a new API. However, the new API lacked a few core features, so the old API was 'undeprecated' in the latest release. I'd advise using the old API, as it is most likely the one that is here to stay.
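Whichever API you pick, a custom InputFormat has the same two jobs: describe how the input is divided into splits, and hand out a reader for each split. In the new API those are getSplits(JobContext) and createRecordReader(InputSplit, TaskAttemptContext); in the old one, getSplits(JobConf, int) and getRecordReader(InputSplit, JobConf, Reporter). For a service-backed source the split planning is usually the part you have to design yourself. Here is a minimal sketch of that step in plain Java with no Hadoop dependency. The names ServiceSplitPlanner, Range and plan are hypothetical, and it assumes the service can report a total record count and serve records by index range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): plans contiguous record ranges
// for a service-backed source. In a real job, each Range would be wrapped in
// a custom InputSplit and returned from the InputFormat's getSplits method.
final class ServiceSplitPlanner {

    /** A half-open range [start, start + length) of record indices. */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Divides totalRecords into at most numSplits contiguous ranges whose
     * lengths differ by at most one. Empty ranges are never produced.
     */
    static List<Range> plan(long totalRecords, int numSplits) {
        if (totalRecords < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("totalRecords must be >= 0 and numSplits > 0");
        }
        List<Range> ranges = new ArrayList<>();
        long base = totalRecords / numSplits;       // minimum records per split
        long remainder = totalRecords % numSplits;  // first 'remainder' splits get one extra
        long start = 0;
        for (int i = 0; i < numSplits && start < totalRecords; i++) {
            long length = base + (i < remainder ? 1 : 0);
            ranges.add(new Range(start, length));
            start += length;
        }
        return ranges;
    }
}
```

Each range would then back one split, and the record reader for that split would fetch only the records in its range from the service. Note that a custom InputSplit also has to be serializable by Hadoop (typically by implementing Writable) so it can be shipped to the task that processes it.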