I have a large file containing more than 10 million lines. I want to find the duplicate lines using MapReduce. How can I solve this problem? Thanks for your help.
You need to make use of the fact that the default behaviour of MapReduce is to group values by a common key: if every line is used as the key, all copies of the same line end up in the same reduce call.
So the basic steps required are:
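The steps can be sketched as follows. This sketch assumes a Hadoop Streaming-style setup where the mapper and reducer are plain Python functions over lines. The map step emits every line as the key with a count of 1. The framework's shuffle/sort phase brings identical lines together. The reduce step sums the counts and keeps only lines seen more than once. The names `mapper`, `reducer` and `run_local` are hypothetical. `run_local` only simulates the shuffle locally with `sorted()`, so the logic can be checked without a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit each line (trailing newline stripped) as the key, with a count of 1."""
    for line in lines:
        yield line.rstrip("\n"), 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce step: assumes pairs arrive sorted by key, as the shuffle/sort phase
    guarantees. Sums the counts per key and emits only keys seen more than once."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(count for _, count in group)
        if total > 1:
            yield key, total


def run_local(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Hypothetical local driver: map, sort (a stand-in for the cluster's shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


# Small in-memory sample instead of the real 10-million-line file.
sample = ["a\n", "b\n", "a\n", "c\n", "b\n", "a\n"]
for line, count in run_local(sample):
    print(f"{line}\t{count}")
```

Because summing counts is associative, the same summation could also run as a combiner on the map side to shrink the data shuffled across the network. In that case the combiner must forward every partial count rather than filtering with `> 1`, since a line may appear only once in each of several input splits and still be a duplicate overall.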