I just started learning Hadoop. While reading various sites, including SO, I often came across the statement
"Hadoop is not a real-time platform".
I'm confused by this and can't work out what it means. Can anyone explain it to me?
Thanks all
Hadoop was initially designed for batch processing. That means it takes a large dataset as input all at once, processes it, and writes a large output. The very concept of MapReduce is geared towards batch and not real-time. But to be honest, this was only the case at Hadoop's beginning, and now you have plenty of opportunities to use Hadoop in a more real-time way.
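To make the batch idea concrete, here is a minimal sketch in plain Python (not Hadoop's actual API) of a MapReduce-style word count. The function names `map_phase`, `reduce_phase` and `batch_word_count` and the sample data are made up for illustration. The point to notice is that no result is available until the whole input has been consumed.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every input record."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1


def reduce_phase(pairs):
    """Sum the counts per word; returns only after all pairs are consumed."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def batch_word_count(dataset):
    """Run the whole job: full input in, full output out."""
    return reduce_phase(map_phase(dataset))


# Hypothetical input: the complete dataset must exist before the job starts.
dataset = ["Hadoop is batch", "batch is not real-time"]
result = batch_word_count(dataset)
print(result)
```

In a real cluster the map and reduce steps run in parallel across many machines, but the shape is the same: the job starts on a complete dataset and produces its output at the end.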
First, I think it's important to define what you mean by real-time. It could be that you're interested in stream processing, or it could be that you want to run queries on your data that return results in real time.
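As a rough illustration of the first meaning, stream processing, here is a small plain-Python sketch (again, not any particular framework's API; `stream_word_count` and the sample events are hypothetical). Instead of waiting for a complete dataset, it updates its result as each record arrives and makes an up-to-date answer available after every event.

```python
from collections import Counter


def stream_word_count(events):
    """Update running word counts per event and yield a snapshot each time."""
    counts = Counter()
    for line in events:
        counts.update(line.lower().split())
        # A result is available immediately after each incoming record.
        yield dict(counts)


# Hypothetical event source; in practice this would be an unbounded feed.
events = iter(["Hadoop is batch", "batch is not real-time"])
snapshots = list(stream_word_count(events))
for snapshot in snapshots:
    print(snapshot)
```

The second meaning, real-time queries, is a different concern: the data may already be stored, and what matters is how quickly a query over it returns.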
For stream processing, Hadoop doesn't natively provide this kind of capability, but you can easily integrate other projects with Hadoop:
For real-time queries there are also several projects which use Hadoop:
There are probably other projects that would fit into the list of "Making Hadoop real-time", but these are the most well-known ones.
So as you can see, Hadoop is moving more and more towards real-time and, even though it wasn't designed for that, you have plenty of opportunities to extend it for real-time purposes.