Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to use hadoop for a web application?

Tags:

hadoop

I am working on a social networking web based application, which is uses Apache web server and MYSQL server for database with codeigniter MVC frameworks. I don't know how to integrate Hadoop in this application and how to write map- reduce program.

like image 213
Rashmi Avatar asked Aug 08 '11 22:08

Rashmi


People also ask

What applications use Hadoop?

Various Hadoop applications include stream processing, fraud detection, and prevention, content management, risk management. Financial sectors, healthcare sector, Government agencies, Retailers, Financial trading and Forecasting, etc. all are using Hadoop.

What is Hadoop and its application?

Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly.


2 Answers

Hadoop and map-reduce have no direct relationship to web applications. You should not integrate Hadoop into a web application as long as you understand web application as something that responds (quickly) to user input (web requests).

Hadoop and map-reduce are very useful for algorithms that run on large datasets in order to transform/extract data/knowledge from those datasets.

like image 94
mkro Avatar answered Sep 19 '22 15:09

mkro


While it is true that Hadoop is nowadays mostly used for "offline analytics", it can be useful to web projects as well. For example, to pre-compute recommendations or suggestions that are then provided to the users of a website.

Another case of use is to be able to ETL from multiple sources of data to produce an inverted index for a website (for example, jobs/cars/rentals-like websites with huge amounts of input data).

Always think of Hadoop when you have a "Big Data" problem, not if your website is managing small amounts of data.

Using Hadoop to tackle this sort of problems has some advantages and disadvantages. The obvious advantage is that it makes any sort of batch process (like the examples I mentioned) scale transparently. The disadvantage is that it isn't real-time: you can't use Hadoop to update your website every 5 seconds.

like image 39
Pere Ferrera Bertran Avatar answered Sep 20 '22 15:09

Pere Ferrera Bertran