
When is it overkill to use Hadoop?

Tags: hadoop

I have an Oracle database (roughly 1.2 billion records) with a web application sitting on top of it that generates queries (it generates SQL code and returns counts). Basically, you build SQL queries graphically through an AJAX UI, and it performs pretty well.

This is roughly a 400 GB database. I've been looking at Hadoop and thinking about using it instead of Oracle (having my app generate Hive query code), but it seems like overkill to me. Isn't Hadoop targeted more towards tens-of-terabytes to petabyte-scale datasets? Is it suitable in place of a relational database (like Oracle) for the task I'm doing?

wsb3383 asked Sep 21 '10 17:09


2 Answers

It's hard to say without more details. However, in my experience, if all your data is in SQL, then your SQL engine probably has more optimizations than simple MapReduce has.

Without knowing exactly what you want to crunch and the state of the data, I can only guess. But unless you are hitting some major edge case in your environment, you would probably have more trouble setting up and using Hadoop in your case, and it would probably end up taking a lot longer.

If all your data is in Oracle, it's probably all parsed, indexed, and hopefully somewhat regular. If the crunching exists entirely in that domain (and you are not trying to work with something uncommon like massive BLOBs or other weird situations), most of the time it's better to let your database engine handle it.

Moral of the story:

Hadoop is really awesome but it's not magic and doesn't make regular old SQL faster!

Zac Bowling answered Sep 24 '22 13:09


Isn't Hadoop targeted more towards tens-of-terabytes to petabyte-scale datasets?

Maybe. But it's suitable for a wide variety of problems. It's also suitable for very small datasets where the Hadoop "functional" style of programming helps.

SQL is not the perfect query language. It's just widely adopted.
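
To make the "functional" style mentioned above concrete, here is a minimal sketch in plain Python (not Hadoop itself) of a count expressed as a map step and a reduce step. The records, the field names, and the helpers map_record and reduce_counts are all made up for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"state": "CA", "age": 34},
    {"state": "NY", "age": 51},
    {"state": "CA", "age": 67},
    {"state": "TX", "age": 45},
]

def map_record(record):
    """Map step: emit a (key, 1) pair for each record that passes the filter."""
    if record["age"] >= 40:
        yield (record["state"], 1)

def reduce_counts(totals, pair):
    """Reduce step: fold one (key, count) pair into the running totals."""
    key, count = pair
    totals[key] += count
    return totals

# Run the map step over every record, then fold the emitted pairs together.
pairs = (pair for record in records for pair in map_record(record))
counts = reduce(reduce_counts, pairs, Counter())
```

This is roughly what a SQL engine does for SELECT state, COUNT(*) ... WHERE age >= 40 GROUP BY state; the difference is that the filter and the grouping are ordinary functions rather than query clauses.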

Is it suitable in place of a relational database (like Oracle) for the task I'm doing?

Without more requirements, it's almost impossible to tell. However, if you're doing transactional stuff with lots of inserts, updates and deletes, then an SQL RDBMS is probably necessary.

If you're not doing complex transactions, and you're doing bulk loads and bulk queries, then the database is getting in your way. The file system will be faster, and often simpler.
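
As a rough sketch of what a "bulk query" against the file system can look like, the snippet below streams a flat file once and returns a count. The CSV layout, the column names, and the helper count_matching are assumptions for illustration; whether a scan like this beats an indexed database query depends on the data and the queries.

```python
import csv
import os
import tempfile

def count_matching(path, predicate):
    """Stream the file once and count rows for which predicate(row) is true."""
    with open(path, newline="") as handle:
        return sum(1 for row in csv.DictReader(handle) if predicate(row))

# Build a tiny stand-in for a bulk export so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", newline="", delete=False
) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["state", "age"])
    writer.writerows([["CA", 34], ["NY", 51], ["CA", 67], ["TX", 45]])
    export_path = tmp.name

# Count rows where state is CA and age is at least 40.
ca_over_40 = count_matching(
    export_path,
    lambda row: row["state"] == "CA" and int(row["age"]) >= 40,
)

os.remove(export_path)  # clean up the temporary export
```

Nothing is indexed here, so every query reads the whole file; that is the trade-off being accepted in exchange for skipping the database.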

S.Lott answered Sep 22 '22 13:09