
Small data sets for Hadoop MapReduce

I am trying to become familiar with Hadoop MapReduce. Having studied the theory behind these concepts, I now want to practice them.

However, I could not find small data sets (up to 3 GB) for this technology. Where can I find data sets to practice with?

Alternatively, how can I practice Hadoop MapReduce? In other words, are there any tutorials or websites that offer exercises?

user1743323 asked Dec 07 '22 11:12



1 Answer

There are publicly accessible data sets that you can download and play around with. Below are a few examples.

http://www.netflixprize.com/index - As part of a competition, Netflix released a data set of user ratings to challenge people to develop better recommendation algorithms. The uncompressed data is over 2 GB. It contains more than 100 million movie ratings from roughly 480,000 users on roughly 17,000 movies.

http://aws.amazon.com/publicdatasets/ - For example, one of the biological data sets is an annotated human genome of roughly 550 GB. Under economics you can find data sets such as the 2000 U.S. Census (approximately 200 GB).

http://boston.lti.cs.cmu.edu/Data/clueweb09/ - Carnegie Mellon University’s Language Technologies Institute has released the ClueWeb09 data set to aid large-scale web research. It’s a crawl of a billion web pages in 10 languages. The uncompressed data set takes up 25 TB.
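
If you want to try the map/shuffle/reduce idea on a ratings-style data set before setting up a cluster, a rough local sketch in plain Python might look like the following. It is not Hadoop code and not the actual Netflix file layout: the `movie_id,user_id,rating` line format and the function names (`map_rating`, `shuffle`, `reduce_average`, `run_job`) are assumptions made up for illustration. It computes the average rating per movie.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_rating(line: str) -> Iterator[Tuple[str, float]]:
    """Map step: emit (movie_id, rating) for one input line.

    Assumed (hypothetical) line format: movie_id,user_id,rating
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return  # skip malformed lines
    movie_id, _user_id, rating = parts
    try:
        yield movie_id, float(rating)
    except ValueError:
        return  # skip lines whose rating is not a number (e.g. a header)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_average(movie_id: str, ratings: List[float]) -> Tuple[str, float]:
    """Reduce step: collapse all ratings of one movie into their mean."""
    return movie_id, sum(ratings) / len(ratings)


def run_job(lines: Iterable[str]) -> Dict[str, float]:
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = (pair for line in lines for pair in map_rating(line))
    grouped = shuffle(mapped)
    return dict(reduce_average(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Tiny made-up sample; replace with lines read from a real file.
    sample = ["1,10,4", "1,11,2", "2,10,5", "bad line"]
    print(run_job(sample))
```

Once the logic behaves as expected on a few hand-written lines, the same map and reduce steps can be ported to a real Hadoop job and run against one of the data sets above.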

saurabh shashank answered Dec 09 '22 01:12
