I need a large dataset (more than 10GB) to run a Hadoop demo. Does anybody know where I can download one? Please let me know.
1. Capacity: Hadoop stores large volumes of data. Using a distributed file system called HDFS (the Hadoop Distributed File System), data is split into blocks and stored across clusters of commodity servers.
HDFS can easily store terabytes of data using any number of inexpensive commodity servers. It does so by breaking each large file into blocks (the default block size was 64MB in early Hadoop releases; since Hadoop 2 the default is 128MB).
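To make the block arithmetic concrete, here is a minimal Python sketch (not Hadoop code, just the math) showing how many blocks a 10GB file occupies under the older 64MB and current 128MB defaults:

```python
import math

def hdfs_block_count(file_size_bytes: int, block_size_bytes: int = 128 * 1024 * 1024) -> int:
    """Return how many HDFS blocks a file of the given size occupies."""
    return math.ceil(file_size_bytes / block_size_bytes)

ten_gb = 10 * 1024**3
print(hdfs_block_count(ten_gb))                    # 80 blocks at the 128MB default
print(hdfs_block_count(ten_gb, 64 * 1024 * 1024))  # 160 blocks at the older 64MB default
```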
Or, is it dead altogether? In reality, Apache Hadoop is not dead, and many organizations are still using it as a robust data analytics solution. One key indicator is that all major cloud providers are actively supporting Apache Hadoop clusters in their respective platforms.
Like Hadoop, Spark splits large tasks across different nodes. However, it tends to perform faster than Hadoop because it uses random-access memory (RAM) to cache and process data instead of going back to the file system. This enables Spark to handle use cases that Hadoop cannot.
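To see the caching difference in practice, here is a minimal PySpark sketch (the HDFS path is a placeholder): the first action reads from the file system, and cache() keeps the data in executor RAM for every pass after that.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# Hypothetical HDFS path -- substitute your own dataset.
lines = spark.sparkContext.textFile("hdfs:///data/weather/2007")

# cache() keeps the RDD in executor RAM after the first action,
# so later passes avoid re-reading from HDFS entirely.
lines.cache()
print(lines.count())                               # first pass: reads from HDFS
print(lines.filter(lambda l: "TMP" in l).count())  # second pass: served from RAM

spark.stop()
```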
I would suggest downloading the Million Song Dataset from the following website:
http://labrosa.ee.columbia.edu/millionsong/
The best thing about the Million Song Dataset is that you can download a 1GB (about 10,000 songs), 10GB, 50GB, or roughly 300GB subset to your Hadoop cluster and run whatever tests you want. I love using it and have learned a lot from this dataset.
To start, you can download the subset for any one letter from A to Z, which will range from 1GB to 20GB. You can also use the Infochimps site (a rough download-and-load sketch follows the link):
http://www.infochimps.com/collections/million-songs
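As a sketch of the download-then-load workflow in Python (the subset URL below is a placeholder; take the real link from the dataset pages above):

```python
import subprocess
import urllib.request

# Hypothetical URL -- substitute the actual subset link from the
# Million Song Dataset page.
SUBSET_URL = "http://example.com/millionsong/subset_A.tar.gz"
LOCAL_FILE = "subset_A.tar.gz"

# Download the archive locally, then push it into HDFS for processing.
urllib.request.urlretrieve(SUBSET_URL, LOCAL_FILE)
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/data/millionsong"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", LOCAL_FILE, "/data/millionsong/"], check=True)
```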
In the following blog post I showed how to download the 1GB dataset and run Pig scripts against it:
http://blogs.msdn.com/b/avkashchauhan/archive/2012/04/12/processing-million-songs-dataset-with-pig-scripts-on-apache-hadoop-on-windows-azure.aspx
Tom White mentions a sample weather dataset in his book (Hadoop: The Definitive Guide).
http://hadoopbook.com/code.html
Data is available for more than 100 years.
I used wget on Linux to pull the data; for the year 2007 alone the data size is 27 GB. It is hosted as an FTP link, so you can download it with any FTP utility (a Python sketch follows the link below).
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/
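If you prefer scripting the download instead of wget, here is a minimal Python sketch using the standard library's ftplib, assuming the per-year directory layout under /pub/data/noaa/ (the 2007 directory alone is about 27 GB, so expect a long run):

```python
from ftplib import FTP
import os

YEAR = "2007"

ftp = FTP("ftp.ncdc.noaa.gov")
ftp.login()                      # anonymous login
ftp.cwd(f"/pub/data/noaa/{YEAR}")

# Fetch every station file for the year into a local directory.
os.makedirs(YEAR, exist_ok=True)
for name in ftp.nlst():
    with open(os.path.join(YEAR, name), "wb") as out:
        ftp.retrbinary(f"RETR {name}", out.write)

ftp.quit()
```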
For complete details please check my blog:
http://myjourneythroughhadoop.blogspot.in/2013/07/how-to-download-weather-data-for-your.html