Free large datasets to experiment with Hadoop

A few points about your question regarding crawling and Wikipedia.

You have already linked to the Wikipedia data dumps; you can use the Cloud9 project from UMD to work with this data in Hadoop.

They have a page on this: Working with Wikipedia
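
If you just want to get a feel for the dump format before setting up Cloud9, here is a minimal Python sketch (not Cloud9's API, which is Java) that streams a MediaWiki-style XML dump and turns each page into one tab-separated line, which is a convenient input shape for Hadoop. The function names `iter_pages` and `to_record` are made up for illustration, and the sketch assumes each `<page>` carries a `<title>` and a `<text>` element.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> in a MediaWiki-style XML stream.

    Assumption: every page has a <title> and a <text> element.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's subtree to save memory


def to_record(title, text):
    """Format one page as 'title<TAB>text length' (hypothetical record layout)."""
    clean_title = title.replace("\t", " ").replace("\n", " ")
    return f"{clean_title}\t{len(text)}"


if __name__ == "__main__":
    # Tiny inline sample standing in for a real dump file.
    sample = (
        b"<mediawiki><page><title>Hadoop</title>"
        b"<revision><text>Some wiki text</text></revision></page></mediawiki>"
    )
    for page_title, page_text in iter_pages(io.BytesIO(sample)):
        print(to_record(page_title, page_text))
```

For a real dump you would pass an opened (and decompressed) file instead of the inline sample; the per-page `clear()` call is what keeps memory use manageable on large files.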

Another datasource to add to the list is:

ClueWeb09: 1 billion web pages collected between January and February 2009. 5 TB compressed.

I would say that using a crawler to generate data belongs in a separate question from one about Hadoop/MapReduce.

One obvious source: the Stack Overflow trilogy data dumps. These are freely available under a Creative Commons license.

This is a collection of 189 datasets for machine learning (which is one of the nicest applications for Hadoop): http://archive.ics.uci.edu/ml/datasets.html

It's not a log file, but maybe you could use the planet file from OpenStreetMap: http://wiki.openstreetmap.org/wiki/Planet.osm

CC licence, about 160 GB (unpacked)

There are also smaller files for each continent: http://wiki.openstreetmap.org/wiki/World
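
As a rough idea of a first job to try on one of the extracts, here is a Python sketch of a map and a reduce step that count how often each tag key (for example `amenity`) appears. It assumes the XML flavour of the file with each `<tag k="..." v="..."/>` element on its own line; the function names `map_lines` and `reduce_pairs` are hypothetical, and wiring them to stdin/stdout for Hadoop Streaming is left out.

```python
import re
from itertools import groupby

# Matches the key attribute of an OSM <tag k="..." v="..."/> element.
TAG_KEY = re.compile(r'<tag\s+k="([^"]*)"')


def map_lines(lines):
    """Map step: emit (tag_key, 1) for every tag element found in the input lines."""
    for line in lines:
        for key in TAG_KEY.findall(line):
            yield key, 1


def reduce_pairs(pairs):
    """Reduce step: sum the counts per key.

    The pairs must arrive sorted by key, which is what the shuffle phase
    provides in a real MapReduce job.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _key, count in group)


if __name__ == "__main__":
    # Tiny inline sample standing in for a real extract.
    sample = [
        '<node id="1" lat="0.0" lon="0.0">',
        '  <tag k="amenity" v="cafe"/>',
        '  <tag k="name" v="Example"/>',
        '</node>',
    ]
    # sorted() stands in for the shuffle between map and reduce.
    for tag_key, total in reduce_pairs(sorted(map_lines(sample))):
        print(f"{tag_key}\t{total}")
```

Running it locally on a small continent extract first is a cheap way to check the logic before throwing the full planet file at a cluster.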

Tags:

resources

hadoop

opendata
