
News Article Data Sets [closed]

I am working on a news classification project. The system will classify news articles into pre-defined topics (e.g. sports, politics, international). To build it, I need free data sets for training.

So far, after a few hours of googling and following links from here, the only suitable data set I could find is this. While it will hopefully be enough, I would like to find more.

Note that the data sets I want should:

  1. Contain full news articles, not just titles
  2. Be in English
  3. Be in .txt format, not XML or a database

Can anybody help me?

Hearty asked Nov 18 '11 14:11




1 Answer

Have you tried Reuters-21578? It is one of the most commonly used data sets for text classification. It is formatted in SGML, but it is quite simple to parse and convert to .txt format.
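
As a rough illustration of that conversion, here is a minimal Python sketch. It assumes the layout I recall from the standard distribution: each article is wrapped in a `<REUTERS ... NEWID="...">` element containing `<TOPICS><D>...</D></TOPICS>`, `<TITLE>` and `<BODY>` tags. The function names (`parse_sgml`, `write_txt`, `convert_file`) and the one-file-per-article output layout are my own choices for the example, not part of the data set, and the regex approach is a shortcut rather than a real SGML parser.

```python
import html
import re
from pathlib import Path

# Tag names below are assumptions about the Reuters-21578 .sgm layout.
REUTERS_RE = re.compile(r"<REUTERS\b([^>]*)>(.*?)</REUTERS>", re.DOTALL)
NEWID_RE = re.compile(r'NEWID="(\d+)"')
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
TOPIC_ITEM_RE = re.compile(r"<D>(.*?)</D>", re.DOTALL)
TITLE_RE = re.compile(r"<TITLE>(.*?)</TITLE>", re.DOTALL)
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)


def parse_sgml(text):
    """Return a list of dicts (id, topics, title, body) for articles that have a body."""
    articles = []
    for attrs, inner in REUTERS_RE.findall(text):
        newid = NEWID_RE.search(attrs)
        body = BODY_RE.search(inner)
        if newid is None or body is None:
            continue  # skip entries without an id or without full text
        title = TITLE_RE.search(inner)
        topics_block = TOPICS_RE.search(inner)
        topics = TOPIC_ITEM_RE.findall(topics_block.group(1)) if topics_block else []
        articles.append({
            "id": newid.group(1),
            "topics": topics,
            # html.unescape turns entities such as &lt; back into characters
            "title": html.unescape(title.group(1)).strip() if title else "",
            "body": html.unescape(body.group(1)).strip(),
        })
    return articles


def write_txt(articles, out_dir):
    """Write each article to <out_dir>/<id>.txt as title, blank line, body."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for article in articles:
        path = out / f"{article['id']}.txt"
        path.write_text(article["title"] + "\n\n" + article["body"], encoding="utf-8")


def convert_file(sgm_path, out_dir):
    """Convert one .sgm file; latin-1 is assumed so that every byte decodes."""
    text = Path(sgm_path).read_text(encoding="latin-1")
    write_txt(parse_sgml(text), out_dir)
```

Calling `convert_file` on each `.sgm` file would then produce one plain-text file per article. The `topics` list on each parsed article can serve as the training labels; note that an article may carry several topics or none.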

miguelmalvarez answered Oct 21 '22 01:10
