Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

News clustering

How does Google News and Techmeme cluster news items that are similar? Are there any well know algorithm that is used to achieve this?

Appreciate your help.

Thanks in advance.

like image 680
niraj Avatar asked Apr 24 '09 05:04

niraj


People also ask

What is an example of clustering?

Example 1: Retail Marketing Retail companies often use clustering to identify groups of households that are similar to each other. For example, a retail company may collect the following information on households: Household income. Household size.

What are the two main types of clustering methods?

There are two different types of clustering, which are hierarchical and non-hierarchical methods. Non-hierarchical Clustering In this method, the dataset containing N objects is divided into M clusters. In business intelligence, the most widely used non-hierarchical clustering technique is K-means.

What are the three steps of cluster analysis?

The hierarchical cluster analysis follows three basic steps: 1) calculate the distances, 2) link the clusters, and 3) choose a solution by selecting the right number of clusters.


1 Answers

One fairly common way to cluster text based on content is to use Principle Component Analysis on the word vectors (a vector of n dimensions where each possible word represents one dimension and the magnitude in each direction, for each vector, is the number occurrences of the word in that particular article), followed by just a simple clustering such as K-Means.

like image 72
maxaposteriori Avatar answered Nov 05 '22 10:11

maxaposteriori