
Difference between classification and clustering in data mining? [closed]

People also ask

What is the difference between clustering and classification in data mining?

Classification and clustering are techniques used in data mining to analyze collected data. Classification is used to label data, while clustering is used to group similar data instances together.

What is the difference between classification and clustering with examples?

Examples of classification algorithms include logistic regression, the naive Bayes classifier, and support vector machines. Examples of clustering algorithms include k-means, fuzzy c-means, and Gaussian mixture models fitted with expectation-maximization (EM).

What are the main differences between clustering and classification in terms of objectives and outcomes?

Generally, clustering consists of a single phase (grouping), while classification has two stages: training (the model learns from a training data set) and testing (the target class is predicted).
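The two stages of classification can be sketched in a few lines. This is only an illustration, assuming a nearest-centroid classifier (one of many possible choices) and made-up 2-D points; the names `train`, `predict`, and the data are hypothetical and do not come from the answers on this page.

```python
from collections import defaultdict
from math import dist

def train(samples, labels):
    """Training stage: compute one centroid per class from labeled data."""
    grouped = defaultdict(list)
    for point, label in zip(samples, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }

def predict(centroids, point):
    """Testing stage: return the class whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Hypothetical labeled training data: (x, y) points with known classes.
train_x = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
train_y = ["low", "low", "high", "high"]

# Hypothetical held-out test data, used only to check the predictions.
test_x = [(2.0, 1.0), (8.5, 9.0)]
test_y = ["low", "high"]

model = train(train_x, train_y)
predictions = [predict(model, p) for p in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

A clustering routine, by contrast, would receive only the points and return groups in a single pass, with no separate labeled training step.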


In general, in classification you have a set of predefined classes and want to know which class a new object belongs to.

Clustering tries to group a set of objects and find whether there is some relationship between the objects.

In the context of machine learning, classification is supervised learning and clustering is unsupervised learning.

Also have a look at Classification and Clustering at Wikipedia.




If you ask anyone who works in data mining or machine learning this question, they will use the terms supervised learning and unsupervised learning to explain the difference between clustering and classification. So let me first explain the key words supervised and unsupervised.

Supervised learning: Suppose you have a basket filled with fresh fruit, and your task is to arrange fruits of the same type together. Suppose the fruits are apples, bananas, cherries, and grapes. You already know from previous experience what each fruit looks like, so it is easy to arrange the same type of fruit in one place. Here, your previous experience is what data mining calls training data. You have already learned from the training data because it includes a response variable, which tells you that a fruit with such-and-such features is a grape, and likewise for every other fruit.

You get this kind of information from the training data. This type of learning is called supervised learning, and this type of problem falls under classification. Because you have already learned what each fruit looks like, you can do the job confidently.
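The fruit basket can be written as a tiny supervised example. This is a sketch only: the features (color and size), the labeled examples, and the functions `learn` and `classify` are assumptions made for illustration. The fruit name plays the role of the response variable.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (features, known fruit name).
labeled_fruits = [
    (("red", "big"), "apple"),
    (("red", "small"), "cherry"),
    (("green", "big"), "banana"),
    (("green", "small"), "grape"),
    (("red", "big"), "apple"),
]

def learn(examples):
    """Learn, for each feature combination, the most common label seen."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {
        features: counts.most_common(1)[0][0]
        for features, counts in votes.items()
    }

def classify(model, features):
    """Label a new fruit; feature combinations never seen map to 'unknown'."""
    return model.get(features, "unknown")

model = learn(labeled_fruits)
print(classify(model, ("red", "small")))
```

The classifier can only answer with fruit names it saw during training, which is exactly what "predefined classes" means.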

Unsupervised learning: Suppose you have a basket filled with fresh fruit, and your task is again to arrange fruits of the same type together.

This time you don't know anything about the fruits. You are seeing them for the first time, so how will you arrange fruits of the same type together?

First, you pick up a fruit and choose some physical characteristic of it. Suppose you choose color.

Then you arrange the fruits by color, and the groups will look something like this:

Red color group: apples and cherries.
Green color group: bananas and grapes.

Now you take another physical characteristic, size, and the groups become:

Red color and big size: apples.
Red color and small size: cherries.
Green color and big size: bananas.
Green color and small size: grapes.

Job done, happy ending.

Here you had not learned anything beforehand, meaning there is no training data and no response variable. This type of learning is known as unsupervised learning, and clustering falls under unsupervised learning.
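The same idea in code, as a rough sketch: the fruits carry only features and no names, and groups emerge from the features alone. The data and the helper `group_by` are hypothetical. Real clustering algorithms usually rely on a distance measure rather than exact feature matches, so this is a simplification that mirrors the analogy.

```python
from collections import defaultdict

# Hypothetical unlabeled observations: features only, no fruit names.
fruits = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "green", "size": "big"},
    {"color": "green", "size": "small"},
    {"color": "red", "size": "small"},
]

def group_by(items, keys):
    """Put items that share the same values for the given keys in one group."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item[k] for k in keys)].append(item)
    return dict(groups)

by_color = group_by(fruits, ["color"])                  # first pass: color only
by_color_and_size = group_by(fruits, ["color", "size"])  # second pass: color and size
print(len(by_color), len(by_color_and_size))
```

The groups have no names attached. Deciding that a "red and small" group means cherries is an interpretation made afterwards by a person.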


+Classification: you are given some new data, and you have to assign a label to each item.

For example, a company wants to classify its prospective customers. When a new customer arrives, the company has to determine whether this customer is going to buy its products or not.

+Clustering: you are given a set of historical transactions that record who bought what.

By using clustering techniques, you can identify the segments among your customers.
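As a rough illustration of customer segmentation, here is a minimal 1-D k-means (Lloyd's algorithm) on total spend per customer. The spend figures, the choice of two segments, and the function `kmeans_1d` are assumptions for the example. In practice a library implementation and more features than spend alone would be typical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical total spend per customer, derived from past transactions.
spend = [12, 15, 18, 20, 210, 230, 250, 260]

# Start from the smallest and largest values as initial centers (k = 2).
centers, segments = kmeans_1d(spend, [min(spend), max(spend)])
print(centers)
print(segments)
```

No customer was labeled "low spender" or "high spender" in advance. The two segments come out of the data, and naming them is left to the analyst.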