
Algorithm to generate numerical concept hierarchy

I have a couple of numerical datasets for which I need to create a concept hierarchy. So far I have been doing this manually, by looking at the data (and a corresponding line chart) and creating hierarchies that seemed acceptable based on my intuition.

This seems like a task that can be automated. Does anyone know if there is an algorithm to generate a concept hierarchy for numerical data?


To give an example, I have the following dataset:

Bangladesh     521
Brazil         8295
Burma          446
China          3259
Congo          2952
Egypt          2162
Ethiopia       333
France         46037
Germany        44729
India          1017
Indonesia      2239
Iran           4600
Italy          38996
Japan          38457
Mexico         10200
Nigeria        1401
Pakistan       1022
Philippines    1845
Russia         11807
South Africa   5685
Thailand       4116
Turkey         10479
UK             43734
US             47440
Vietnam        1042


for which I created the following hierarchy:

  • LOWEST ( < 1000)
  • LOW (1000 - 2500)
  • MEDIUM (2501 - 7500)
  • HIGH (7501 - 30000)
  • HIGHEST ( > 30000)
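
For reference, the manual hierarchy above can be written as a small lookup. This is only a sketch of the hand-made levels, assuming integer values (so the gaps between 2500 and 2501, or 7500 and 7501, never matter); the names `UPPER_BOUNDS`, `LABELS` and `level` are made up for illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each level except the last, taken from the list above.
# Assumption: values are integers, so "< 1000" is the same as "<= 999".
UPPER_BOUNDS = [999, 2500, 7500, 30000]
LABELS = ["LOWEST", "LOW", "MEDIUM", "HIGH", "HIGHEST"]

def level(value):
    """Return the hand-made level name for an integer value."""
    # bisect_left finds the first bound that is >= value; anything above
    # the last bound lands at index 4, i.e. HIGHEST.
    return LABELS[bisect_left(UPPER_BOUNDS, value)]

for country, value in [("Bangladesh", 521), ("India", 1017), ("US", 47440)]:
    print(country, value, level(value))
```

What I am looking for is a way to derive boundaries like `UPPER_BOUNDS` from the data instead of picking them by eye.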
Christophe Herreman asked Mar 25 '10 16:03




1 Answer

Maybe you need a clustering algorithm?

Quoting from the link:

Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields.
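
As a rough illustration of what clustering could look like for one-dimensional data such as yours, here is a minimal sketch. It is one simple choice among many, not a definitive method: it sorts the values and cuts at the `k - 1` widest gaps, which in one dimension gives the same groups as single-linkage hierarchical clustering stopped at `k` clusters. The function name `split_by_largest_gaps`, the choice of `k = 5`, and the use of a log scale (so that gaps are compared as ratios rather than absolute differences) are all assumptions made for this example.

```python
import math

def split_by_largest_gaps(values, k, log_scale=True):
    """Split 1-D values into k groups by cutting at the k-1 widest gaps."""
    xs = sorted(values)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of values")
    # On a log scale a gap measures the ratio between neighbours
    # (requires strictly positive values).
    key = [math.log(x) for x in xs] if log_scale else xs
    # Gap i sits between xs[i] and xs[i + 1].
    gaps = [(key[i + 1] - key[i], i) for i in range(len(xs) - 1)]
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[:k - 1])
    groups, start = [], 0
    for i in cut_after:
        groups.append(xs[start:i + 1])
        start = i + 1
    groups.append(xs[start:])
    return groups

# The dataset from the question.
data = {
    "Bangladesh": 521, "Brazil": 8295, "Burma": 446, "China": 3259,
    "Congo": 2952, "Egypt": 2162, "Ethiopia": 333, "France": 46037,
    "Germany": 44729, "India": 1017, "Indonesia": 2239, "Iran": 4600,
    "Italy": 38996, "Japan": 38457, "Mexico": 10200, "Nigeria": 1401,
    "Pakistan": 1022, "Philippines": 1845, "Russia": 11807,
    "South Africa": 5685, "Thailand": 4116, "Turkey": 10479,
    "UK": 43734, "US": 47440, "Vietnam": 1042,
}

labels = ["LOWEST", "LOW", "MEDIUM", "HIGH", "HIGHEST"]
groups = split_by_largest_gaps(data.values(), k=len(labels))
for label, group in zip(labels, groups):
    print(f"{label}: {group[0]} - {group[-1]} ({len(group)} values)")
```

The boundaries this produces will not match a hand-made hierarchy exactly, and the number of levels still has to be chosen up front. Other clustering methods (k-means, for instance) would place the cuts differently.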

Eli Bendersky answered Oct 19 '22 08:10