I have a couple of numerical datasets for which I need to create a concept hierarchy. So far, I have been doing this manually by inspecting the data (and a corresponding line chart). Based on my intuition, I created some acceptable hierarchies.
This seems like a task that can be automated. Does anyone know if there is an algorithm to generate a concept hierarchy for numerical data?
To give an example, I have the following dataset:
Bangladesh 521
Brazil 8295
Burma 446
China 3259
Congo 2952
Egypt 2162
Ethiopia 333
France 46037
Germany 44729
India 1017
Indonesia 2239
Iran 4600
Italy 38996
Japan 38457
Mexico 10200
Nigeria 1401
Pakistan 1022
Philippines 1845
Russia 11807
South Africa 5685
Thailand 4116
Turkey 10479
UK 43734
US 47440
Vietnam 1042
for which I created the following hierarchy:
A concept hierarchy is a set of nodes organized in a tree, where each node represents a value of an attribute, known as a concept. A special node, "ANY", is reserved for the root of the tree. Each node is assigned a number that indicates its level in the hierarchy.
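To make the tree description concrete, here is a minimal sketch of such a structure. The class name `ConceptNode`, the labels "low" and "high", and the choice to number the root as level 0 are assumptions made for illustration; only the "ANY" root and the two country values come from the post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """One concept in the hierarchy; `level` is its depth below the root."""
    name: str
    level: int = 0  # assumption: the root is numbered 0
    children: list[ConceptNode] = field(default_factory=list)

    def add(self, name: str) -> ConceptNode:
        # A child always sits one level below its parent.
        child = ConceptNode(name, self.level + 1)
        self.children.append(child)
        return child

    def walk(self):
        # Depth-first traversal yielding (level, name) pairs.
        yield self.level, self.name
        for child in self.children:
            yield from child.walk()


# The special "ANY" node is the root of the tree.
root = ConceptNode("ANY")
low = root.add("low")      # hypothetical concept label
high = root.add("high")    # hypothetical concept label
low.add("Ethiopia 333")
high.add("US 47440")

levels = list(root.walk())
for level, name in levels:
    print("  " * level + name)
```

Any grouping method (binning, clustering, manual ranges) can then be used to decide which values go under which intermediate node.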
For nominal data, a concept hierarchy can be generated automatically from the number of distinct values per attribute in the given attribute set. A higher-level concept usually covers several lower-level ones, so the attribute with the most distinct values is placed at the lowest level of the hierarchy and the attribute with the fewest distinct values at the top.
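The distinct-value heuristic can be sketched in a few lines. The records below are made up purely for illustration, and `hierarchy_order` is a hypothetical helper name, not part of any library.

```python
def hierarchy_order(records):
    """Order attribute names from the top of the hierarchy (fewest
    distinct values) to the bottom (most distinct values)."""
    attributes = records[0].keys()
    distinct = {a: len({r[a] for r in records}) for a in attributes}
    return sorted(distinct, key=distinct.get)


# Made-up sample records, for illustration only.
records = [
    {"country": "France", "city": "Paris", "street": "Rue A"},
    {"country": "France", "city": "Paris", "street": "Rue B"},
    {"country": "France", "city": "Lyon", "street": "Rue C"},
    {"country": "Japan", "city": "Tokyo", "street": "Dori D"},
    {"country": "Japan", "city": "Osaka", "street": "Dori E"},
]

order = hierarchy_order(records)
print(" < ".join(reversed(order)))  # lowest level first
```

This is only a heuristic, so the resulting order is worth checking by hand before relying on it.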
Types of concept hierarchy
In binning, first sort the data and partition it into (equi-depth) bins. Then smooth the values in each bin by the bin mean, the bin median, or the bin boundaries.
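As a rough sketch of equi-depth binning applied to the numbers from the question: the function names and the choice of five bins are assumptions, and the code expects `n_bins` to be no larger than the number of values.

```python
from statistics import mean

# Values from the dataset in the question.
values = [521, 8295, 446, 3259, 2952, 2162, 333, 46037, 44729, 1017,
          2239, 4600, 38996, 38457, 10200, 1401, 1022, 1845, 11807,
          5685, 4116, 10479, 43734, 47440, 1042]


def equi_depth_bins(data, n_bins):
    """Sort the data and split it into n_bins bins of (nearly) equal size."""
    ordered = sorted(data)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def smooth_by_means(bins):
    # Replace every value in a bin with that bin's mean.
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    # Replace every value with the closer of the bin's min and max.
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
            for b in bins]


bins = equi_depth_bins(values, 5)  # five bins is an arbitrary choice
by_means = smooth_by_means(bins)
by_bounds = smooth_by_boundaries(bins)
for b in bins:
    print(b[0], "to", b[-1], "->", len(b), "values")
```

Note that equal-sized bins ignore gaps in the data: with five bins, 38457 ends up in the same bin as 8295, which is probably not a grouping you would pick by eye.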
We study four methods for the generation of concept hierarchies for nominal data, as follows.
1. Specification of a partial ordering of attributes explicitly at the schema level by users or experts: concept hierarchies for nominal attributes or dimensions typically involve a group of attributes.
Maybe you need a clustering algorithm?
Quoting from the link:
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields.
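For one-dimensional data like yours, a very simple form of clustering is to sort the values and cut at the widest gaps between neighbours. The sketch below does that; it is one possible approach rather than a standard library routine, and the function name `split_at_largest_gaps` and the choice of `k` are assumptions.

```python
# Dataset from the question.
data = {
    "Bangladesh": 521, "Brazil": 8295, "Burma": 446, "China": 3259,
    "Congo": 2952, "Egypt": 2162, "Ethiopia": 333, "France": 46037,
    "Germany": 44729, "India": 1017, "Indonesia": 2239, "Iran": 4600,
    "Italy": 38996, "Japan": 38457, "Mexico": 10200, "Nigeria": 1401,
    "Pakistan": 1022, "Philippines": 1845, "Russia": 11807,
    "South Africa": 5685, "Thailand": 4116, "Turkey": 10479,
    "UK": 43734, "US": 47440, "Vietnam": 1042,
}


def split_at_largest_gaps(items, k):
    """Split (name, value) pairs into k groups by cutting the k - 1
    widest gaps between neighbouring values in sorted order."""
    ordered = sorted(items, key=lambda kv: kv[1])
    # Index i stands for the gap between ordered[i - 1] and ordered[i].
    gap_indices = sorted(
        range(1, len(ordered)),
        key=lambda i: ordered[i][1] - ordered[i - 1][1],
        reverse=True,
    )
    cuts = sorted(gap_indices[:k - 1])
    groups, start = [], 0
    for cut in cuts + [len(ordered)]:
        groups.append(ordered[start:cut])
        start = cut
    return groups


groups = split_at_largest_gaps(data.items(), 2)
for group in groups:
    low, high = group[0][1], group[-1][1]
    print(f"{low}-{high}:", ", ".join(name for name, _ in group))
```

Calling the function again on each resulting group gives the next level down, which is how you would get a hierarchy rather than a flat partition. Because your values span two orders of magnitude, it may also be worth comparing gaps on a log scale instead of the raw differences.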