
Separation and pattern matching techniques

I am new to Artificial Neural Networks.

I am interested in an application like this:

table

I have a large set of objects. Each object has six properties, denoted by P1–P6. Each property takes a symbolic value: in my example, P1–P6 can each have a value from the set {A, B, C, D, E, F}. The values are not numeric. (Think of A, B, C, D, E, F as colours and you will get the idea.)

Now, there is another property R that I am interested in. Suppose

R = {G1, G2, G3, G4, G5}

I need to train a system on a large set of P1–P6 values and the corresponding R. Then I want to do the following:

  1. I have an object and I know the values of P1 to P6. I need to find its R (the group that the object belongs to).

  2. To get a desired R, what pattern do I need to have in P1–P6? For example, given that R = G2, I need to figure out any pattern in P1–P6.

My questions are:

  1. What are the theories/technologies/techniques I should read and learn in order to implement 1 and 2, respectively?

  2. What are the tools/libraries you can recommend to get this simulated/implemented/tested?

Chathuranga Chandrasekara asked Sep 05 '11 09:09


3 Answers

From the way you describe your problem, you need to look into machine learning techniques. I would start by reading about k-NN (k Nearest Neighbours) for the classification, by which I mean getting R when you know P1–P6. It is a really simple technique and should be helpful here.
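To make the k-NN idea concrete, here is a minimal sketch in plain Python. It assumes that the distance between two objects is the number of properties on which they differ (Hamming distance), which is one reasonable choice for symbolic values but not the only one. The function names and the training data are hypothetical, invented purely for illustration.

```python
from collections import Counter

def hamming(a, b):
    """Number of properties on which two objects differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Predict the group R of `query` by majority vote among its k nearest neighbours.

    train: list of (properties, group) pairs, properties being a tuple of six symbols.
    """
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(group for _, group in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data (assumption: not taken from the question).
TRAIN = [
    (("A", "B", "C", "D", "E", "F"), "G1"),
    (("A", "B", "C", "D", "E", "A"), "G1"),
    (("A", "B", "C", "A", "E", "F"), "G1"),
    (("F", "E", "D", "C", "B", "A"), "G2"),
    (("F", "E", "D", "C", "B", "B"), "G2"),
    (("F", "E", "D", "C", "A", "A"), "G2"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, ("A", "B", "C", "D", "A", "F")))
```

With real data you would want to try several values of k and check the accuracy on objects held out from training.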

As for the other way around, what you basically need is a representative sample of each class in your population. This is less usual, I think, but you could try something like k-means clustering. Clustering methods usually determine the class of an object (property R) by themselves, but k-means clustering fits this situation because you have to give it the number of object classes (i.e. the number of different possible values of R), and in the end you get one representative sample per class.
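The "one representative per class" idea can be sketched without a clustering library. Note that the code below is plainly not k-means: k-means averages numeric coordinates, which symbolic values do not have, so this sketch swaps in a simpler stand-in that uses the known R labels and takes the most frequent value of each property within each group. The function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def representative_patterns(samples):
    """Return, for each group, the most common value of each property.

    samples: list of (properties, group) pairs.
    This is a per-group mode, used here as a stand-in for a cluster centre.
    """
    by_group = defaultdict(list)
    for props, group in samples:
        by_group[group].append(props)

    patterns = {}
    for group, rows in by_group.items():
        # zip(*rows) yields one column per property (P1, P2, ...).
        patterns[group] = tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*rows)
        )
    return patterns

# Hypothetical labelled data (assumption: not taken from the question).
SAMPLES = [
    (("A", "B", "C", "D", "E", "F"), "G1"),
    (("A", "B", "C", "D", "E", "A"), "G1"),
    (("A", "B", "C", "A", "E", "F"), "G1"),
    (("F", "E", "D", "C", "B", "A"), "G2"),
    (("F", "E", "D", "C", "B", "B"), "G2"),
    (("F", "E", "D", "C", "A", "A"), "G2"),
]

if __name__ == "__main__":
    for group, pattern in representative_patterns(SAMPLES).items():
        print(group, pattern)
```

A single pattern per group hides any variety inside the group, so it is worth checking how many real objects of that group actually match it.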

In my opinion, you definitely shouldn't go for any really complex techniques (like neural networks), since your data doesn't have a precise numerical interpretation and the values can't be placed on a gradual scale.

The recommended tools really depend on your base programming language. There's a great tool called Orange, which is Python-based, and it's my tool of choice for this kind of thing (especially since it is really easy to connect your Python modules with C/C++). If you prefer Java, there's a quite similar tool called Weka that you could use. I think Weka is a little better documented, but I don't like Java, so I've never tried it out.

Both of these tools have a graphical point-and-click interface where you can just load your data, run the classification, play with the parameters, and check what kind of output you get using different techniques and different set-ups. Once you have the results you need (or if you just don't like graphical interfaces), you can also use both of them as libraries when programming (Python for Orange, Java for Weka) and make the classification part of a bigger project.

If you look through the documentation of Orange or Weka, I think it will give you a few ideas about what you could actually do with the data you have. Once you know a few techniques that seem interesting and applicable to your data, you will probably get better comments and information by asking here about those specific methods than by asking for general advice.

penelope answered Oct 23 '22 17:10


You should check out classification algorithms (a subfield of artificial intelligence), especially the nearest-neighbor algorithms. Your problem may be solved by different techniques, each of which has its own advantages and disadvantages.

However, I do not know of any method in artificial intelligence that allows a two-way classification (in other words, one that implements your requirements 1 and 2 simultaneously). As all you want so far is a bidirectional mapping P1..P6 <=> R, I would suggest just using a mapping table instead of an artificial intelligence algorithm. An AI would work well if you did not know exactly which of the values A..F each of your samples takes in P1..P6.
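A mapping table of this kind can be sketched with two dictionaries, one for each direction. The function name and the data below are hypothetical. One caveat worth stating: with six properties of six values each there are 6^6 = 46,656 possible combinations, and a plain lookup can only answer for combinations that appear in the data.

```python
from collections import defaultdict

def build_tables(samples):
    """Build a forward table (pattern -> R) and an inverse table (R -> set of patterns).

    samples: list of (properties, group) pairs, properties being a tuple of six symbols.
    If the same pattern appears with two different groups, the later entry wins
    in the forward table; real data may need a check for such conflicts.
    """
    forward = {}
    inverse = defaultdict(set)
    for props, group in samples:
        forward[props] = group
        inverse[group].add(props)
    return forward, dict(inverse)

# Hypothetical labelled data (assumption: not taken from the question).
SAMPLES = [
    (("A", "B", "C", "D", "E", "F"), "G1"),
    (("A", "B", "C", "D", "E", "A"), "G1"),
    (("F", "E", "D", "C", "B", "A"), "G2"),
]

if __name__ == "__main__":
    forward, inverse = build_tables(SAMPLES)
    print(forward.get(("A", "B", "C", "D", "E", "F")))   # task 1: pattern -> R
    print(sorted(inverse["G2"]))                         # task 2: R -> known patterns
    print(forward.get(("B", "B", "B", "B", "B", "B")))   # unseen pattern: no answer
```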

If you insist on using an AI for it, I'd suggest first looking at a perceptron. A multilayer perceptron consists of input, intermediate and output neurons. For your example, you'd have the input neurons P1a..P1f, P2a..P2f, ... (one per property and possible value) and five output neurons R1..R5 (one per group). After training, you should be able to input P1..P6 and get the appropriate one of R1..R5 as output.
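The input encoding described above (one input neuron per property and possible value) can be sketched as follows. This is a simplification under stated assumptions: it uses a single-layer multi-class perceptron with no intermediate neurons, it assumes the six values A..F from the question, and all names and training data are hypothetical. A single-layer perceptron can only learn groups that are linearly separable in this encoding.

```python
VALUES = "ABCDEF"                           # possible symbols per property
GROUPS = ["G1", "G2", "G3", "G4", "G5"]     # one output neuron per group

def one_hot(props):
    """Encode six symbols as 36 inputs (P1a..P1f, P2a..P2f, ...) plus a bias input."""
    vec = []
    for value in props:
        vec.extend(1.0 if value == v else 0.0 for v in VALUES)
    return vec + [1.0]  # constant bias input

def score(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def predict(model, props):
    """Return the group whose output neuron has the highest activation."""
    x = one_hot(props)
    return max(GROUPS, key=lambda g: score(model[g], x))

def train(samples, epochs=20):
    """Multi-class perceptron rule: on a mistake, reinforce the correct
    neuron and weaken the wrongly chosen one."""
    n_inputs = len(VALUES) * 6 + 1
    model = {g: [0.0] * n_inputs for g in GROUPS}
    for _ in range(epochs):
        for props, group in samples:
            guess = predict(model, props)
            if guess != group:
                x = one_hot(props)
                for i, xi in enumerate(x):
                    model[group][i] += xi
                    model[guess][i] -= xi
    return model

# Hypothetical training data (assumption: not taken from the question).
SAMPLES = [
    (("A", "A", "A", "A", "A", "A"), "G1"),
    (("B", "B", "B", "B", "B", "B"), "G2"),
    (("C", "C", "C", "C", "C", "C"), "G3"),
]

if __name__ == "__main__":
    model = train(SAMPLES)
    print(predict(model, ("A", "A", "A", "A", "A", "B")))
```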

As for frameworks and technologies, I only know of the Business Intelligence suite for Visual Studio, although there are a lot of other frameworks for AI out there. Since I have not used any of them (I have always coded the algorithms myself in C/C++), I can't recommend any.

Lars answered Oct 23 '22 17:10


It seems like a typical classification problem. If you really have a lot of data, have a look at Apache Mahout, which provides distributed implementations of machine learning algorithms. If you need something less complex for prototyping, TiMBL is a nice alternative.

Aspasia answered Oct 23 '22 17:10