Text classification/categorization algorithm [closed]

My objective is to [semi-]automatically assign texts to different categories. There's a set of user-defined categories and a set of texts for each category. The ideal algorithm should be able to learn from a human-defined classification and then classify new texts automatically. Can anybody suggest such an algorithm, and perhaps a .NET library that implements it?

288

asked Aug 27 '10 13:08

Max

1 Answer

Doing this is not trivial. The obvious starting point is a dictionary that maps certain keywords to categories, so that finding a keyword in a text suggests a certain category.
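To make the dictionary idea concrete, here is a minimal sketch in Python (the question asks about .NET, but the idea carries over). The keyword table and the function name `classify_by_keywords` are made up for illustration; a real table would be built from your own categories.

```python
from collections import Counter

# Hypothetical keyword-to-category dictionary; entries are illustrative only.
KEYWORDS = {
    "guitar": "music",
    "concert": "music",
    "compiler": "programming",
    "debugger": "programming",
}

def classify_by_keywords(text, keywords=KEYWORDS):
    """Return the category whose keywords occur most often, or None if none occur."""
    # Split on whitespace, strip surrounding punctuation, and lowercase each word.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    hits = Counter(keywords[w] for w in words if w in keywords)
    return hits.most_common(1)[0][0] if hits else None

print(classify_by_keywords("The concert featured a loud guitar."))
print(classify_by_keywords("Two guitars on stage"))
```

The second call finds nothing, because "guitars" is not literally in the table. That is exactly the inflection problem described next.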

Yet in natural language text, the keywords would usually not appear in their stem form. You would need some morphology tools to find the stem form and look that up in the dictionary.
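As a rough illustration of what such a morphology step does, the sketch below strips a few English suffixes before the dictionary lookup. This is a deliberately crude stand-in, not a real stemmer: the suffix list and the name `crude_stem` are assumptions for the example, and established algorithms such as the Porter stemmer handle far more cases.

```python
# Suffixes are tried in this order; the list is illustrative, not exhaustive.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("guitars", "Concerts", "playing", "bus"):
    print(w, "->", crude_stem(w))
```

Irregular forms ("mice", "went") and languages with richer inflection are out of reach for a suffix list like this, which is why a proper morphology tool is needed in practice.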

But then somebody could write something like: "This article is not about ...". This would introduce the need for syntactic and semantic analysis.

And then you would find that certain keywords can be used in several categories: "band" could be used in music, technology, or even handicrafts. You would therefore need an ontology and statistical or other methods to weigh the probability of each candidate category when a keyword alone does not settle it.
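One common statistical method for this kind of weighing is a naive Bayes classifier, which also matches the question's requirement of learning from human-labelled texts. The sketch below is my own choice of technique, not something prescribed above; the class name, the training sentences, and the category labels are all invented for the example. It uses word counts with add-one smoothing.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocab = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def scores(self, text):
        """log P(category) + sum of log P(word | category), add-one smoothed."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        result = {}
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            result[cat] = score
        return result

    def classify(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get) if scores else None

# Invented training data: "band" occurs in every category.
model = NaiveBayes()
model.train("The band played a loud concert", "music")
model.train("The rock band released an album", "music")
model.train("Replace the rubber band on the conveyor belt", "technology")
model.train("Sew an elastic band into the waistband", "handicraft")

print(model.classify("the band released a concert album"))
print(model.classify("a rubber band for the conveyor"))
```

Here "band" alone decides nothing, but the surrounding words shift the probabilities toward one category. With only four training texts the result is fragile; a usable model needs many labelled texts per category.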

Some of the keywords might not even be easy to fit into an ontology: is a mathematician closer to a programmer or to a gardener? But you said in your question that the categories are defined by humans, so they could also help build the ontology.

Have a look at computational linguistics, here and on Wikipedia, for further study.

Now, the narrower the field your texts come from, the more structured they are, and the smaller the vocabulary, the easier the problem becomes.

Again, some keywords for further study: morphology, syntax analysis, semantics, ontology, computational linguistics, indexing, keywording.

145

answered Oct 09 '22 06:10

Ralph M. Rickenbach

Tags: algorithm, text-mining, document-classification