What are the major differences and benefits of Porter and Lancaster Stemming algorithms? [closed]

1 Answer

At its most basic, the major difference between the Porter and Lancaster stemming algorithms is that the Lancaster stemmer is significantly more aggressive than the Porter stemmer. The three major stemming algorithms in use today are Porter, Snowball (Porter2), and Lancaster (Paice-Husk), and their aggressiveness increases roughly in that order, with Porter the least aggressive and Lancaster the most. The specifics of each algorithm are fairly lengthy and technical, but here is a breakdown:

Porter: Without a doubt the most commonly used stemmer, and also one of the gentlest. It is one of the few stemmers that actually has Java support, which is a plus, though it is also the most computationally intensive of the three (granted, not by a very significant margin). It is also the oldest of the three, dating from 1980.

Porter2: Nearly universally regarded as an improvement over Porter, and for good reason. Porter himself in fact admits that it is better than his original algorithm. It is slightly faster than Porter and has a fairly large community around it.

Lancaster: A very aggressive stemming algorithm, sometimes to a fault. With Porter and Snowball, the stemmed representations are usually fairly intuitive to a reader; not so with Lancaster, where many shorter words become totally obfuscated. It is the fastest algorithm here and will hugely reduce your working set of words, but if you want more distinction between words, it is not the tool you want.

Honestly, I feel that Snowball is usually the way to go. There are certain circumstances in which Lancaster will hugely trim down your working set, which can be very useful; however, the marginal speed increase over Snowball is, in my opinion, not worth the lack of precision. Porter has the most implementations, though, and so is usually the default go-to algorithm, but if you can, use Snowball.
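If you want to see the difference in aggressiveness for yourself, here is a minimal sketch using NLTK (a third-party Python package, assumed to be installed; the stemmers themselves need no corpus downloads). The word list and the helper names `compare_stems` and `distinct_stem_counts` are just illustrative choices, not part of any library. It prints each word's stem under the three stemmers and then counts how many distinct stems each one leaves you with, which is a rough proxy for how much it shrinks your working set.

```python
# Assumes NLTK is installed (pip install nltk).
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Arbitrary sample vocabulary; swap in your own words.
WORDS = ["running", "generously", "maximum", "organization",
         "organ", "universal", "university"]

# One instance of each stemmer discussed above.
STEMMERS = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),  # Porter2
    "lancaster": LancasterStemmer(),         # Paice-Husk
}


def compare_stems(words):
    """Return {word: {stemmer_name: stem}} for every word."""
    return {
        word: {name: stemmer.stem(word) for name, stemmer in STEMMERS.items()}
        for word in words
    }


def distinct_stem_counts(words):
    """Count the distinct stems each stemmer produces for the given words."""
    return {
        name: len({stemmer.stem(word) for word in words})
        for name, stemmer in STEMMERS.items()
    }


if __name__ == "__main__":
    for word, stems in compare_stems(WORDS).items():
        print(f"{word:15}", stems)
    print("distinct stems:", distinct_stem_counts(WORDS))
```

Run it on a sample of your real vocabulary rather than a toy list before picking one; how much Lancaster over-merges depends heavily on the words involved.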

Snowball - Additional info

Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval.

The Snowball compiler translates a Snowball script into another language - currently ISO C, C#, Go, Java, Javascript, Object Pascal, Python and Rust are supported.

History of the name

Since it effectively provides a ‘suffix STRIPPER GRAMmar’, I had toyed with the idea of calling it ‘strippergram’, but good sense has prevailed, and so it is ‘Snowball’ named as a tribute to SNOBOL, the excellent string handling language of Messrs Farber, Griswold, Poage and Polonsky from the 1960s.
---Martin Porter

Stemmers implemented in the Snowball language are sometimes simply referred to as Snowball stemmers. For example, see the Natural Language Toolkit: nltk.stem.snowball.
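As a rough illustration of that NLTK module (again assuming NLTK is installed), the sketch below lists the languages NLTK's Snowball port supports and stems a few words. `stem_all` is a hypothetical helper written for this example, not an NLTK function.

```python
# Assumes NLTK is installed (pip install nltk).
from nltk.stem.snowball import SnowballStemmer


def stem_all(words, language="english"):
    """Stem each word with the Snowball stemmer for the given language."""
    stemmer = SnowballStemmer(language)
    return [stemmer.stem(word) for word in words]


if __name__ == "__main__":
    # Class attribute listing the language names SnowballStemmer accepts.
    print(SnowballStemmer.languages)
    print(stem_all(["running", "generously"]))
    print(stem_all(["Autobahnen"], language="german"))
```

Note that "english" here is the Porter2 stemmer; NLTK also accepts "porter" as a language name if you want the original algorithm through the same interface.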


answered Sep 19 '22 12:09

Slater Victoroff


Tags:

java

machine-learning

nlp

Adam Hess
