How to find singular in the plural when some letters change? What is the best approach?

Tags: javascript, diacritics, nlp

How can I find the singular in the plural when some letters change?

Following situation:

The German word Schließfach is a lockbox.
The plural is Schließfächer.

As you can see, the letter a has changed to ä. Because of this, the first word is no longer a substring of the second one; as far as a regex is concerned, they are different strings.

Maybe the tags I chose below are not the right ones, and maybe regex is not the right tool for this. I've seen that naturaljs (natural.NounInflector()) provides this functionality out of the box for English words. Are there similar solutions for German?

What is the best approach for finding the singular within the plural in German?

asked Nov 12 '20 14:11

Lonely

2 Answers

I once had to build a text processor that parsed many languages, in registers ranging from very casual to very formal. One of the things it had to identify was whether certain words were related (for example, a noun in a title that referred to a list of things, sometimes labeled with a plural form).

IIRC, 70-90% of singular & plural word forms across all languages we supported had a "Levenshtein distance" of less than 3 or 4. (Eventually several dictionaries were added to improve accuracy because "distance" alone produced many false positives.) Another interesting find was that the longer the words, the more likely a distance of 3 or fewer meant a relationship in meaning.

Here's an example of the libraries we used:

const fastLevenshtein = require('fast-levenshtein');

console.log('Raw Distances:')
console.log('Score 1:', fastLevenshtein.get('Schließfächer', 'Schließfach'));
// -> 3
console.log('Score 2:', fastLevenshtein.get('Blumtach', 'Blumtächer'));
// -> 3
console.log('Score 3:', fastLevenshtein.get('schließfächer', 'Schliessfaech'));
// -> 7
console.log('Score 4:', fastLevenshtein.get('not-it', 'Schliessfaech'));
// -> 12
console.log('Score 5:', fastLevenshtein.get('not-it', 'Schiesse'));
// -> 8


/**
 * Additional strategy for dealing with other various languages:
 *   "Deburr" the strings to omit diacritics before checking the distance:
 */

const deburr = require('lodash.deburr');
console.log('Deburred Distances:')
// Deburr each string first, then measure the distance between the results.
console.log('Score 1:', fastLevenshtein.get(deburr('Schließfächer'), deburr('Schließfach')));
// -> 2
console.log('Score 2:', fastLevenshtein.get(deburr('Blumtach'), deburr('Blumtächer')));
// -> 2
console.log('Score 3:', fastLevenshtein.get(deburr('schließfächer'), deburr('Schliessfaech')));
// -> 4


// Deburring lowers the distances here because "ä" becomes "a" and "ß" becomes "ss".
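
If you'd rather not pull in the two packages, the same idea can be sketched with built-ins only. This is a rough illustration, not what we ran in production: foldDiacritics, levenshtein and looksRelated are names I made up for this example, and the default threshold of 3 is an assumption taken from the rule of thumb above.

```javascript
// Fold diacritics with built-in Unicode normalization (no external package).
// normalize('NFD') does not decompose "ß", so it is replaced explicitly.
function foldDiacritics(word) {
  return word
    .normalize('NFD')                 // split "ä" into "a" + combining diaeresis
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .replace(/ß/g, 'ss')
    .toLowerCase();
}

// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: treat two words as candidate singular/plural forms
// when their folded distance is at most maxDistance (3 is an assumed default).
function looksRelated(a, b, maxDistance = 3) {
  return levenshtein(foldDiacritics(a), foldDiacritics(b)) <= maxDistance;
}

console.log(looksRelated('Schließfächer', 'Schließfach')); // true
console.log(looksRelated('not-it', 'Schließfach'));        // false
```

As mentioned above, distance alone produces false positives, so treat a true result as a candidate match to confirm against a dictionary rather than as proof.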

answered Nov 15 '22 08:11

Dan Levy

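You can use a stemmer from the nlp.js library, which supports 40 languages. Strictly speaking, a stemmer returns a stem rather than a dictionary lemma, but for matching a singular against its plural it is enough that both forms reduce to the same stem.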

const { StemmerDe } = require('@nlpjs/lang-de');

const stemmer = new StemmerDe();
console.log(stemmer.stemWord('Schließfach'));
console.log(stemmer.stemWord('Schließfächer'));
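
To turn this into a yes/no check, you can compare the two stems. A minimal sketch: shareStem is a hypothetical helper name, and it only assumes that the object passed in exposes stemWord(word), as the StemmerDe instance above does.

```javascript
// `stemmer` is assumed to be any object exposing stemWord(word),
// for example the StemmerDe instance created above.
function shareStem(stemmer, first, second) {
  return stemmer.stemWord(first) === stemmer.stemWord(second);
}

// Usage with nlp.js (requires the @nlpjs/lang-de package to be installed):
// const { StemmerDe } = require('@nlpjs/lang-de');
// console.log(shareStem(new StemmerDe(), 'Schließfach', 'Schließfächer'));
```

Passing the stemmer in as an argument keeps the helper independent of the language, so the same function should work with the stemmers for other languages too.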

answered Nov 15 '22 09:11

Jindřich
