Where can I find a list of Hebrew stop words? [closed]

2 Answers

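// Returns common Hebrew stop words: pronouns, prepositions, question words,
// conjunctions and auxiliaries. Some entries appear more than once, so
// de-duplicate (e.g. with array_unique) if you need a clean set.
function getStopWords(){
return array(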
'אני',
'את',
'אתה',
'אנחנו',
'אתן',
'אתם',
'הם',
'הן',
'היא',
'הוא',
'שלי',
'שלו',
'שלך',
'שלה',
'שלנו',
'שלכם',
'שלכן',
'שלהם',
'שלהן',
'לי',
'לו',
'לה',
'לנו',
'לכם',
'לכן',
'להם',
'להן',
'אותה',
'אותו',
'זה',
'זאת',
'אלה',
'אלו',
'תחת',
'מתחת',
'מעל',
'בין',
'עם',
'עד',
'נגד',
'על',
'אל',
'מול',
'של',
'אצל',
'כמו',
'אחר',
'אותו',
'בלי',
'לפני',
'אחרי',
'מאחורי',
'עלי',
'עליו',
'עליה',
'עליך',
'עלינו',
'עליכם',
'עליכן',
'עליהם',
'עליהן',
'כל',
'כולם',
'כולן',
'כך',
'ככה',
'כזה',
'זה',
'זאת',
'אותי',
'אותה',
'אותם',
'אותך',
'אותו',
'אותן',
'אותנו',
'ואת',
'את',
'אתכם',
'אתכן',
'איתי',
'איתו',
'איתך',
'איתה',
'איתם',
'איתן',
'איתנו',
'איתכם',
'איתכן',
'יהיה',
'תהיה',
'היתי',
'היתה',
'היה',
'להיות',
'עצמי',
'עצמו',
'עצמה',
'עצמם',
'עצמן',
'עצמנו',
'עצמהם',
'עצמהן',
'מי',
'מה',
'איפה',
'היכן',
'במקום שבו',
'אם',
'לאן',
'למקום שבו',
'מקום בו',
'איזה',
'מהיכן',
'איך',
'כיצד',
'באיזו מידה',
'מתי',
'בשעה ש',
'כאשר',
'כש',
'למרות',
'לפני',
'אחרי',
'מאיזו סיבה',
'הסיבה שבגללה',
'למה',
'מדוע',
'לאיזו תכלית',
'כי',
'יש',
'אין',
'אך',
'מנין',
'מאין',
'מאיפה',
'יכל',
'יכלה',
'יכלו',
'יכול',
'יכולה',
'יכולים',
'יכולות',
'יוכלו',
'יוכל',
'מסוגל',
'לא',
'רק',
'אולי',
'אין',
'לאו',
'אי',
'כלל',
'נגד',
'אם',
'עם',
'אל',
'אלה',
'אלו',
'אף',
'על',
'מעל',
'מתחת',
'מצד',
'בשביל',
'לבין',
'באמצע',
'בתוך',
'דרך',
'מבעד',
'באמצעות',
'למעלה',
'למטה',
'מחוץ',
'מן',
'לעבר',
'מכאן',
'כאן',
'הנה',
'הרי',
'פה',
'שם',
'אך',
'ברם',
'שוב',
'אבל',
'מבלי',
'בלי',
'מלבד',
'רק',
'בגלל',
'מכיוון',
'עד',
'אשר',
'ואילו',
'למרות',
'אם',
'כמו',
'כפי',
'אז',
'אחרי',
'כן',
'לכן',
'לפיכך',
'מאד',
'עז',
'מעט',
'מעטים',
'במידה',
'שוב',
'יותר',
'מדי',
'גם',
'כן',
'נו',
'אחר',
'אחרת',
'אחרים',
'אחרות',
'אשר',
'או'
);
}
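
A minimal sketch of how a list like this might be applied, assuming the text is already tokenized by whitespace. The helper name `removeStopWords` is hypothetical and not part of the answer above; it takes the list as a parameter so it can be used with `getStopWords()` or any other list.

```php
<?php
// Hypothetical helper: drop stop words from whitespace-separated text.
function removeStopWords(string $text, array $stopWords): array
{
    // Flip the list so membership checks are hash lookups; duplicate
    // entries in the list simply collapse into one key.
    $lookup = array_flip($stopWords);

    // Split on whitespace; the /u flag treats the input as UTF-8.
    $tokens = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $kept = array();
    foreach ($tokens as $token) {
        if (!isset($lookup[$token])) {
            $kept[] = $token;
        }
    }
    return $kept;
}
```

Called as `removeStopWords($text, getStopWords())`, this only removes single-token matches. Multi-word entries in the list, such as 'במקום שבו', would never match a single token and would need separate handling.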

answered Oct 20 '22 14:10

Itay Moav -Malimovka

I doubt that there is one openly available, but as a simple approximation, you could create a list of very frequent tokens in a reasonably large corpus. Then, depending on your need, you can use the list as such, or filter it manually, or do some trial-and-error with your algorithm to see how it works.
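
The frequency-based approach could be sketched roughly as follows, assuming the corpus fits in memory as one UTF-8 string and that splitting on whitespace is an acceptable tokenizer. The function name `topTokens` is hypothetical.

```php
<?php
// Hypothetical helper: return the $n most frequent tokens in a corpus,
// as an array of token => count, highest count first.
function topTokens(string $corpus, int $n): array
{
    // Naive tokenization: split on whitespace (assumption, not a real
    // Hebrew tokenizer; punctuation stays attached to words).
    $tokens = preg_split('/\s+/u', trim($corpus), -1, PREG_SPLIT_NO_EMPTY);

    // Count occurrences of each distinct token.
    $counts = array_count_values($tokens);

    // Sort by count, descending, keeping the token keys.
    arsort($counts);

    // preserve_keys = true so numeric-looking tokens such as "1"
    // are not renumbered.
    return array_slice($counts, 0, $n, true);
}
```

The resulting list is only a set of candidates: it would still need the manual filtering or trial-and-error described above before being used as a stop list.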

Here's a list of the 100 most common tokens from a pretty large news corpus I have. Note that for my purposes, I counted various punctuation characters as tokens. The number "1" represents all the numeric tokens, hence its high position in the list.

You are probably aware that a stop list is a problematic concept in Hebrew because of its morphology and orthography: some of the most useful stop words are not written as separate tokens but are attached directly to the words they accompany.
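
To illustrate the problem, here is a deliberately naive sketch. It assumes a hypothetical vocabulary of known words and only splits off a single one-letter prefix when the remainder is in that vocabulary. Real Hebrew morphological analysis is considerably harder than this, and the function name `splitKnownPrefix` is made up for the example.

```php
<?php
// Hypothetical helper: split one attached prefix letter off a token,
// but only when what remains is a word we already know.
// Returns array(prefix, remainder), or null if no split applies.
function splitKnownPrefix(string $token, array $knownWords): ?array
{
    // Common one-letter prefixes: and, the, in, like, to, from, that.
    if (preg_match('/^([והבכלמש])(.+)$/u', $token, $m) !== 1) {
        return null;
    }
    // Without the vocabulary check, ordinary words that merely begin
    // with one of these letters would be split incorrectly.
    return in_array($m[2], $knownWords, true) ? array($m[1], $m[2]) : null;
}
```

An exact-match stop list never sees the prefix in a token like 'והספר', while stripping prefixes blindly would mangle a word like 'בית'; the vocabulary check is one crude way to tell the two apart.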

answered Oct 20 '22 16:10

daphshez

Tags:

stop-words

hebrew
