How to remove English text from an Arabic string in Python?

Tags: python, lambda, nlp

I have an Arabic string that also contains English text and punctuation. I need to keep only the Arabic text, so I tried removing the punctuation and English words using the string module. However, I lost the spacing between the Arabic words. Where am I going wrong?

import string
exclude = set(string.punctuation)

main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
# Drop every ASCII punctuation character.
main_text = ''.join(ch for ch in main_text if ch not in exclude)
# Output after this step: "وزارة الداخلية لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا httpalriyadhcom1031499"
# Drop everything in string.printable (this also drops the spaces).
n = filter(lambda x: x not in string.printable, main_text)
print n
# Output: وزارةالداخليةلاتتوفرلدينامعلوماترسميةعنسعوديينموقوفينفيليبيا

I am able to remove the punctuation and the English text, but I lose the spaces between the words. How can I keep the words separate?

Anish asked Apr 02 '15 06:04


2 Answers

You can preserve the spaces in your string by using

n = filter(lambda x: True if x==' ' else x not in string.printable , main_text)

or

n = filter(lambda x: x==' ' or x not in string.printable , main_text)

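The spaces were lost because string.printable includes whitespace, so the original filter dropped them along with the English text. Both versions above keep a character if it is a space; otherwise they keep it only if it is not in string.printable.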
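The one-liners above rely on Python 2, where filter() on a string returns a string. As a minimal sketch for Python 3, where filter() returns an iterator, the same test can be written with ''.join(). The function name keep_non_ascii_and_spaces is a hypothetical helper, not part of the original answer.

```python
import string


def keep_non_ascii_and_spaces(text):
    """Keep spaces plus every character that is not in string.printable.

    Hypothetical helper name; the test itself is the one from the answer.
    """
    return ''.join(ch for ch in text if ch == ' ' or ch not in string.printable)


main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"

# The space character is itself part of string.printable,
# which is why a plain "not in string.printable" filter removes it.
print(' ' in string.printable)

cleaned = keep_non_ascii_and_spaces(main_text)
print(cleaned)
```

Note that the space in front of the URL survives, so the result ends with a trailing space; calling cleaned.strip() removes it if that matters.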

Bhargav Rao answered Sep 28 '22 09:09


You can stop it from removing any whitespace as follows:

n = filter(lambda x: x in string.whitespace or x not in string.printable, main_text)
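A Python 3 sketch of the same idea follows. Collapsing the leftover whitespace with split() and join() is an extra step that is not in the answer above, and strip_printable_keep_whitespace is a hypothetical name.

```python
import string


def strip_printable_keep_whitespace(text):
    """Drop ASCII letters, digits, and punctuation, but keep whitespace.

    Hypothetical helper: the whitespace normalization at the end is an
    added step, not part of the original answer.
    """
    kept = ''.join(
        ch for ch in text
        if ch in string.whitespace or ch not in string.printable
    )
    # Collapse runs of whitespace left behind by removed words
    # and trim both ends.
    return ' '.join(kept.split())


print(strip_printable_keep_whitespace("hello\tسلام\nworld  عالم "))
```

Testing against string.whitespace rather than ' ' also preserves tabs and newlines before normalization, which may be preferable for multi-line input.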
agamagarwal answered Sep 28 '22 10:09