I have an Arabic string with English text and punctuations. I need to filter Arabic text and I tried removing punctuations and English words using sting. However, I lost the spacing between Arabic words. Where am I wrong?
import string
exclude = set(string.punctuation)
main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
main_text = ''.join(ch for ch in main_text if ch not in exclude)
[output after this step="وزارة الداخلية لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا httpalriyadhcom1031499]"
n = filter(lambda x: x not in string.printable, n)
print n
وزارةالداخليةلاتتوفرلدينامعلوماترسميةعنسعوديينموقوفينفيليبيا
I am able to remove punctuations and english text but I lost the space between words. How can I retain each words?
translate() is another method that can be used to remove a character from a string in Python. translate() returns a string after removing the values passed in the table. Also, remember that to remove a character from a string using translate() you have to replace it with None and not "" .
Using the replace() function We can use the replace() function to remove word from string in Python. This function replaces a given substring with the mentioned substring. We can replace a word with an empty character to remove it.
You can save the spaces in your string by using
n = filter(lambda x: True if x==' ' else x not in string.printable , main_text)
or
n = filter(lambda x: x==' ' or x not in string.printable , main_text)
This will check if the character is space, if not then it will check if it is printable.
You can stop it from removing any whitespace as follows:
n = filter(lambda x: x in string.whitespace or x not in string.printable, n)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With