I'm working with Arabic text and want to remove the Arabic punctuation. Example:
s="أهلاً بالعالم في هذه التجربة ! علامات ،الترقيم ؟ ,? لا .اتذكرها"
I want the Arabic marks " ؟ ، " removed from the output as well, because when I use:
import string
tr = str.maketrans("", "", string.punctuation)
s.translate(tr)
the output was 'أهلاً بالعالم في هذه التجربة علامات ،الترقيم ؟ لا اتذكرها'
The string.punctuation constant contains only the punctuation characters defined in ASCII, which does not even cover all signs used with the Latin script (e.g. "fancy quotes" like «» are missing).
If you don't want to create a list of all punctuation characters yourself (I wouldn't), you can use the Unicode character property to decide if a character is punctuation or not.
The built-in unicodedata module gives you access to this information:
>>> import unicodedata as ud
>>> for c in 'abc: قيم ؟':
...     print(c, ud.category(c))
...
a Ll
b Ll
c Ll
: Po
  Zs
ق Lo
ي Lo
م Lo
  Zs
؟ Po
All categories are two-letter codes, like "Ll" for "letter, lowercase" or "Po" for "punctuation, other". All punctuation characters have a category that starts with "P".
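One caveat when moving from string.punctuation to the Unicode categories: some of the ASCII characters in string.punctuation are classified by Unicode as symbols (categories starting with "S"), not punctuation, so a filter based on "P" leaves them in place. A quick check, as a sketch (the name non_p is only illustrative):

```python
import string
import unicodedata as ud

# Characters from string.punctuation whose Unicode category
# does NOT start with "P" (they are symbols such as Sc, Sm, Sk).
non_p = [c for c in string.punctuation if not ud.category(c).startswith("P")]
print(non_p)  # ['$', '+', '<', '=', '>', '^', '`', '|', '~']
```

If those characters should be removed too, testing whether the first letter of the category is "P" or "S" is one option.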
You can use this information for filtering out punctuation characters (e.g. using a generator expression):
>>> s = "أهلاً بالعالم في هذه التجربة ! علامات ،الترقيم ؟ ,? لا .اتذكرها"
>>> ''.join(c for c in s if not ud.category(c).startswith('P'))
'أهلاً بالعالم في هذه التجربة  علامات الترقيم   لا اتذكرها'
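For longer texts or repeated calls, one possible variation is to build a translation table once and reuse it with str.translate, much like the string.punctuation attempt in the question. This is only a sketch: PUNCT_TABLE and strip_punctuation are hypothetical names, and collapsing the whitespace left behind by removed characters is an extra step that is not part of the approach above.

```python
import sys
import unicodedata as ud

# Map every code point whose category starts with "P" to None,
# which tells str.translate to delete it. Built once, reused afterwards.
PUNCT_TABLE = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if ud.category(chr(i)).startswith("P")
)

def strip_punctuation(text):
    """Remove Unicode punctuation, then collapse runs of whitespace (assumed wanted)."""
    without_punct = text.translate(PUNCT_TABLE)
    # split() with no arguments splits on any whitespace run and drops empty pieces
    return " ".join(without_punct.split())

print(strip_punctuation("abc: قيم ؟"))  # abc قيم
```

Building the table scans the whole Unicode range, so it has a small one-time cost; after that each call is a single pass over the string.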