I want to read JSON file in Python that contains Arabic text but the Arabic text is appear like that:
ط§ظ„ط³ظژط¹ظژط§ط¯ظژط©ظگ ظ„ظژظٹظگط³ظژطھظŒ ط§ظ„طظژطµظŒظˆظژظ„ظژ ط¹ظژظ„ظ‰ظژ
ظ…ط§ظژ ظ„ط§ظ†ظژظ…ظ„ظگظƒظژ ط¨ظژظ„ ظ‡ظگظٹظژ ط£ظ†ظژ ظ†ظژظپظ‡ظŒظ…ظژ
ظˆظژظ†ظگط¯ط±ظژظƒظژ ظ‚ظژظٹظگظ…ط©ظڈ ظ…ظژط§ظ†ظژظ…ظ„ظƒ
How can I read the correct Arabic letters?
import sys
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
print(x.translate(non_bmp_map))
x
is parameter that contains Arabic value from JSON file.
I expected to get this sentence :السَعَادَةِ لَيِسَتٌ الحَصٌوَلَ عَلىَ ماَ لانَملِكَ بَل هِيَ أنَ نَفهٌمَ وَنِدرَكَ قَيِمةُ مَانَملك but I get ط§ظ„ط³ظژط¹ظژط§ط¯ظژط©ظگ ظ„ظژظٹظگط³ظژطھظŒ ط§ظ„طظژطµظŒظˆظژظ„ظژ ط¹ظژظ„ظ‰ظژ ظ…ط§ظژ ظ„ط§ظ†ظژظ…ظ„ظگظƒظژ ط¨ظژظ„ ظ‡ظگظٹظژ ط£ظ†ظژ ظ†ظژظپظ‡ظŒظ…ظژ ظˆظژظ†ظگط¯ط±ظژظƒظژ ظ‚ظژظٹظگظ…ط©ظڈ ظ…ظژط§ظ†ظژظ…ظ„ظƒ
You haven't mentioned if you're using Python 3 or 2. In Python 3, the strings are unicode, by default.
If you use Python 2, use codec
:
import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
print repr(line)
Ref: Unicode How-to
It is possible, however, that your input data isn't correctly encoded. In that case, you can try using ftfy
package.
ftfy
implements several heuristics to fix broken/inconsistent unicode encodings. From the docs:
>>> from ftfy import fix_encoding
>>> print(fix_encoding("(ง'⌣')ง"))
(ง'⌣')ง
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With