I am using Python 2.7. I have got the following line (string) from a text file encoded in utf-8:
"تازہ ترین خبروں، بریکنگ نیوز، ویڈیو، آڈیو، فیچر اور تجزیوں کے لیے بی بی سی اردو"
I am using the following code to print it on the screen:
import codecs
filename = codecs.open('file path', 'r', encoding="utf-8")
outputfile = filename.readlines()
print outputfile
It gives the following output:
[u'\ufeff\u062a\u0627\u0632\u06c1 \u062a\u0631\u06cc\u0646 \u062e\u0628\u0631\u0648\u06ba\u060c \u0628\u0631\u06cc\u06a9\u0646\u06af \u0646\u06cc\u0648\u0632\u060c \u0648\u06cc\u0688\u06cc\u0648\u060c \u0622\u0688\u06cc\u0648\u060c \u0641\u06cc\u0686\u0631 \u0627\u0648\u0631 \u062a\u062c\u0632\u06cc\u0648\u06ba \u06a9\u06d2 \u0644\u06cc\u06d2 \u0628\u06cc \u0628\u06cc \u0633\u06cc \u0627\u0631\u062f\u0648 \u06a9\u06cc \u0648\u06cc\u0628']
The purpose is to print the text correctly, and not how to print each line. So, how can I print the string or content of text file correctly in its original form? like:
تازہ ترین خبروں، بریکنگ نیوز، ویڈیو، آڈیو، فیچر اور تجزیوں کے لیے بی بی سی اردو
What you see is just the representation of the string. Since you're printing the list, the one shown is the representation, not the readable form.
You can print it normally, for each lines:
for line in outputfile:
print(line)
Demo:
>>> s = u'\ufeff\u062a\u0627\u0632\u06c1 \u062a\u0631\u06cc\u0646 \u062e\u0628\u0631\u0648\u06ba\u060c \u0628\u0631\u06cc\u06a9\u0646\u06af \u0646\u06cc\u0648\u0632\u060c \u0648\u06cc\u0688\u06cc\u0648\u060c \u0622\u0688\u06cc\u0648\u060c \u0641\u06cc\u0686\u0631 \u0627\u0648\u0631 \u062a\u062c\u0632\u06cc\u0648\u06ba \u06a9\u06d2 \u0644\u06cc\u06d2 \u0628\u06cc \u0628\u06cc \u0633\u06cc \u0627\u0631\u062f\u0648 \u06a9\u06cc \u0648\u06cc\u0628'
>>> print(s)
تازہ ترین خبروں، بریکنگ نیوز، ویڈیو، آڈیو، فیچر اور تجزیوں کے لیے بی بی سی اردو کی ویب
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With