I have an HTML file called test.html. It contains a single word, בדיקה (Hebrew for "test").
I open test.html and print its content using this block of code:

    file = open("test.html", "r")
    print file.read()

But it prints ??????. Why does this happen, and how can I fix it?
BTW, when I open a plain text file, it works fine.
Edit: I also tried this:

    >>> import codecs
    >>> f = codecs.open("test.html", 'r')
    >>> print f.read()
    ?????

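One plausible cause can be reproduced in a few lines. The sketch below (Python 3; the demo.html file in a temporary directory is an illustrative assumption) writes the word as UTF-8, shows that reading in binary mode yields raw bytes rather than text, and shows that when the Hebrew text is forced through an encoding with no Hebrew letters, such as ASCII, with errors="replace", each letter turns into a question mark. Whether the asker's console behaves exactly like this is an assumption.

```python
import os
import tempfile

WORD = "בדיקה"  # the Hebrew word from the question

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo file, written explicitly as UTF-8.
    path = os.path.join(tmp, "demo.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(WORD)

    # Binary mode returns undecoded bytes, similar to what Python 2's open() gave back.
    with open(path, "rb") as f:
        raw = f.read()

# Each of the 5 Hebrew letters takes 2 bytes in UTF-8.
print(len(WORD), len(raw))  # 5 10

# Decoding with the codec the file was saved in recovers the text.
text = raw.decode("utf-8")
print(text == WORD)  # True

# An encoding with no Hebrew letters (here ASCII) cannot represent them;
# with errors="replace", each unencodable character becomes "?".
print(text.encode("ascii", errors="replace"))  # b'?????'
```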
Check whether the file is actually saved with UTF-8 encoding.
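A minimal way to run that check from Python 3 is to read the raw bytes and try decoding them as UTF-8. The check_utf8 function below is a hypothetical helper, not a library API. A successful decode only shows the bytes are valid UTF-8 (pure ASCII also passes), and ISO-8859-8 is used purely as an example of a non-UTF-8 Hebrew encoding an editor might have chosen.

```python
import codecs
import os
import tempfile


def check_utf8(path):
    """Hypothetical helper: return (is_valid_utf8, has_utf8_bom) for a file."""
    with open(path, "rb") as f:
        data = f.read()
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False, has_bom
    return True, has_bom


# Usage: the same word saved three ways in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "utf-8": os.path.join(tmp, "utf8.html"),
        "utf-8-sig": os.path.join(tmp, "utf8_bom.html"),  # UTF-8 with a BOM
        "iso-8859-8": os.path.join(tmp, "legacy.html"),   # legacy single-byte Hebrew encoding
    }
    results = {}
    for encoding, path in samples.items():
        with open(path, "w", encoding=encoding) as f:
            f.write("בדיקה")
        results[encoding] = check_utf8(path)

print(results)
# {'utf-8': (True, False), 'utf-8-sig': (True, True), 'iso-8859-8': (False, False)}
```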
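Try something like this, passing the encoding explicitly (without it, codecs.open behaves just like the built-in open and does no decoding):

    import codecs

    # Decode the file's bytes as UTF-8 while reading
    f = codecs.open("test.html", "r", encoding="utf-8")
    print f.read()
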
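Note that codecs.open only decodes when it is given an encoding; with the encoding left out it simply returns what the built-in open() would, which is why the codecs attempt in the question's edit changed nothing. A Python 3 sketch of the difference, using a temporary stand-in for test.html as an illustrative assumption (in Python 3 the built-in open(..., encoding=...) is generally the preferred way to do this):

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative stand-in for the question's test.html, saved as UTF-8.
    path = os.path.join(tmp, "test.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write("בדיקה")

    # No encoding: codecs.open falls back to the built-in open(); in binary
    # mode that yields raw bytes, like Python 2's open() in the question.
    with codecs.open(path, "rb") as f:
        raw = f.read()

    # With an encoding: codecs.open decodes the bytes into text while reading.
    with codecs.open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(type(raw).__name__, len(raw))    # bytes 10
print(type(text).__name__, len(text))  # str 5
```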
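I ran into this problem today as well. I'm on Windows, and my system's default language is Chinese, so open() decodes files with the system's locale encoding instead of UTF-8; anyone with a similar setup may hit the same Unicode error. The fix is to pass encoding='utf-8' explicitly (this works with Python 3's built-in open; in Python 2, use io.open, which accepts the same argument):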

    with open("test.html", "r", encoding="utf-8") as f:
        text = f.read()

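A small variation that may help: some Windows editors save "UTF-8" files with a byte-order mark (BOM) at the start. Reading with encoding="utf-8-sig" decodes plain UTF-8 the same way but also drops that BOM, whereas plain "utf-8" keeps it as a U+FEFF character at the start of the text. A Python 3 sketch, where read_text and the file names are hypothetical:

```python
import os
import tempfile


def read_text(path):
    """Hypothetical helper: read a file as UTF-8, dropping a leading BOM if present."""
    # "utf-8-sig" decodes BOM-less UTF-8 unchanged and strips a BOM when there is one.
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.html")
    with_bom = os.path.join(tmp, "bom.html")
    with open(plain, "w", encoding="utf-8") as f:
        f.write("בדיקה")
    with open(with_bom, "w", encoding="utf-8-sig") as f:  # writes a BOM first
        f.write("בדיקה")

    a = read_text(plain)
    b = read_text(with_bom)
    # Plain "utf-8" keeps the BOM as a U+FEFF character at the start.
    with open(with_bom, "r", encoding="utf-8") as f:
        c = f.read()

print(a == b)            # True
print(c[0] == "\ufeff")  # True
```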