How to handle utf-8 text with Python 3?

Question

I need to parse various text sources and then print / store it somewhere.

Every time a non-ASCII character is encountered, I can't print it correctly because it gets converted to bytes, and I have no idea how to view the correct characters.

(I'm quite new to Python, I come from PHP where I never had any utf-8 issues)

The following is a code example:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import codecs
import feedparser

url = "http://feeds.bbci.co.uk/japanese/rss.xml"
feeds = feedparser.parse(url)
title = feeds['feed'].get('title').encode('utf-8')

print(title)

file = codecs.open("test.txt", "w", "utf-8")
file.write(str(title))
file.close()

I'd like to print the RSS title (BBC Japanese - ホーム) and write it to a file, but instead the result is this:

b'BBC Japanese - \xe3\x83\x9b\xe3\x83\xbc\xe3\x83\xa0'

This happens both on screen and in the file. Is there a proper way to do this?

Dean Fenster · Accepted Answer

In Python 3, bytes and str are two different types. str represents any text, including Unicode. When you encode() a str, you convert it from its str representation to its bytes representation in a specific encoding.
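As a rough illustration of the str/bytes distinction described above, the sketch below round-trips the title from the question. The variable names are made up for the example, and the title is hard-coded rather than fetched from the feed.

```python
# str holds Unicode text; bytes holds raw encoded data.
text = "BBC Japanese - ホーム"  # hard-coded stand-in for the feed title

encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

# Calling str() on a bytes object returns its repr, b'...' prefix included,
# which is why that form ended up on screen and in the file.
as_text = str(encoded)

print(type(encoded).__name__, type(decoded).__name__)
print(repr(encoded))
```

The bytes value printed here should match the output shown in the question, since that output is simply the UTF-8 encoding of the title.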

In your case, to get the decoded strings, you just need to remove the encode('utf-8') part:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import codecs
import feedparser

url = "http://feeds.bbci.co.uk/japanese/rss.xml"
feeds = feedparser.parse(url)
title = feeds['feed'].get('title')

print(title)

file = codecs.open("test.txt", "w", encoding="utf-8")
file.write(title)
file.close()
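As a possible variation, the built-in open() in Python 3 also accepts an encoding argument, so codecs.open is not strictly required. The sketch below assumes a hard-coded title instead of a live feedparser call and writes to a temporary directory; both are stand-ins chosen for the example, not part of the original code.

```python
import os
import tempfile

# Stand-in for feeds['feed'].get('title'); assumed to already be a str.
title = "BBC Japanese - ホーム"

# Hypothetical output location, used here only to keep the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with an explicit encoding: str goes in, UTF-8 bytes land on disk.
with open(path, "w", encoding="utf-8") as f:
    f.write(title)

# Reading back in text mode with the same encoding yields the original str.
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

# Reading in binary mode shows the raw UTF-8 bytes that were stored.
with open(path, "rb") as f:
    raw = f.read()

print(read_back == title)
```

The with statement closes the file even if the write raises, which removes the need for an explicit close() call.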

Tags:

python-3.x

character-encoding

utf-8

Omiod
