How can I decode this string in Python?

Question

I downloaded a dataset of Facebook messages, and it was formatted like this:

f\u00c3\u00b8rste student

It's supposed to be første student, but I can't seem to decode it correctly.

I tried:

s = 'f\u00c3\u00b8rste student'
print(s)
# fÃ¸rste student

s = 'f\u00c3\u00b8rste student'
print(s.encode('utf-8'))
# b'f\xc3\x83\xc2\xb8rste student'

But it didn't work.

jwodder · Accepted Answer

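The text was encoded as UTF-8 (ø becomes the two bytes C3 B8), and each of those bytes was then treated as a separate character. To undo whatever encoding foul-up has taken place, first convert the characters back to the bytes with the same ordinals by encoding as ISO-8859-1 (Latin-1), then decode those bytes as UTF-8: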

>>> 'f\u00c3\u00b8rste student'.encode('iso-8859-1').decode('utf-8')
'første student'
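
If the dataset mixes damaged and undamaged strings, the one-liner above raises an exception on text that is already correct. A minimal sketch of a more forgiving wrapper follows; the helper name fix_mojibake is hypothetical, and it assumes that any string which cannot make the Latin-1 to UTF-8 round trip should be left unchanged:

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1.

    Hypothetical helper: returns the input unchanged when the
    round trip is not possible (assumed to mean it was not damaged).
    """
    try:
        # Map each character back to the byte with the same ordinal,
        # then interpret the resulting bytes as UTF-8.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character is outside Latin-1, or the bytes are
        # not valid UTF-8; leave the string as it is.
        return text


broken = "f\u00c3\u00b8rste student"
print(fix_mojibake(broken))
```

Pure ASCII strings pass through unchanged, since ASCII encodes identically in Latin-1 and UTF-8. Note that a correct string can occasionally happen to form valid UTF-8 after the Latin-1 step, so it is worth spot-checking the results.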

Tags:

python

unicode

utf

vhflat
