Using Norwegian letters æøå in Python

I'm learning Python and PyGTK, and have created a simple music organizer: http://pastebin.com/m2b596852. But when it renames songs whose tags contain the Norwegian letters æ, ø, and å, it replaces those letters with a weird character.

Is there a good way to read the names, or to encode them, so that they end up as proper UTF-8 characters?

Two relevant places from the above code:

Read info from a file:

def __parse(self, filename):
    "parse ID3v1.0 tags from MP3 file"
    self.clear()
    self['artist'] = 'Unknown'
    self['title'] = 'Unknown'
    try:
        fsock = open(filename, "rb", 0)
        try:
            fsock.seek(-128, 2)
            tagdata = fsock.read(128)
        finally:
            fsock.close()
        if tagdata[:3] == 'TAG':
            for tag, (start, end, parseFunc) in self.tagDataMap.items():
                self[tag] = parseFunc(tagdata[start:end])
    except IOError:
        pass

Rename the files and print info to sys.stdout:

for info in files:
    try:
        os.rename(info['name'], 
            os.path.join(self.dir, info['artist'])+' - '+info['title']+'.mp3')

        print 'From: '+ info['name'].replace(os.path.join(self.dir, ''), '')
        print 'To:   '+ info['artist'] +' - '+info['title']+'.mp3'
        print
        self.progressbar.set_fraction(i/num)
        self.progressbar.set_text('File %d of %d' % (i, num))
        i += 1
    except IOError:
        print 'Rename fail'

asked Mar 19 '09 22:03

ThoKra

3 Answers

You want to start by decoding the input FROM the charset it is in TO unicode (in Python, decode means "take bytes in some charset and turn them into unicode", and encode means "take unicode and turn it into bytes in some charset, such as utf-8").

Some googling suggests the usual Norwegian charset is plain old 'iso-8859-1'... I hope someone can correct me if I'm wrong on this detail. Regardless, substitute whatever the name of the charset is in the following example:

tagdata[start:end].decode('iso-8859-1')
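As a minimal sketch of that round trip, written for Python 3 (where byte strings and text are separate types). The field contents are made up for illustration, and the 30-byte NUL-padded layout is an assumption about how the tag field is stored:

```python
# Hypothetical 30-byte title field: Latin-1 bytes padded with NULs.
raw_field = b"Bl\xe5b\xe6rsyltet\xf8y".ljust(30, b"\x00")

# Decode: bytes in a known charset -> Unicode text.
title = raw_field.decode("iso-8859-1").rstrip("\x00").strip()

# Encode: Unicode text -> bytes in the charset you want to write out.
utf8_bytes = title.encode("utf-8")

print(title)
print(len(title), "characters,", len(utf8_bytes), "bytes as UTF-8")
```

Each of æ, ø and å is one byte in iso-8859-1 but two bytes in UTF-8, which is why mixing the two up produces stray characters.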

In a real-world app, I realize you can't guarantee that the input is Norwegian, or in any particular charset. In that case, you will probably want to try a series of likely charsets and use the first one that decodes successfully. Both SO and Google have some suggestions on algorithms for doing this effectively in Python. It sounds scarier than it really is.
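One way to sketch the "try a series of likely charsets" idea, written for Python 3. The function name `decode_with_fallback` and the candidate list are illustrative assumptions, not part of the original code. Note that iso-8859-1 assigns a character to every byte value, so it never fails and has to come after the stricter candidates:

```python
def decode_with_fallback(data, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # This charset cannot represent the bytes; try the next one.
            continue
    # Nothing matched: keep the ASCII part and mark the rest as unknown.
    return data.decode("ascii", errors="replace"), "ascii"


print(decode_with_fallback(b"\xc3\xa6"))  # valid UTF-8 for the letter ae
print(decode_with_fallback(b"\xe6"))      # invalid UTF-8, valid Latin-1
```

A successful decode only shows that the bytes are valid in that charset, not that the guess is right, so the order of the candidates matters.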

answered Sep 22 '22 23:09

Jarret Hardie

You'd need to convert the bytestrings you read from the file into Unicode character strings. Looking at your code, I would do this in the parsing function, i.e. replace stripnulls with something like this:

import codecs

def stripnulls_and_decode(data):
    # utf_8_decode returns a (text, bytes_consumed) tuple; keep only the text
    return codecs.utf_8_decode(data.replace("\00", ""))[0].strip()

Note that this will only work if the strings in the file are in fact encoded in UTF-8 - if they're in a different encoding, you'd have to use the corresponding decoding function from the codecs module.
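If the encoding is not known in advance, a variant that takes the charset as a parameter may be handier. This is a sketch for Python 3 using `codecs.getdecoder`; the `encoding` parameter and its iso-8859-1 default are assumptions for illustration, not something the original code specifies:

```python
import codecs


def stripnulls_and_decode(data, encoding="iso-8859-1"):
    """Strip NUL padding from a byte string and decode it with the named codec."""
    # getdecoder looks up the codec's decode function by name.
    decoder = codecs.getdecoder(encoding)
    # Decoder functions return a (text, bytes_consumed) tuple.
    text, _consumed = decoder(data.replace(b"\x00", b""))
    return text.strip()


print(stripnulls_and_decode(b"R\xf8yksopp\x00\x00"))
print(stripnulls_and_decode(b"\xc3\x85ge\x00", encoding="utf-8"))
```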

answered Sep 22 '22 23:09

David Z

I don't know what encodings are used for MP3 tags, but if you are sure that it is UTF-8, then:

tagdata[start:end].decode("utf-8")

The line # -*- coding: utf-8 -*- only declares the encoding of your source code. It does not define the encoding used to read from or write to files.
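To illustrate the difference, here is a small sketch (Python 3, using a temporary file whose name is made up) where the encoding is passed explicitly at the point of file I/O, which is where it actually takes effect:

```python
import io
import os
import tempfile

text = "æøå"
path = os.path.join(tempfile.mkdtemp(), "names.txt")  # hypothetical scratch file

# The encoding argument, not the source-file declaration, decides the bytes on disk.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Reading in binary mode shows the raw UTF-8 bytes that were written.
with io.open(path, "rb") as handle:
    raw = handle.read()

# Reading with the same encoding gives the original text back.
with io.open(path, "r", encoding="utf-8") as handle:
    roundtrip = handle.read()

print(repr(raw))
print(roundtrip == text)
```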

answered Sep 24 '22 23:09

jfs

Tags:

python

utf-8