
Python 2.7.2: Plistlib With Itunes Xml

I'm reading an iTunes-generated XML playlist with plistlib. The XML has a UTF-8 header. When I read the XML with plistlib, I get both unicode (e.g., 'Name': u'Don\u2019t You Remember

Solution 1:

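Wow, this is really weird behaviour. I would even say this non-uniform behaviour is a bug in the 2.x implementation of plistlib. As far as I can tell, the 2.x parser hands back a plain byte string when the text is pure ASCII and a unicode string otherwise. In Python 3, plistlib always returns unicode strings, which is much better.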

But you have to live with it :) So the answer to your question is yes: you should always protect yourself when reading a string from a plist:

def safe_unicode(s):
    # Already unicode: nothing to do.
    if isinstance(s, unicode):
        return s
    # Byte string: decode as UTF-8, replacing anything undecodable.
    return s.decode('utf-8', errors='replace')

value = safe_unicode(info['Name'])

I added errors='replace' just in case the string is not UTF-8 encoded. You'll get a bunch of \ufffd characters wherever it cannot be decoded. If you would rather get an exception, just leave it out and use s.decode('utf-8').
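To make the difference between the two modes concrete, here is a minimal sketch. The byte strings `good` and `bad` are made up for illustration and are not taken from the playlist in the question; the snippet is written so that it should behave the same on Python 2.7 and Python 3.

```python
# Illustrative byte strings (assumptions, not data from the question).
good = b'Don\xe2\x80\x99t'   # valid UTF-8 for u'Don\u2019t'
bad = b'Don\x92t'            # a lone 0x92 byte is not valid UTF-8

# Valid input decodes the same way in either mode.
assert good.decode('utf-8') == u'Don\u2019t'

# errors='replace' substitutes U+FFFD for the undecodable byte.
replaced = bad.decode('utf-8', errors='replace')
assert replaced == u'Don\ufffdt'

# The default (strict) mode raises instead.
try:
    bad.decode('utf-8')
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True
assert strict_raised
```

Which mode to prefer depends on whether a silently damaged title is more acceptable than a failed import.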

Update:

I tried the same with ElementTree:

from xml.etree import ElementTree as et

tree = et.parse('test.plist')
# Text of each <string> child of the second <dict> matched by the path
[s.text for s in tree.findall('dict/dict/dict')[1].findall('string')]

Which gave me:

[u'Don\u2019t You Remember',
 'Adele',
 '21',
 'Pop',
 'MPEG audio file',
 '7130C888606FB153',
 'File',
 'file://localhost/D:/music/Adele/21/04%20-%20Don%E2%80%99t%20You%20Remember.mp3']

So unicode and byte strings are mixed there too :-/
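Since the mix shows up whichever parser is used, one option is to normalise the whole parsed structure in a single pass instead of wrapping every lookup. The following is only a sketch: `to_unicode` and `text_type` are hypothetical names, and the sample `track` dict is made up to resemble the output above. It assumes Python 2 semantics, where a byte string in the result is text; on Python 3 plistlib already returns text and uses bytes for binary data, so this step would not be wanted there.

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3


def to_unicode(obj, encoding='utf-8'):
    """Recursively decode byte strings found in nested dicts and lists."""
    if isinstance(obj, text_type):
        return obj
    if isinstance(obj, bytes):  # bytes is an alias for str on Python 2
        return obj.decode(encoding, 'replace')
    if isinstance(obj, dict):
        # Returns a plain dict, even if the input was a dict subclass.
        return dict((to_unicode(k, encoding), to_unicode(v, encoding))
                    for k, v in obj.items())
    if isinstance(obj, list):
        return [to_unicode(item, encoding) for item in obj]
    # Numbers, booleans, dates and anything else pass through untouched.
    return obj


# Made-up sample mixing unicode, byte strings and a number.
track = {
    'Name': u'Don\u2019t You Remember',
    'Artist': b'Adele',
    'Track ID': 42,
    'Tags': [b'Pop', u'21'],
}
clean = to_unicode(track)
```

After this, every string in `clean` is unicode, so the rest of the program does not need to care which values happened to be pure ASCII.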

