Why Do I Get Messy Characters When Opening a URL Using urllib2?
Solution 1:
As Bruce already suggested, this looks like a compression problem. The server returns gzip-compressed content, but urllib2 does not decompress gzip content automatically. In fact, as far as I know, the server is misbehaving in this case: it should only compress the content if an Accept-Encoding: gzip header is present (which you either provide yourself, or which your client adds automatically if it supports compression).
So: either use a library that handles compression automatically, such as httplib2 (which I've tested with the page in question, and it works), or decompress the content yourself (see the answer to this SO question for how to do it; note that in that question the headers returned by the server are checked to see whether the content is gzip compressed).
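For the do-it-yourself route, a minimal sketch might look like the following. The helper name decode_body is hypothetical (it is not part of urllib2 or of the linked answer), and it assumes you have already read the raw response body and the value of the Content-Encoding header.

```python
import gzip
import io


def decode_body(body, content_encoding):
    """Return the body, gunzipped if the Content-Encoding header says gzip.

    decode_body is a hypothetical helper written for illustration only.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        # Wrap the raw bytes in a file-like object so GzipFile can read them.
        return gzip.GzipFile(fileobj=io.BytesIO(body)).read()
    # Anything else is returned untouched.
    return body


# Assumed usage with urllib2 (Python 2), not executed here:
#   response = urllib2.urlopen(url)
#   html = decode_body(response.read(),
#                      response.info().get("Content-Encoding"))
```

Checking the header first, rather than always decompressing, keeps the helper safe for servers that send plain content.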
Solution 2:
You are making your request with a user agent that supports on-the-fly compression. Are you sure the output is not gzip compressed? Try running it through the zlib module and/or printing the response headers.
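One way to run that check with the zlib module is sketched below. The function names looks_gzipped and gunzip_with_zlib are made up for this example; the sketch assumes data holds the raw bytes you read from the response.

```python
import zlib

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data):
    """Return True if the bytes begin with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def gunzip_with_zlib(data):
    """Decompress gzip data using zlib (hypothetical helper).

    Passing 16 + zlib.MAX_WBITS tells zlib to expect a gzip header
    and trailer instead of a bare zlib stream.
    """
    return zlib.decompress(data, 16 + zlib.MAX_WBITS)


# Assumed usage with urllib2 (Python 2), not executed here:
#   response = urllib2.urlopen(url)
#   print(response.info())      # shows Content-Encoding, if any
#   data = response.read()
#   if looks_gzipped(data):
#       data = gunzip_with_zlib(data)
```

If the "messy characters" start with the bytes 0x1f 0x8b, that is a strong hint the body is gzip compressed even when you did not ask for it.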