
Why Do I Get Garbled Characters When Opening a URL Using urllib2?

Here's my code; you can test it yourself. I always get garbled characters instead of the page source. Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv

Solution 1:

As Bruce already suggested, this looks like a compression problem. The server returns gzip-compressed content, but urllib2 does not decompress gzip automatically. As far as I know, the server is actually misbehaving here: it should only compress the content if the request carries an Accept-Encoding: gzip header (which you either set yourself, or which your client adds automatically if it supports gzip).

So either use a library that handles this automatically, such as httplib2 (which I've tested with the page in question, and it works), or decompress the response yourself. See the answer to this SO question for how to do that; note that the code there checks the headers returned by the server to see whether the content is gzip-compressed.
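The "decompress yourself" route could look roughly like the sketch below. It is not the code from the linked question; `decode_body` and `fetch` are hypothetical helper names, and the import fallback is only there so the same sketch also runs on Python 3, where urllib2 became urllib.request.

```python
import gzip
import io
import zlib

try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent


def decode_body(raw, content_encoding):
    """Decompress raw response bytes according to the Content-Encoding value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        # Wrap the bytes in a file-like object so GzipFile can read them.
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    if encoding == "deflate":
        try:
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No (or unknown) encoding: hand the body back untouched.
    return raw


def fetch(url, headers=None):
    """Fetch a URL and decompress the body only if the server says it is compressed."""
    request = urllib2.Request(url, headers=headers or {})
    response = urllib2.urlopen(request)
    raw = response.read()
    return decode_body(raw, response.info().get("Content-Encoding"))
```

Calling `fetch(url, headers=Header)` with the question's own URL and header dictionary should then return the page source as bytes, assuming the server labels its compressed responses with a Content-Encoding header.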

Solution 2:

You make your request with a User-Agent string belonging to a browser that supports on-the-fly compression. Are you sure the output is not gzip-compressed? Try running the response body through the zlib module and/or printing the response headers.
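A quick way to test this suspicion with the zlib module is sketched below. The helper names `looks_gzipped` and `gunzip_with_zlib` are made up for illustration; the check relies on the two magic bytes that begin every gzip stream.

```python
import zlib

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(raw):
    """Return True if the raw bytes start with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


def gunzip_with_zlib(raw):
    """Decompress gzip data using zlib alone."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
```

If `looks_gzipped(body)` is true for the bytes returned by `response.read()`, then `gunzip_with_zlib(body)` should give back the readable page. Printing `response.info()` shows the response headers, where a Content-Encoding: gzip line would confirm the diagnosis.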
