Danny Man
Parsing Compressed Web Pages
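In "抓站进行曲" (the earlier post on scraping sites), we used the methods provided by urllib2 to fetch web page data successfully. There is one case in handling that data we have not considered yet, though: what if the page content has been compressed? Let's try fetching a Sina News page:
>>> import urllib2
>>> req = urllib2.urlopen('http://news.sina.com.cn/c/2013-12-13/075828974213.shtml')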
>>> data = req.read()
>>> data[1000:1300]
'\xa1j\x1...
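The bytes above are not readable HTML. As a minimal sketch of one way to handle this, assuming the response body is gzip-compressed (the excerpt does not confirm which compression the server used), the data can be checked for the gzip magic bytes and decompressed in memory. The helper name `maybe_gunzip` is hypothetical and introduced here only for illustration.

```python
import gzip
import io

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data):
    """Return `data` decompressed if it starts with the gzip magic bytes.

    Data that does not look like gzip is returned unchanged, so the
    helper can be applied to every response body. (Hypothetical helper,
    not part of urllib2.)
    """
    if data[:2] == GZIP_MAGIC:
        # GzipFile needs a file-like object, so wrap the bytes in BytesIO.
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data
```

With the session above, this would be called as `html = maybe_gunzip(data)`. Checking the response header with `req.info().get('Content-Encoding')` is another way to decide whether to decompress, and is probably more reliable than sniffing the first bytes when the server sets that header correctly.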