Teacher, I spent a whole evening on this problem and still can't solve it.

Source: 13-8 Regex analysis to extract names and number of people

幕布斯1536738

2020-10-13

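When I use html = str(html, encoding='utf-8'), I get this error:
'utf-8' codec can't decode byte 0xd0 in position 1764: invalid continuation byte
If I remove the encoding argument, the script runs without an error, but no content is printed. The code is below: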

import re
from urllib import request

class Spider():
    url = 'http://data.eastmoney.com/bkzj/hy.html'
    root_pattern = '<tbody>[\\s\\S]*?</tbody>'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        html = r.read()
        # print(type(html))
        html = str(html,encoding='utf-8')
        # print(html)
        return html

    def __analysis(self,html):
        root_html = re.findall(Spider.root_pattern,html)
        # print(type(root_html))
        print(root_html[1])

    def go(self):
        html = self.__fetch_content()
        self.__analysis(html)
        pass
spider = Spider()
spider.go()
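
The error message hints at a likely cause: in UTF-8, 0xd0 opens a two-byte sequence, so decoding fails when the next byte is not a continuation byte. That pattern is typical of bytes that are really in a GBK-family encoding. Below is a minimal sketch, assuming (the post does not confirm this) that the page is served as GBK/GB2312. `decode_html` and its candidate list are hypothetical names used for illustration, and the sample bytes stand in for `r.read()`.

```python
def decode_html(raw, candidates=("utf-8", "gb18030")):
    """Return raw decoded with the first candidate encoding that fits.

    The candidate list is an assumption; gb18030 covers GBK and GB2312 text.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No candidate fit: decode anyway, marking bad bytes with U+FFFD.
    return raw.decode(candidates[0], errors="replace")


# Stand-in for r.read(): GBK-encoded text instead of a live HTTP response.
raw = "<tbody>行业</tbody>".encode("gbk")

# Decoding GBK bytes as UTF-8 raises the same kind of error as in the post.
try:
    str(raw, encoding="utf-8")
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)

# Without an encoding, str() returns the repr of the bytes object
# (text starting with b'), not the decoded page text.
print(str(raw)[:12])

# Falling back to gb18030 recovers the readable text.
print(decode_html(raw))
```

In the original `__fetch_content`, a helper like this would take the place of the `str(html, encoding='utf-8')` call. The urllib response also offers `r.headers.get_content_charset()`, which returns the charset declared in the Content-Type header, or None when the server declares none.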


1 answer