Web scraper

Source: 13-4 Debugging code in VSCode

慕用5005156

2019-06-02

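Teacher, the site I am scraping is Huya as well, and I followed your steps exactly; my code is identical to yours. In the __fetch_content function the htmls variable does contain those tags, but
root_html = re.findall(Spider.root_pattern, htmls) leaves root_html as [], and the debugger shows only __len__: 0. What is going wrong? My source code is below; could you please take a look? Thank you very much.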

import re
from urllib import request

class Spider():  # Build the scraper in an object-oriented style
    url = 'https://www.huya.com/g/lol'
    root_pattern = '([\s\S]*?)'

    def __fetch_content(self):  # Fetch the page content; this is a private method
        r = request.urlopen(Spider.url)  # urlopen takes the page URL
        htmls = r.read()
        htmls = str(htmls, encoding='utf-8')
        return htmls

    def __analysis(self, htmls):
        root_html = re.findall(Spider.root_pattern, htmls)
        # print(root_html[0])
        a = 1

    def pub(self):  # Public entry point
        htmls = self.__fetch_content()
        self.__analysis(htmls)

spider = Spider()
spider.pub()
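
One way to narrow this down, as a minimal sketch rather than a diagnosis: re.findall returns [] whenever the literal text around the capture group does not occur in the string exactly, character for character. The snippet below shows that on a small hand-written string. The markup in sample_html, the class name "item", and the helper find_blocks are all assumptions made up for illustration; they are not Huya's real markup and not part of the course code.

```python
import re

# Hypothetical sample markup; the real page's tags and class names may differ.
sample_html = '''
<div class="item">
    <span class="name">streamer_a</span>
</div>
<div class="item">
    <span class="name">streamer_b</span>
</div>
'''


def find_blocks(pattern, text):
    """Return every capture of `pattern` found in `text` (hypothetical helper)."""
    return re.findall(pattern, text)


# The literal parts match the text exactly, so there is one capture per block.
good_pattern = r'<div class="item">([\s\S]*?)</div>'
# One detail differs in the literal part (single instead of double quotes),
# so nothing matches and findall returns an empty list.
bad_pattern = r"<div class='item'>([\s\S]*?)</div>"

good = find_blocks(good_pattern, sample_html)
bad = find_blocks(bad_pattern, sample_html)

print(len(good))
print(bad)
```

Running the same kind of check on the fetched htmls, for example testing whether the literal opening tag from root_pattern is `in htmls`, may show whether the pattern or the downloaded page is the part that differs.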


1 answer