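Could someone spare a little of their valuable time to give me some pointers? Why are the streamer names and viewer counts separated when I print anchors?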


Source: 13-7 Analyzing HTML with regular expressions

哦呀v度

2019-06-18

from urllib import request
import re


class Spider():
    url = "http://www.yy.com/game/"
    # The HTML tags around each capture group did not survive in this post,
    # so the three patterns are shown without them.
    root_pattern = r'([\s\S]*?)'
    name_pattern = r'([\s\S]*?)'
    number_pattern = r'([\s\S]*?)'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        htmls = r.read()
        htmls = str(htmls, encoding="utf-8")
        return htmls

    def __analysis(self, htmls):
        root_html = re.findall(Spider.root_pattern, htmls)
        anchors = []
        for html in root_html:
            # re.findall returns a list, so name and number are each a list
            name = re.findall(Spider.name_pattern, html)
            number = re.findall(Spider.number_pattern, html)
            # The keys mean "streamer name" and "viewer count"
            anchor = {"主播名称": name, "观看人数": number}
            anchors.append(anchor)
        print(anchors)

    def go(self):
        htmls = self.__fetch_content()
        self.__analysis(htmls)


spider = Spider()
spider.go()


1 answer

7七月

2019-06-19

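I'm afraid you will have to debug this one yourself. Every site's HTML is a little different, so what the parser extracts will differ slightly too.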
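A minimal sketch of that kind of debugging, run on a small made-up HTML fragment instead of the live page. The tag names, class names, and sample values are assumptions invented for illustration, not yy.com's real markup, and the helper `analyse` is hypothetical. It uses the same two-level `re.findall` approach as the question and shows how the scope of the root pattern alone can decide whether a name and its count land in the same dict.

```python
import re

# Hypothetical markup, invented for illustration; the real page will differ.
SAMPLE_HTML = """
<div class="card">
  <span class="name">Alice</span>
  <span class="count">1200</span>
</div>
<div class="card">
  <span class="name">Bob</span>
  <span class="count">85</span>
</div>
"""

NAME_PATTERN = r'<span class="name">([\s\S]*?)</span>'
COUNT_PATTERN = r'<span class="count">([\s\S]*?)</span>'


def analyse(root_pattern, html):
    """Apply the name and count patterns inside every root-pattern match."""
    anchors = []
    for block in re.findall(root_pattern, html):
        anchors.append({
            "name": re.findall(NAME_PATTERN, block),
            "count": re.findall(COUNT_PATTERN, block),
        })
    return anchors


# Root pattern covers one whole card: each block holds one name and one count.
whole_card = analyse(r'<div class="card">([\s\S]*?)</div>', SAMPLE_HTML)

# Root pattern covers a single <span>: each block holds a name OR a count.
too_narrow = analyse(r'(<span [\s\S]*?</span>)', SAMPLE_HTML)

# Greedy root pattern swallows every card: one block holds all names and counts.
too_wide = analyse(r'<div class="card">([\s\S]*)</div>', SAMPLE_HTML)

for label, result in [("whole card", whole_card),
                      ("too narrow", too_narrow),
                      ("too wide", too_wide)]:
    print(label, len(result), result)
```

With the whole-card pattern each dict pairs one name with one count. With the too-narrow pattern the names and counts end up in separate dicts, and with the greedy pattern everything collapses into a single dict. Printing `root_html` before the inner `findall` calls is a quick way to see which of these cases applies to a real page; whether any of them matches the output in the question is not something the thread confirms.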


哦呀v度 replying to 7七月

Thank you, teacher. I thought it over last night and I may have figured out the cause. Thanks.

2019-06-20

3 replies in total
