Extracting Overwatch streamer names from Huya: I don't know how to write the regex
Source: 13-9 Data Refinement
你是我的河豚鱼
2019-06-02
import re
from urllib import request
# breakpoint debugging
class Spider():
    url = 'https://www.huya.com/g/overwatch'
    root_pattern = r'([\s\S]*?)'
    name_pattern = r'<i class="nick" ([\s\S]*?)'
    number_pattern = r'([\s\S]*?)'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        htmls = r.read()
        htmls = str(htmls, encoding='utf-8')
        return htmls

    def __analysis(self, htmls):
        root_html = re.findall(Spider.root_pattern, htmls)
        anchors = []
        for html in root_html:
            name = re.findall(Spider.name_pattern, html)
            number = re.findall(Spider.number_pattern, html)
            anchor = {'name': name, 'number': number}
            anchors.append(anchor)
        print(anchors[0])
        a = 1  # placeholder line for setting a breakpoint
        return anchors

    def __refine(self, anchors):
        pass

    def go(self):
        htmls = self.__fetch_content()
        anchors = self.__analysis(htmls)
        self.__refine(anchors)


spider = Spider()
spider.go()
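That is my code.
The result is: {'name': [' title="老李Jamlee">老李Jamlee'], 'number': ['6.8万']}
I don't know how to extract just the name from this. Could you please advise? Here is the relevant HTML from the page: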
<span class="avatar fl">
<img data-original="https://huyaimg.msstatic.com/avatar/1017/11/39995b2e1c81c5.jpg" src="//a.msstatic.com/huya/main/assets/img/default/84x84.jpg" onerror="this.onerrostatic.com/huya/main/assets/img/default/84x84.jpg';" alt="老李Jamlee" title="老李Jamlee"> <i class="nick" title="老李Jamlee">老李Jamlee</i> </span>
<span class="num"><i class="num-icon"></i><i class="js-num">5.2万</i></span>
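One way to get the bare name straight from the regex, sketched here as a hedged example: tighten the name pattern so it consumes the rest of the opening `<i class="nick" ...>` tag before the capture group, and end the group at `</i>`. The patterns below are illustrative assumptions checked only against the snippet above (with the `img` attributes trimmed), not against the full Huya page.

```python
import re

# Abbreviated copy of the markup quoted in the question (img attributes trimmed).
html = (
    '<span class="avatar fl">'
    '<img alt="老李Jamlee" title="老李Jamlee"> '
    '<i class="nick" title="老李Jamlee">老李Jamlee</i> </span>'
    '<span class="num"><i class="num-icon"></i>'
    '<i class="js-num">5.2万</i></span>'
)

# Hypothetical tighter patterns: [^>]* skips any attributes (here title="...")
# up to the end of the opening tag, so the group captures only the tag's text.
name_pattern = r'<i class="nick"[^>]*>([\s\S]*?)</i>'
number_pattern = r'<i class="js-num">([\s\S]*?)</i>'

names = re.findall(name_pattern, html)
numbers = re.findall(number_pattern, html)
print(names, numbers)  # ['老李Jamlee'] ['5.2万']
```

Because the group stops at the first `</i>` after the tag, the `title` attribute never ends up in the captured text, so no later string cleanup is needed for the name.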
1 Answer
7七月
2019-06-03
Is what you got a dictionary? If it is, convert it to a Python dict and you can read the name from it.
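The printed value is already a Python dict; what remains is that each value is a list returned by `re.findall`, and the captured name still carries the `title` attribute. A minimal sketch of the refinement step, assuming each list holds at most one match; `refine` is a hypothetical standalone version of `__refine`, not the course's code.

```python
# anchors as printed in the question: each value is a list from re.findall.
anchors = [{'name': [' title="老李Jamlee">老李Jamlee'], 'number': ['6.8万']}]


def refine(anchors):
    """Hypothetical stand-in for Spider.__refine: unwrap the single-item
    lists and keep only the text after the last '>' in the captured name."""
    refined = []
    for anchor in anchors:
        # Skip entries where either regex found nothing.
        if not anchor['name'] or not anchor['number']:
            continue
        raw_name = anchor['name'][0]             # ' title="老李Jamlee">老李Jamlee'
        name = raw_name.split('>')[-1].strip()   # '老李Jamlee'
        number = anchor['number'][0].strip()     # '6.8万'
        refined.append({'name': name, 'number': number})
    return refined


print(refine(anchors))  # [{'name': '老李Jamlee', 'number': '6.8万'}]
```

Splitting on `>` works here only because the captured fragment ends with the tag's text; if the pattern is changed to stop at `</i>` inside the regex, this step reduces to `.strip()`.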