Sorting question
Source: 13-11 Code adjustments and walkthrough after switching websites (must read)

幕布斯1536738
2020-10-16
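Hello teacher, I have implemented this crawler myself and it runs fine, but there is one thing I don't understand.
Why does the code still run, and still print a sorted result, when the call to __ranking inside the go function is commented out?
The code is as follows: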
import re
import ssl
from urllib import request

# Skip HTTPS certificate verification (convenient for this exercise, but insecure)
ssl._create_default_https_context = ssl._create_unverified_context


class Spider():
    url = 'https://www.huya.com/g/lol'
    # Raw strings keep \d and \s from being treated as string escapes
    root_pattern = r'<li class="game-live-item" gid="1" data-lp="[\d]*">([\s\S]*?)</li>'
    vedio_pattern = r'target="_blank">([\s\S]*?)</a>'
    name_pattern = r'<i class="nick" title="([\s\S]*?)">[\s\S]*?</i>'
    number_pattern = r'<i class="js-num">([\s\S]*?)</i>'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        html = r.read()
        # print(type(html))
        # print(html[0])
        html = str(html, encoding='utf-8')
        # print(type(html))
        # print(html[0])
        return html

    def __analysis(self, html):
        root_html = re.findall(Spider.root_pattern, html)
        # print(type(root_html))
        # print(root_html[0])
        anchors = []
        for i in root_html:
            vedio = re.findall(Spider.vedio_pattern, i)
            name = re.findall(Spider.name_pattern, i)
            number = re.findall(Spider.number_pattern, i)
            anchor = {'vedio': vedio, 'name': name, 'number': number}
            anchors.append(anchor)  # collect each anchor dict into the anchors list
        return anchors

    def __refine(self, anchors):
        # Each field is a one-element list from findall; take the first match
        l = lambda anchor: {
            'vedio': anchor['vedio'][0].strip(),
            'name': anchor['name'][0].strip(),
            'number': anchor['number'][0]
        }
        return map(l, anchors)

    def __ranking(self, anchors):
        anchors = sorted(anchors, key=self.__sort_seed, reverse=True)
        return anchors

    def __sort_seed(self, anchor):
        seed = re.findall(r'[1-9]\d*\.?\d*', anchor['number'])
        number = float(seed[0])
        # The page writes large counts with the unit 万 (ten thousand)
        if '万' in anchor['number']:
            number *= 10000
        return number

    def __show(self, anchors):
        # print(type(anchors))
        for rank in range(0, len(anchors)):
            print('Rank: ' + str(rank + 1) + '--' + 'Viewers: ' + anchors[rank]['number'] + '--' + 'Streamer: ' + anchors[rank]['name'] + '--' + 'Video: ' + anchors[rank]['vedio'])
        return anchors

    def go(self):
        html = self.__fetch_content()
        anchors = self.__analysis(html)
        # print(type(anchors))
        anchors = list(self.__refine(anchors))
        # anchors = self.__ranking(anchors)
        self.__show(anchors)


spider = Spider()
spider.go()
1 answer
mewolmewo
2020-11-15
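Well, Huya's LoL live-stream page already lists streams by popularity by default, so the result is the same whether or not you sort it yourself.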
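If you want to confirm this yourself, here is a rough sketch rather than anything from the course code. The helper names `to_number` and `is_sorted_desc` are made up for illustration, and the viewer counts are invented sample values, not real page output. It checks whether a list of counts is already in descending order and shows that sorting only changes the order when the input is not sorted to begin with.

```python
import re


def to_number(text):
    """Convert a viewer-count string such as '12.3万' or '9876' to a float."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        return 0.0
    number = float(match.group())
    if '万' in text:  # 万 means ten thousand
        number *= 10000
    return number


def is_sorted_desc(counts):
    """Return True if the counts are already in descending order."""
    numbers = [to_number(c) for c in counts]
    return all(a >= b for a, b in zip(numbers, numbers[1:]))


# Invented sample data (assumption): one list in page order, one shuffled
page_order = ['152.3万', '98.7万', '12万', '9876', '532']
shuffled = ['9876', '152.3万', '532', '98.7万', '12万']

print(is_sorted_desc(page_order))   # already descending, sorting is a no-op
print(is_sorted_desc(shuffled))     # not descending, sorting would matter
print(sorted(page_order, key=to_number, reverse=True) == page_order)
print(sorted(shuffled, key=to_number, reverse=True) == page_order)
```

In the crawler you could run the same kind of check on the list returned by `__refine` before calling `__ranking`. If the site ever changes its default ordering, the `__ranking` step is what keeps the output sorted, so it is still worth keeping.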