Sorting question

Source: 13-11 Code adjustments and walkthrough after switching websites (must-read)

幕布斯1536738

2020-10-16

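Hello teacher, I have implemented this crawler myself and it runs correctly, but I have one question.

Why does the code still run, and still produce a sorted result, when the call to __ranking in the go function is commented out?

The code is as follows: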

import re
import ssl
from urllib import request

# Skip HTTPS certificate verification globally (acceptable for this exercise only)
ssl._create_default_https_context = ssl._create_unverified_context


class Spider():
    url = 'https://www.huya.com/g/lol'
    # Raw strings keep regex escapes such as \d and \s intact
    root_pattern = r'<li class="game-live-item" gid="1" data-lp="[\d]*">([\s\S]*?)</li>'
    vedio_pattern = r'target="_blank">([\s\S]*?)</a>'
    name_pattern = r'<i class="nick" title="([\s\S]*?)">[\s\S]*?</i>'
    number_pattern = r'<i class="js-num">([\s\S]*?)</i>'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        html = r.read()
        # print(type(html))
        # print(html[0])
        html = str(html, encoding='utf-8')
        # print(type(html))
        # print(html[0])
        return html

    def __analysis(self, html):
        root_html = re.findall(Spider.root_pattern, html)
        # print(type(root_html))
        # print(root_html[0])
        anchors = []
        for i in root_html:
            vedio = re.findall(Spider.vedio_pattern, i)
            name = re.findall(Spider.name_pattern, i)
            number = re.findall(Spider.number_pattern, i)
            anchor = {'vedio': vedio, 'name': name, 'number': number}
            anchors.append(anchor)  # collect each anchor dict into the anchors list
        return anchors

    def __refine(self, anchors):
        # Take the first match of each field and strip surrounding whitespace
        l = lambda anchor: {
            'vedio': anchor['vedio'][0].strip(),
            'name': anchor['name'][0].strip(),
            'number': anchor['number'][0]
        }
        return map(l, anchors)

    def __ranking(self, anchors):
        anchors = sorted(anchors, key=self.__sort_seed, reverse=True)
        return anchors

    def __sort_seed(self, anchor):
        seed = re.findall(r'[1-9]\d*\.?\d*', anchor['number'])
        number = float(seed[0])
        # '万' means ten thousand, so '1.5万' is 15000
        if '万' in anchor['number']:
            number *= 10000
        return number

    def __show(self, anchors):
        # print(type(anchors))
        for rank in range(0, len(anchors)):
            print('Rank: ' + str(rank + 1) + '--' + 'Viewers: ' + anchors[rank]['number'] + '--' + 'Streamer: ' + anchors[rank]['name'] + '--' + 'Video: ' + anchors[rank]['vedio'])
        return anchors

    def go(self):
        html = self.__fetch_content()
        anchors = self.__analysis(html)
        # print(type(anchors))
        anchors = list(self.__refine(anchors))
        # Ranking step commented out, as described in the question
        # anchors = self.__ranking(anchors)
        self.__show(anchors)


spider = Spider()
spider.go()



1 answer

mewolmewo

2020-11-15

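Well, Huya's LoL live-stream page is already sorted by popularity by default, so the result is the same whether or not you sort it yourself.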
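To make that concrete, here is a minimal sketch that can be run without touching the network. It assumes a sort key that mirrors the `__sort_seed` method from the question; the standalone names `sort_seed` and `ranking` and the sample streamers are hypothetical, made up purely for illustration. Sorting data that already arrives in descending order leaves it unchanged, which is why commenting out `__ranking` made no visible difference, while reordering the input shows that the ranking step does real work.

```python
import re


def sort_seed(anchor):
    """Convert a viewer-count string such as '35万' or '9800' to a float."""
    seed = re.findall(r'[1-9]\d*\.?\d*', anchor['number'])
    number = float(seed[0])
    if '万' in anchor['number']:  # '万' means ten thousand
        number *= 10000
    return number


def ranking(anchors):
    """Return a new list sorted by viewer count, highest first."""
    return sorted(anchors, key=sort_seed, reverse=True)


# Hypothetical sample data, already in descending order like the live page
page_order = [
    {'name': 'A', 'number': '120.5万'},
    {'name': 'B', 'number': '35万'},
    {'name': 'C', 'number': '9800'},
    {'name': 'D', 'number': '512'},
]

# Sorting input that is already sorted changes nothing
unchanged = ranking(page_order) == page_order

# Reversing the input first shows that ranking() restores the order
restored = ranking(list(reversed(page_order))) == page_order

print(unchanged, restored)
```

A quick way to check this in the real crawler would be to shuffle or reverse `anchors` inside `go` before calling `__show`, once with `__ranking` enabled and once without.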


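Python 3.8 Systematic Introduction + Advanced (A Programmer's Essential Second Language)

In-depth syntax coverage / accompanying exercises and review questions / native crawler project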
