How do I handle the newly added pinned posts on the CSDN forum?
Source: 14-11 Fetching and parsing the list page - 2

JackyBreak
2020-02-14
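Hello. CSDN has added three pinned posts to every board. I handle them as follows:
I skip the three pinned posts by starting the list at the sixth tr. Debugging confirms that the list no longer contains the pinned posts, but extracting the topic id further down still raises an error.
The error says the extracted id is "J2EE", which is the href of the first a tag in the topmost pinned post.
But my tr list no longer contains the pinned posts. What is causing this?
My code is below: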
def parse_list(url):
    topic_chart = Topic()
    res_text = requests.get(url).text
    sel = Selector(text=res_text)
    all_trs = sel.xpath("//table[@class='forums_tab_table']//tr")[5:]
    print(all_trs[0].extract())
    for tr in all_trs:
        if tr:
            if tr.xpath("//td[1]//span/text()").extract():
                status = tr.xpath("//td[1]//span/text()").extract()[0]
                topic_chart.status = status
            if tr.xpath("//td[2]//em/text()").extract():
                score = tr.xpath("//td[2]//em/text()").extract()[0]
                topic_chart.score = int(score)
            if tr.xpath("//td[3]/a/@href").extract():
                # try:
                url = tr.xpath("//td[3]/a/@href").extract()[0]
                topic_url = parse.urljoin(domain, url)
                topic_chart.id = int(topic_url.split("/")[-1])
                # except:
                #     topic_url = parse.urljoin(domain, tr.xpath("//td[3]//a[2]/@href").extract()[0])
                #     topic_chart.id = int(topic_url.split("/")[-1])
            if tr.xpath("//td[3]//a/text()").extract():
                topic_title = tr.xpath("//td[3]//a/text()").extract()[0]
                topic_chart.title = topic_title
            if tr.xpath("//td[4]//a/@href").extract():
                author_url = parse.urljoin(domain, tr.xpath("//td[4]//a/@href").extract()[0])
                author_id = author_url.split("/")[-1]
                topic_chart.author = author_id
            if tr.xpath("//td[4]//em/text()").extract():
                create_time = datetime.strptime(tr.xpath("//td[4]//em/text()").extract()[0], "%Y-%m-%d %H:%M")
                topic_chart.create_time = create_time
            if tr.xpath("//td[5]//span/text()").extract():
                answer_info = tr.xpath("//td[5]//span/text()").extract()[0]
                answer_nums = answer_info.split("/")[0]
                click_nums = answer_info.split("/")[1]
                topic_chart.answer_nums = int(answer_nums)
                topic_chart.click_nums = int(click_nums)
            if tr.xpath("//td[6]//em/text()").extract():
                last_reply_time = tr.xpath("//td[6]//em/text()").extract()[0]
                last_time = datetime.strptime(last_reply_time, "%Y-%m-%d %H:%M")
                topic_chart.last_answer_time = last_time
            topic_chart.save()
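
Editor's note: the call `int(topic_url.split("/")[-1])` in the code above raises `ValueError` whenever the last path segment is not numeric, which is what happens with a segment such as "J2EE". The following is a minimal sketch of a more defensive extraction, assuming topic URLs end in a numeric id. The helper name `extract_topic_id` and the sample URLs are hypothetical and not part of the original post.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def extract_topic_id(topic_url: str) -> Optional[int]:
    """Return the numeric id at the end of a topic URL, or None if there is none."""
    # Assumption: the topic id is the last non-empty segment of the URL path.
    last_segment = urlparse(topic_url).path.rstrip("/").split("/")[-1]
    # Only convert when the segment consists solely of ASCII digits.
    if re.fullmatch(r"[0-9]+", last_segment):
        return int(last_segment)
    return None


# Made-up URLs for illustration only.
numeric_id = extract_topic_id("https://example.com/topics/123456")
board_link = extract_topic_id("https://example.com/forums/J2EE")
```

A row whose link yields `None` could then be skipped instead of crashing the whole crawl. This only hides the symptom; it does not explain why a pinned link is returned in the first place.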
1 answer
JackyBreak
Asker
2020-02-14
Solved. It made sense once I watched the video for the next section. Thanks.