Problem using xpath while iterating over a SelectorList

Source: 4-14 Defining and using items - 1

lazymyth

2021-01-04

import scrapy
from urllib import parse


class CnblogsSpider(scrapy.Spider):
    name = 'cnblogs'
    allowed_domains = ['news.cnblogs.com']
    start_urls = ['http://news.cnblogs.com/']

    def parse(self, response):
        post_nodes = response.xpath("//div[@class='news_block']")
        for post_node in post_nodes:
            image_url = post_node.xpath("//img[@class='topic_img']/@src").extract()
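            '''
            With this xpath expression I get the cover images of the entire list. Why is that?
            Shouldn't it apply only to the selector currently being iterated over?
            Could the instructor please explain ^v^
            '''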
            pass

1 answer

bobby

2021-01-06

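Accepted

This is how XPath syntax works: an expression that starts with // is not relative to the node you call it on. Change image_url = post_node.xpath("//img[@class='topic_img']/@src").extract() to image_url = post_node.xpath(".//img[@class='topic_img']/@src").extract(), that is, add a dot at the very beginning of the expression.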

慕勒5311868

bobby

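With the leading dot, the search treats the current element as the root and looks only at its descendants. Without the dot, the search starts from the root of the whole page, no matter which selector you call xpath() on.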

2022-01-25

3 replies in total
