XPath returns the wrong img_url and post_url
Source: 4-9 Writing the spider to complete the crawl - 1

GoGo闯1
2019-11-05
Code:
# -*- coding: utf-8 -*-
import scrapy
from scrapy import Selector


class JobboleSpider(scrapy.Spider):
    name = 'jobbole'
    allowed_domains = ['news.cnblogs.com']
    start_urls = ['http://news.cnblogs.com/']

    def parse(self, response):
        # extract_first returns the first element of the list; if the list is empty, it returns the default value
        # url = response.xpath('//*[@id="entry_647068"]/div[2]/h2/a/@href').extract_first("")
        post_notes = response.xpath('//*[@id="news_list"]/div[@class="news_block"]')
        for post_note in post_notes:
            print("=" * 60)
            img_url = post_note.xpath('//div[@class="entry_summary"]/a/img/@src').extract_first("")
            post_url = post_note.xpath('//h2[@class="news_entry"]/a/@href').extract_first("")
            print(post_note)
            print(img_url)
            print(post_url)
Output:
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647122'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647121'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647120'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647119'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647118'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647117'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647116'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647115'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647114'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647113'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647112'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647111'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647110'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647109'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647107'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647108'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647106'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647105'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647104'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647103'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647102'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647101'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647100'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647099'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647098'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647096'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647097'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647095'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647094'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
============================================================
<Selector xpath='//*[@id="news_list"]/div[@class="news_block"]' data='<div class="news_block" id="entry_647093'>
//images0.cnblogs.com/news_topic/小米.gif
/n/647122/
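I can't figure out why: post_note is different on each iteration of the for loop, yet the extracted img_url and post_url values are the same every time.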
1 answer
bobby
2019-11-06
That is indeed a bit strange. Could you try CSS selectors and see whether the problem still occurs?