Image download 302 problem
Source: 4-25 Is there a way to parse the title and body content fairly accurately?
慕粉13276915582
2021-12-15
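Hello teacher, I have a question. I am downloading images from Sina, but the image link is a redirect; following the redirect returns the full-size image.
When I fetch it with Scrapy I get the error below. I have searched for a long time and have not found a solution.
I have also configured
the media redirect option in settings.py: MEDIA_ALLOW_REDIRECTS = True
and the allowed domains: allowed_domains = ['weibo.cn', 'wx1.sinaimg.cn']
I do not know which step I am missing.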
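For reference, the settings described above can be sketched as a settings.py fragment. This is only a minimal sketch: the post does not show the project's actual pipeline configuration, so the use of the built-in `scrapy.pipelines.images.ImagesPipeline` and the `IMAGES_STORE` directory name are assumptions.

```python
# settings.py (fragment), a sketch of the media-related settings.

# Assumption: the built-in image pipeline is used; the original project
# may register its own subclass here instead.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Assumption: hypothetical output directory for downloaded images.
IMAGES_STORE = "images"

# From the post: let media pipeline requests follow HTTP redirects (e.g. 302).
MEDIA_ALLOW_REDIRECTS = True
```

Note that `allowed_domains` is a spider attribute rather than a settings.py entry, and that the built-in image pipeline needs Pillow installed.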
import scrapy
from scrapy.http import HtmlResponse
from scrapy.utils.project import get_project_settings

# WeiboItem comes from the project's items module; its import is not shown in the post.


class PersonSpider(scrapy.Spider):
    name = 'weibo_person'
    allowed_domains = ['weibo.cn', 'wx1.sinaimg.cn']
    settings = get_project_settings()
    base_url = f'https://weibo.cn/u/{settings.get("USER_URI")}'
    page = 2

    @staticmethod
    def comment_url(weibo_id):
        weibo_id = weibo_id.replace('M_', '')
        return f"https://weibo.cn/comment/{weibo_id}?ckAll=1"

    def start_requests(self):
        # Overrides base_url with a single fixed post URL.
        self.base_url = 'https://weibo.cn/comment/L6cjVzyed'
        yield scrapy.Request(url=self.base_url,
                             callback=self.parse_long_weibo,
                             meta={
                                 'base_url': self.base_url
                             })

    def parse_long_weibo(self, response: HtmlResponse):
        """
        Fetch a long original Weibo post.
        :return:
        """
        weibo_item = WeiboItem()
        # '原图' ("original image") is the link text on the page.
        main_pic = response.xpath("//div/a[text() = '原图']/@href")
        pic_url = main_pic.extract_first()
        weibo_item['image_urls'] = [response.urljoin(pic_url)]
        # Placeholder values for the remaining fields.
        weibo_item['weibo_id'] = '123'
        weibo_item['weibo'] = '123'
        weibo_item['url'] = '123'
        weibo_item['url_object_id'] = '123'
        yield weibo_item
2021-12-15 23:36:13 [scrapy.core.engine] INFO: Spider opened
2021-12-15 23:36:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-12-15 23:36:13 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-12-15 23:36:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/comment/L6cjVzyed> (referer: None)
2021-12-15 23:36:32 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://wx1.sinaimg.cn/large/d7562ea8gy1gxetb4izz7j20n01dsaei.jpg> from <GET https://weibo.cn/mblog/oripic?&id=L6cjVzyed&u=d7562ea8gy1gxetb4izz7j20n01dsaei&rl=1>
2021-12-15 23:36:36 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://wx1.sinaimg.cn/large/d7562ea8gy1gxetb4izz7j20n01dsaei.jpg> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2021-12-15 23:36:40 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://wx1.sinaimg.cn/large/d7562ea8gy1gxetb4izz7j20n01dsaei.jpg> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
1 answer
bobby
2021-12-21
This image can only be fetched when logged in. Have you already logged in here?
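If login is indeed required, the image requests need to carry the logged-in session's cookies as well. The sketch below is one way to prepare them, assuming the cookies are copied from a logged-in browser session as a raw `Cookie` header string. The helper names `parse_cookie_header` and `image_request_kwargs` are hypothetical, and the thread does not confirm that missing login is the cause of the `ConnectionLost` errors.

```python
def parse_cookie_header(cookie_header: str) -> dict:
    """Turn a raw 'Cookie' header string into a dict of name -> value."""
    cookies = {}
    for part in cookie_header.split(";"):
        # Split on the first '=' only, since cookie values may contain '='.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def image_request_kwargs(image_url: str, cookie_header: str) -> dict:
    """Build keyword arguments for a request that carries the login cookies.

    Hypothetical helper: the returned dict is meant to be unpacked into
    scrapy.Request, which accepts 'url' and 'cookies' arguments.
    """
    return {"url": image_url, "cookies": parse_cookie_header(cookie_header)}
```

In a subclass of `scrapy.pipelines.images.ImagesPipeline`, the `get_media_requests(self, item, info)` method could then yield `scrapy.Request(**image_request_kwargs(url, cookie_header))` for each URL in `item['image_urls']`, with that subclass registered in `ITEM_PIPELINES` in place of the built-in pipeline.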