response.url is wrong after login verification
Source: 4-24 Image download errors during large-scale crawling
714400952
2022-07-08
def start_requests(self):
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
    }
    # Open the sign-in page in a real browser and log in by hand.
    browser = uc.Chrome()
    browser.get("https://account.cnblogs.com/signin")
    input("Press Enter to continue: ")
    # Collect the browser cookies into a name -> value dict.
    cookies = browser.get_cookies()
    cookie_dict = {}
    for cookie in cookies:
        cookie_dict[cookie["name"]] = cookie["value"]
    for url in self.start_urls:
        yield Request(url, headers=headers, dont_filter=True)

def parse(self, response):
    post_nodes = response.xpath('//div[@id="news_list"]/div[@class="news_block"]')[:1]
    print(post_nodes)
    for post_node in post_nodes:
        image_url = post_node.xpath('//div[@class="entry_summary"]/a/img/@src').extract_first("")
        post_url = post_node.css('h2 a::attr(href)').extract_first("")
        yield Request(url=parse.urljoin("http://news.cnblogs.com/", post_url), meta={"from_image_url": image_url}, callback=self.parse_detail)

def parse_detail(self, response):
    match_re = re.match(r".*?(\d+)", response.url)  # response.url raises an error here!
1 answer
bobby
2022-07-10
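The request was redirected to the login page here. This page requires a login to access; could it be that the cookies are not being used?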
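As a rough illustration of that point: in the question's code, cookie_dict is built but never passed to Request, so the crawl runs without a login session. The sketch below shows one way to attach the cookies and to detect a redirect to the sign-in page before parsing the URL. The helper names (cookies_to_dict, is_signin_redirect, extract_post_id, build_start_requests) are hypothetical, and the check assumes the redirect lands on account.cnblogs.com/signin, the same page the question opens in the browser.

```python
import re
from urllib.parse import urlparse


def cookies_to_dict(selenium_cookies):
    """Turn the list returned by browser.get_cookies() into a name -> value dict."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def is_signin_redirect(url):
    """Return True if the URL looks like the sign-in page (assumed location)."""
    parsed = urlparse(url)
    return parsed.netloc == "account.cnblogs.com" and parsed.path.startswith("/signin")


def extract_post_id(url):
    """Return the first run of digits in the URL, or None if there is none."""
    match = re.match(r".*?(\d+)", url)
    return match.group(1) if match else None


def build_start_requests(start_urls, headers, cookie_dict):
    """Yield Scrapy requests that carry the login cookies."""
    # Imported here so the helpers above can be used without Scrapy installed.
    from scrapy import Request

    for url in start_urls:
        # cookies= is the part missing from the question's start_requests.
        yield Request(url, headers=headers, cookies=cookie_dict, dont_filter=True)
```

Inside parse_detail, checking is_signin_redirect(response.url) first and logging or skipping when it returns True avoids working with a URL that does not belong to a news post. Requests yielded later in parse normally reuse the session cookies through Scrapy's cookie middleware, provided COOKIES_ENABLED has not been turned off in the project settings.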