Error in cnblogs simulated login
Source: 4-8. cnblogs simulated login (new content)
魈仔
2021-12-25
# -*- coding: utf-8 -*-
import scrapy
from scrapy import Selector


class JobboleSpider(scrapy.Spider):
    name = 'jobbole'
    allowed_domains = ['news.cnblogs.com']
    start_urls = ['http://news.cnblogs.com/']
    custom_settings = {
        "COOKIES_ENABLED": True
    }

    def start_requests(self):
        # Entry point: log in through a real browser to obtain cookies.
        # Browsers driven by plain selenium are detected by some sites (e.g. Zhihu, Lagou),
        # hence undetected_chromedriver.
        import undetected_chromedriver.v2 as uc
        browser = uc.Chrome()
        browser.get("https://account.cnblogs.com/signin")
        print("_______________")
        # TODO: automate typing the credentials and recognizing/dragging the slider captcha.
        # For now, log in by hand in the browser window, then press Enter here.
        input("Press Enter to continue: ")
        # Collect the cookies from the logged-in browser session
        cookies = browser.get_cookies()
        cookie_dict = {}
        for cookie in cookies:
            cookie_dict[cookie['name']] = cookie['value']
        for url in self.start_urls:
            headers = {
                'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
            }
            # Hand the cookies to scrapy. Will later requests also send these cookies?
            yield scrapy.Request("https://news.cnblogs.com/n/709266/", cookies=cookie_dict, headers=headers, dont_filter=True)

    def parse(self, response):
        sel = Selector(text=response.text)
        url1 = sel.css('#news_list h2 a::attr(href)').extract()
        url = response.css('#news_list h2 a::attr(href)').extract()
        pass
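The cookie loop in `start_requests` keeps only each cookie's name and value, so the domain and path that the browser recorded at login are dropped. Below is a minimal sketch, assuming the list returned by selenium's `get_cookies()` has its usual `name`/`value`/`domain`/`path` keys, that converts it into the list-of-dicts form `scrapy.Request(cookies=...)` also accepts. The helper name `selenium_cookies_to_scrapy` is hypothetical, not part of Scrapy or selenium.

```python
def selenium_cookies_to_scrapy(browser_cookies):
    """Convert selenium's get_cookies() output into the list-of-dicts form
    accepted by scrapy.Request(cookies=...).

    Hypothetical helper: it keeps 'domain' and 'path' when present, so a cookie
    set for '.cnblogs.com' at login is not narrowed to the first request's host.
    """
    scrapy_cookies = []
    for c in browser_cookies:
        entry = {"name": c["name"], "value": c["value"]}
        # Copy domain/path only when the browser actually reported them
        for key in ("domain", "path"):
            if c.get(key):
                entry[key] = c[key]
        scrapy_cookies.append(entry)
    return scrapy_cookies


if __name__ == "__main__":
    # Sample data shaped like selenium's get_cookies() output (made up for illustration)
    sample = [
        {"name": "a", "value": "1", "domain": ".cnblogs.com", "path": "/", "secure": True},
        {"name": "b", "value": "2"},
    ]
    print(selenium_cookies_to_scrapy(sample))
```

In the spider this would be used as `cookies=selenium_cookies_to_scrapy(browser.get_cookies())`. As for the question in the code comment: with `COOKIES_ENABLED` on, Scrapy's cookies middleware should store cookies passed via `Request(cookies=...)` in its cookie jar and send them on later requests to matching domains, so they normally only need to be passed once.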

1 answer
bobby
2021-12-27
Does the error occur as soon as you start the spider, or only after it has run up to a certain point?
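One quick way to narrow this down is to check, outside Scrapy, that every module the spider imports is actually installed; `undetected_chromedriver.v2` is only imported inside `start_requests`, so a missing or renamed module there would surface right after the crawl starts rather than at import time. This is a minimal diagnostic sketch using only the standard library; the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment.

    find_spec() on a dotted name imports the parent package first, so a missing
    parent package raises ModuleNotFoundError instead of returning None.
    """
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules imported by the spider in the question
    needed = ["scrapy", "undetected_chromedriver", "undetected_chromedriver.v2"]
    print(missing_modules(needed) or "all imports available")
```

If this reports nothing missing, the error more likely comes from a later step (the browser launch, the request, or `parse`), and the full traceback will show which.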
2021-12-31