An existing connection was forcibly closed by the remote host

Source: 4-10 Writing the spider to complete the crawl - 2

麦兜兜里豆不逗

2024-06-27

2024-06-27 22:27:17 [scrapy.utils.log] INFO: Versions: lxml 5.2.2.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.3.0, Python 3.11.8 (tags/v3.11.8:db85d51, Feb  6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)], pyOpenSSL 24.1.0 (OpenSSL 3.2.2 4 Jun 2024), cryptography 42.0.8, Platform Windows-10-10.0.19045-SP0
2024-06-27 22:27:18 [scrapy.addons] INFO: Enabled addons:
[]
2024-06-27 22:27:18 [asyncio] DEBUG: Using selector: SelectSelector
2024-06-27 22:27:18 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-06-27 22:27:18 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
2024-06-27 22:27:18 [scrapy.extensions.telnet] INFO: Telnet Password: dd632e1526ffd883
2024-06-27 22:27:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2024-06-27 22:27:18 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'ArticleSpider',
 'FEED_EXPORT_ENCODING': 'utf-8',
 'NEWSPIDER_MODULE': 'ArticleSpider.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['ArticleSpider.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2024-06-27 22:27:19 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-06-27 22:27:19 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-06-27 22:27:19 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-06-27 22:27:19 [scrapy.core.engine] INFO: Spider opened
2024-06-27 22:27:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-06-27 22:27:19 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-06-27 22:27:19 [undetected_chromedriver.patcher] DEBUG: getting release number from /last-known-good-versions-with-downloads.json
2024-06-27 22:27:19 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "G:\Python\Python311\Lib\urllib\request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "G:\Python\Python311\Lib\http\client.py", line 1298, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "G:\Python\Python311\Lib\http\client.py", line 1344, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "G:\Python\Python311\Lib\http\client.py", line 1293, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "G:\Python\Python311\Lib\http\client.py", line 1052, in _send_output
    self.send(msg)
  File "G:\Python\Python311\Lib\http\client.py", line 990, in send
    self.connect()
  File "G:\Python\Python311\Lib\http\client.py", line 1470, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Python\Python311\Lib\ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Python\Python311\Lib\ssl.py", line 1104, in _create
    self.do_handshake()
  File "G:\Python\Python311\Lib\ssl.py", line 1382, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Envs\article_spider\Lib\site-packages\scrapy\core\engine.py", line 182, in _next_request
    request = next(self.slot.start_requests)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Learn\Python\ArticleSpider\ArticleSpider\spiders\cnblogs.py", line 17, in start_requests
    browser = uc.Chrome()
              ^^^^^^^^^^^
  File "E:\Envs\article_spider\Lib\site-packages\undetected_chromedriver\__init__.py", line 258, in __init__
    self.patcher.auto()
  File "E:\Envs\article_spider\Lib\site-packages\undetected_chromedriver\patcher.py", line 175, in auto
    release = self.fetch_release_number()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Envs\article_spider\Lib\site-packages\undetected_chromedriver\patcher.py", line 250, in fetch_release_number
    with urlopen(self.url_repo + path) as conn:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Python\Python311\Lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Python\Python311\Lib\urllib\request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "G:\Python\Python311\Lib\urllib\request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Python\Python311\Lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "G:\Python\Python311\Lib\urllib\request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Python\Python311\Lib\urllib\request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host.>
2024-06-27 22:27:19 [scrapy.core.engine] INFO: Closing spider (finished)
2024-06-27 22:27:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.288842,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 6, 27, 14, 27, 19, 448388, tzinfo=datetime.timezone.utc),
 'log_count/DEBUG': 4,
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'start_time': datetime.datetime(2024, 6, 27, 14, 27, 19, 159546, tzinfo=datetime.timezone.utc)}
2024-06-27 22:27:19 [scrapy.core.engine] INFO: Spider closed (finished)

Process finished with exit code 0

This ran fine yesterday but errors out today. Have I been banned? Is there any way to fix it?
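Reading the traceback: the failure happens before Scrapy ever touches the target site. `uc.Chrome()` calls the undetected_chromedriver patcher, which does a plain `urlopen` to fetch the latest chromedriver release number, and that request's TLS handshake is being reset. So this is not a site ban; the machine simply cannot reach the driver-download endpoint (often a firewall or proxy issue). A quick way to confirm is to probe the endpoint directly; this is a minimal sketch, and the exact URL is an assumption pieced together from the `/last-known-good-versions-with-downloads.json` path in the DEBUG line above:

```python
import urllib.request
import urllib.error

# Endpoint the undetected_chromedriver patcher fetches (assumed host,
# based on the path shown in the DEBUG log line).
RELEASES_URL = ("https://googlechromelabs.github.io/chrome-for-testing/"
                "last-known-good-versions-with-downloads.json")

def can_reach(url, timeout=10):
    """Return True if an HTTPS GET to `url` completes, False on any
    network-level failure (DNS error, connection reset, timeout)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

# A reserved .invalid hostname always fails, exercising the False branch:
print(can_reach("https://nonexistent.invalid/", timeout=5))  # False
```

If `can_reach(RELEASES_URL)` returns False from your machine while normal sites work, the problem is reachability of that host, not the spider.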


1 Answer

麦兜兜里豆不逗

Original poster

2024-06-29

Solved it...

I pointed it at a local chromedriver path explicitly, so it no longer tries to download one:

browser = uc.Chrome(driver_executable_path="path")
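This works because passing `driver_executable_path` makes undetected_chromedriver skip its online release lookup entirely. A minimal sketch of wiring this up defensively; the driver path and helper names here are hypothetical, and `undetected_chromedriver` plus a matching Chrome must already be installed:

```python
import os
import shutil

# Hypothetical local path -- replace with wherever your chromedriver lives.
DRIVER_PATH = r"E:\tools\chromedriver.exe"

def resolve_driver(explicit=DRIVER_PATH):
    """Prefer an explicit chromedriver binary, else search PATH.
    Returning None means uc.Chrome() would fall back to downloading
    a driver (and hit the connection reset again on this network)."""
    if explicit and os.path.isfile(explicit):
        return explicit
    return shutil.which("chromedriver")

def make_browser():
    # Imported lazily so the helpers above work without the package installed.
    import undetected_chromedriver as uc
    path = resolve_driver()
    if path is None:
        raise RuntimeError("No local chromedriver found and download is blocked")
    return uc.Chrome(driver_executable_path=path)
```

Note that the pinned driver's major version must still match the installed Chrome; when Chrome auto-updates, you will need to refresh the local chromedriver binary manually.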


bobby
OK...
2024-07-05

Building a Search Engine with Scrapy -- a Python distributed-crawler course, best-selling for 4 years

Master Scrapy thoroughly and build a search engine with Django + Elasticsearch

5796 learners · 6290 questions
