Urgent, urgent, urgent: getting 302 redirects when crawling cnblogs

Source: 3-4 Regular Expressions - 3

慕的地5536528

2021-05-19

I'm using this for my graduation project and the defense is right around the corner, and now this happens out of nowhere. After starting the cnblogs spider, the pages display fine, but during crawling the requests get 302-redirected. Could it be that so many people are crawling the site that it has added targeted anti-scraping measures?

After starting the spider, cnblogs itself can still be accessed normally.
Below is the output from the run. Please give me some advice, teacher, I'm genuinely anxious. Thank you.

2021-05-19 17:25:36 [scrapy.core.engine] INFO: Spider opened
2021-05-19 17:25:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-05-19 17:25:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.cnblogs.com> (referer: None)
2021-05-19 17:25:37 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://news.cnblogs.com/n/page/2/> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
(the line above repeats 16 times in a row)
2021-05-19 17:25:41 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694199%2F> from <GET https://news.cnblogs.com/n/694199/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:43 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694200%2F> from <GET https://news.cnblogs.com/n/694200/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:46 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694201%2F> from <GET https://news.cnblogs.com/n/694201/>
2021-05-19 17:25:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694202%2F> from <GET https://news.cnblogs.com/n/694202/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694203%2F> from <GET https://news.cnblogs.com/n/694203/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random

Process finished with exit code -1

I'm really anxious and hope the teacher can reply soon with a solution. I'm sure many other students using this for their graduation projects have hit the same problem. Thank you!
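For reference, one way to surface these login-required pages inside the spider, instead of letting Scrapy's RedirectMiddleware silently follow them, is sketched below. This is only an illustration, not the course code; the spider name and start URL are placeholders.

import scrapy

class CnblogsNewsSpider(scrapy.Spider):
    # Placeholder name and start URL -- adapt to your own spider.
    name = "cnblogs_news"
    start_urls = ["https://news.cnblogs.com/"]

    # Let 302 responses reach the callback instead of being followed
    # automatically by the redirect middleware.
    handle_httpstatus_list = [302]

    def parse(self, response):
        if response.status == 302:
            # For pages that need a login, Location points at the
            # account.cnblogs.com signin page seen in the log above.
            self.logger.warning("Login required: %s -> %s",
                                response.url,
                                response.headers.get("Location"))
            return
        # ... normal parsing for pages that came back with 200 ...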


2 Answers

bobby

2021-05-20

You can first take a look at section 5-2, which logs in with Selenium and then crawls directly.
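A minimal sketch of that idea, assuming you log in by hand in the Selenium-controlled browser and then hand the session cookies over to Scrapy. The 60-second manual-login pause and the helper name are illustrative, not the exact course code.

import time
import scrapy
from selenium import webdriver

def get_cnblogs_cookies():
    # Open the signin page, log in manually within 60 seconds,
    # then collect the session cookies from the browser.
    browser = webdriver.Chrome()  # assumes chromedriver is available on PATH
    browser.get("https://account.cnblogs.com/signin")
    time.sleep(60)
    cookies = {c["name"]: c["value"] for c in browser.get_cookies()}
    browser.quit()
    return cookies

class CnblogsNewsSpider(scrapy.Spider):
    name = "cnblogs_news"  # placeholder name

    def start_requests(self):
        cookies = get_cnblogs_cookies()
        # Send the logged-in cookies with the first request; with
        # COOKIES_ENABLED = True (Scrapy's default) the session is kept
        # for the requests that follow.
        yield scrapy.Request("https://news.cnblogs.com/",
                             cookies=cookies, callback=self.parse)

    def parse(self, response):
        pass  # parse the pages as shown in the course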


我没你笨

2021-05-20

Some of the later data can only be requested successfully after logging in, which is why you get the 302. Find the first URL that returned a 302 and try opening it in a browser.
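To check the same thing from the command line rather than the browser, here is a quick sketch with requests; the article URL is simply the first 302 shown in the log above, and the "expected" comments are assumptions about what you should see.

import requests

# Request the first redirected article without following redirects.
r = requests.get("https://news.cnblogs.com/n/694199/", allow_redirects=False)
print(r.status_code)               # expected: 302
print(r.headers.get("Location"))   # expected: the account.cnblogs.com signin URL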

