Urgent: 302 redirects when crawling cnblogs (博客园)
Source: 3-4 Regular Expressions - 3

慕的地5536528
2021-05-19
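I am using this for my graduation project and my defense is coming up soon, and now this happens. With the crawler running I can still open cnblogs pages normally, but the crawler's own requests get 302 redirected. Has the site added targeted anti-crawling measures because too many of us are crawling it?
Below is the output from the run. Could the instructor please suggest a fix? I am really pressed for time. Thank you.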
2021-05-19 17:25:36 [scrapy.core.engine] INFO: Spider opened
2021-05-19 17:25:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-05-19 17:25:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.cnblogs.com> (referer: None)
2021-05-19 17:25:37 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://news.cnblogs.com/n/page/2/> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:41 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694199%2F> from <GET https://news.cnblogs.com/n/694199/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:43 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694200%2F> from <GET https://news.cnblogs.com/n/694200/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:46 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694201%2F> from <GET https://news.cnblogs.com/n/694201/>
2021-05-19 17:25:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694202%2F> from <GET https://news.cnblogs.com/n/694202/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
2021-05-19 17:25:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://account.cnblogs.com:443/signin?ReturnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F694203%2F> from <GET https://news.cnblogs.com/n/694203/>
<fake_useragent.fake.FakeUserAgent object at 0x000001CEE72B4D48> random
Process finished with exit code -1
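This is urgent, so I hope the instructor can reply soon with a solution. I suspect many students using this course for their graduation projects have hit the same problem. Thanks in advance.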
2 answers
-
bobby
2021-05-20
Start by looking at lesson 5-2: use selenium to simulate the login, then crawl directly with the logged-in session.
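A minimal sketch of the hand-off that approach implies, assuming the login is done in a selenium-driven browser and the resulting cookies are then passed to Scrapy. The helper name `cookies_to_dict` and the usage shown in the comments are illustrative assumptions, not code from lesson 5-2.

```python
def cookies_to_dict(selenium_cookies):
    """Flatten selenium's cookie list into a {name: value} dict.

    selenium's driver.get_cookies() returns a list of dicts, each with at
    least "name" and "value" keys. If two cookies share a name, the later
    one wins.
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Hypothetical usage inside a spider's start_requests (names are assumptions):
#   browser.get("https://account.cnblogs.com/signin")
#   input("Log in in the browser window, then press Enter: ")
#   cookies = cookies_to_dict(browser.get_cookies())
#   yield scrapy.Request(url, cookies=cookies, callback=self.parse)
```

With Scrapy's cookie handling left enabled, cookies passed on the first request are normally kept for later requests in the same session, so the login should only need to happen once per run.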
-
我没你笨
2021-05-20
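Some of the later items can only be requested successfully when you are logged in, which is why you get the 302. The log agrees: every redirect points to the account.cnblogs.com sign-in page. Find the first URL that returned a 302 and open it yourself to check.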
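To confirm this from the crawler side, one option is to check whether a URL is the sign-in page and recover the page that was originally requested. The sketch below uses only the standard library; the host, path, and `ReturnUrl` parameter are taken from the log above, and the function name `login_redirect_target` is an assumption for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Both values are copied from the redirect lines in the log above.
LOGIN_HOST = "account.cnblogs.com"
LOGIN_PATH = "/signin"


def login_redirect_target(url):
    """Return the originally requested URL if `url` is the sign-in page.

    Returns None when `url` is not the sign-in page, or when it carries
    no ReturnUrl query parameter.
    """
    parts = urlparse(url)
    if parts.hostname != LOGIN_HOST or not parts.path.startswith(LOGIN_PATH):
        return None
    values = parse_qs(parts.query).get("ReturnUrl")
    return values[0] if values else None
```

In a Scrapy callback, `response.url` is the final URL after redirects have been followed, so passing it to this function would show which article pages were bounced to the login form.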