How do I get the data from this site?

Source: 9-1 Selenium dynamic page requests and simulated login to Zhihu

begin_0002

2021-11-01

POST URL: http://www.jijiyouxuan.com/index.php?s=/index/search/goodlistnew.html
The data shows up in the XHR request in the browser's DevTools.
Sending the same POST from Scrapy returns a 500 error:
2021-11-01 23:25:05 [scrapy.core.engine] DEBUG: Crawled (500) <POST http://www.jijiyouxuan.com/index.php?s=/index/search/goodlistnew.html> (referer: None)
2021-11-01 23:25:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 http://www.jijiyouxuan.com/index.php?s=/index/search/goodlistnew.html>: HTTP status code is not handled or not allowed
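import scrapy
import undetected_chromedriver as uc


class JijiSpider(scrapy.Spider):
    name = "jiji"  # hypothetical spider name
    start_urls = ["http://www.jijiyouxuan.com/index.php?s=/index/search/goodlistnew.html"]

    def start_requests(self):
        # Open the site in a real Chrome to pick up its cookies
        browser = uc.Chrome()
        browser.get("http://www.jijiyouxuan.com/")
        input("Press Enter to continue: ")
        # Turn Selenium's list of cookie dicts into {name: value} for Scrapy
        cookie_dict = {cook["name"]: cook["value"] for cook in browser.get_cookies()}
        browser.quit()  # the browser is no longer needed once the cookies are copied
        print(cookie_dict)

        # Form fields copied from the XHR request in DevTools
        data = {
            'category_id': '1221',
            'brand_id': '0',
            'manner_id': '0',
            'material_id': '0',
            'size_id': '0',
            'other_id': '0',
            'bed_id': '0',
            'sofa_id': '0',
            'chandi_id': '0',
            'thickness_id': '0',
            'price1': '0',
            'price2': '0',
            'tags': '0',
            'wd': '',
            'page': '1',
            'order_by_field': 'default',
            'order_by_type': 'asc',
        }
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
        }
        for url in self.start_urls:
            yield scrapy.FormRequest(url=url, formdata=data, cookies=cookie_dict,
                                     headers=headers, callback=self.parse)

    def parse(self, response):
        pass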
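The spider above sets only User-Agent explicitly. Scrapy's FormRequest adds the form Content-Type on its own, but the log shows `(referer: None)`, and nothing sends the X-Requested-With header that a browser XHR usually carries. One way to check whether missing headers cause the 500 is to rebuild the request outside Scrapy with the standard library's urllib and compare it header by header with the XHR in DevTools. This is a minimal sketch, assuming the server checks X-Requested-With and Referer (not confirmed; copy the exact values from DevTools). `build_search_request` is a hypothetical helper name, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# POST endpoint from the question
SEARCH_URL = "http://www.jijiyouxuan.com/index.php?s=/index/search/goodlistnew.html"


def build_search_request(form, cookies, user_agent):
    """Build (but do not send) a POST that imitates the browser's XHR.

    Assumption: the server looks at X-Requested-With and Referer.
    Replace the Referer with the page shown in DevTools if it differs.
    """
    body = urlencode(form).encode("utf-8")
    headers = {
        "User-Agent": user_agent,
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "X-Requested-With": "XMLHttpRequest",      # assumed, typical of AJAX requests
        "Referer": "http://www.jijiyouxuan.com/",  # assumed referer page
        # Same cookies the spider got from Selenium, as a single header
        "Cookie": "; ".join(f"{name}={value}" for name, value in cookies.items()),
    }
    return Request(SEARCH_URL, data=body, headers=headers, method="POST")
```

`urllib.request.urlopen(req)` sends the request. On a 500 it raises `urllib.error.HTTPError`, and calling `.read()` on that exception returns the error page body, which Scrapy's HttpErrorMiddleware hides in the log above. If the extra headers turn out to matter, the same keys can go into the spider's `headers` dict.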

1 answer

bobby

2021-11-03

Doesn't it work even with undetected chromedriver?
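The question's code already calls `uc.Chrome()`, but only to collect cookies. The POST itself is still sent by Scrapy with its own headers. Going further in the direction of using undetected chromedriver, the browser could send the POST itself through Selenium's `execute_async_script` and the page's `fetch()`, so cookies and headers come from the real browser session. This is a sketch under assumptions: the driver has already opened http://www.jijiyouxuan.com/ (same origin as the endpoint), X-Requested-With is a header the site expects, and the response is read as plain text because its format isn't shown. `post_from_browser` and `FETCH_JS` are hypothetical names, and whether this avoids the 500 is untested.

```python
from urllib.parse import urlencode

# POST endpoint from the question
SEARCH_URL = "http://www.jijiyouxuan.com/index.php?s=/index/search/goodlistnew.html"

# Runs inside the page. Selenium appends its completion callback as the
# last argument; calling it hands the value back to Python.
FETCH_JS = """
const url = arguments[0];
const body = arguments[1];
const done = arguments[arguments.length - 1];
fetch(url, {
  method: "POST",
  headers: {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "X-Requested-With": "XMLHttpRequest"
  },
  body: body,
  credentials: "same-origin"
})
  .then(r => r.text().then(t => done({status: r.status, text: t})))
  .catch(e => done({status: -1, text: String(e)}));
"""


def post_from_browser(driver, form, url=SEARCH_URL):
    """Send the search POST from the page the driver currently has open.

    `driver` is assumed to be a Selenium / undetected-chromedriver instance
    already on the site. Returns a dict with "status" and "text"
    (status -1 means fetch() itself failed).
    """
    return driver.execute_async_script(FETCH_JS, url, urlencode(form))
```

In the question's `start_requests`, this would be called as `post_from_browser(browser, data)` after `browser.get(...)`, while the browser is still open. `result["status"]` and `result["text"]` then show what the server actually returned to a real browser.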


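Building a Search Engine with Scrapy: the Python distributed crawler course, a bestseller for 4 years

Master Scrapy thoroughly and build a search engine with Django + Elasticsearch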
