How to send a POST request with form data in Scrapy

Source: 8-3 Introduction to Requests and Response

wa666

2017-09-24

class Lagouspider1Spider(scrapy.Spider):
    name = 'lagouspider1'
    allowed_domains = ['www.lagou.com']
    city = '深圳'
    keyword = 'python'

    start_urls = 'https://www.lagou.com/jobs/positionAjax.json?city={0}&needAddtionalResult=false&isSchoolJob=0'.format(city)

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate, br',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        'X-Anit-Forge-Token': 'None',
        'X-Anit-Forge-Code': '0',
        'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
        'Content-Length': '25',
        'Connection': 'keep-alive'
    }

    cookie = {
        'user_trace_token': '20170904131442-e5b6ef64-0911-440a-af99-d44c86945775',
        'Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6': '1505144480,1505264923,1505435823,1505461844',
        '_ga': 'GA1.2.1576896716.1504502078',
        'LGUID': '20170904131444-f650ed3a-912f-11e7-90ff-5254005c3644',
        'index_location_city': '%E6%B7%B1%E5%9C%B3',
        '_gid': 'GA1.2.656103528.1505435823',
        'SEARCH_ID': 'c0b695cc832c46539e184e3d1bbcb749',
        'JSESSIONID': 'ABAAABAACDBAAIAC1242F13CEF66601A0E07BBDD95D4052',
        '_gat': '1',
        'Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6': '1505461844',
        'LGSID': '20170915155107-a1b78d78-99ea-11e7-9193-5254005c3644',
        'PRE_UTM': '',
        'PRE_HOST': 'www.baidu.com',
        'PRE_SITE': 'https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3D3c2mseUwlcwN7Vuc9G8__Qnf8W_acQU5-oMJAx33AcG%26wd%3D%26eqid%3Dc28bcc520000797b0000000659bb8667',
        'PRE_LAND': 'https%3A%2F%2Fwww.lagou.com%2F',
        'LGRID': '20170915155107-a1b78f70-99ea-11e7-9193-5254005c3644',
        'TG-TRACK-CODE': 'index_search'
    }

    
    def parse(self, response):
        pass

    def start_requests(self):
        formdata = {'first': 'true', 'pn': '1', 'kd': self.keyword}
        yield scrapy.FormRequest(self.start_urls, headers=self.headers,
                                 formdata=formdata, callback=self.parse,
                                 cookies=self.cookie)

When I POST the data with Scrapy's FormRequest, the server returns 400 or 404.

import requests

headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
            'Accept-Encoding': 'gzip, deflate, br',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'X-Requested-With': 'XMLHttpRequest',
            'X-Anit-Forge-Token': 'None',
            'X-Anit-Forge-Code': '0',
            'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
            'Content-Length': '25',
            'Cookie': 'user_trace_token=20170904131442-e5b6ef64-0911-440a-af99-d44c86945775; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1505144480,1505264923,1505435823,1505461844; _ga=GA1.2.1576896716.1504502078; LGUID=20170904131444-f650ed3a-912f-11e7-90ff-5254005c3644; index_location_city=%E6%B7%B1%E5%9C%B3; _gid=GA1.2.656103528.1505435823; SEARCH_ID=c0b695cc832c46539e184e3d1bbcb749; JSESSIONID=ABAAABAACDBAAIAC1242F13CEF66601A0E07BBDD95D4052; _gat=1; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1505461844; LGSID=20170915155107-a1b78d78-99ea-11e7-9193-5254005c3644; PRE_UTM=; PRE_HOST=www.baidu.com; PRE_SITE=https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3D3c2mseUwlcwN7Vuc9G8__Qnf8W_acQU5-oMJAx33AcG%26wd%3D%26eqid%3Dc28bcc520000797b0000000659bb8667; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; LGRID=20170915155107-a1b78f70-99ea-11e7-9193-5254005c3644; TG-TRACK-CODE=index_search',
            'Connection': 'keep-alive'
        }

keyword = 'python'
city = '深圳'

post_data = {
    'first':'true',
    'pn':'1',
    'kd':keyword
}

url = 'https://www.lagou.com/jobs/positionAjax.json?city={0}&needAddtionalResult=false&isSchoolJob=0'.format(city)

rp = requests.post(url, headers=headers, data=post_data)
print(rp.text)

With exactly the same parameters, switching to requests for the POST returns the JSON data normally.

I've spent a whole day on this... I can't figure out why Scrapy fails to get a response; when I debug, execution reaches the FormRequest line and goes no further.

How exactly should this be written in Scrapy? I've been searching all day without finding the answer.
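Incidentally, a likely reason the requests version works despite the same hard-coded 'Content-Length': '25' header is that the urlencoded form body happens to be exactly 25 bytes, so the value is coincidentally correct. A quick standard-library check (my own sketch, not part of the original question):

```python
from urllib.parse import urlencode

# The same form data both versions send.
formdata = {'first': 'true', 'pn': '1', 'kd': 'python'}

# urlencode produces the application/x-www-form-urlencoded body.
body = urlencode(formdata)
print(body)                       # first=true&pn=1&kd=python
print(len(body.encode('utf-8')))  # 25 -- matches the hard-coded Content-Length
```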


1 Answer

wa666

(Original poster)

2017-09-24

Solved: deleting 'Content-Length': '25' from the headers in the Scrapy code fixed it.
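For reference, the fix can be sketched like this (a minimal sketch, not the poster's exact code). The probable cause is that Scrapy's downloader computes Content-Length from the encoded form body itself, so a hard-coded value can conflict with the computed one and make the server reject the request:

```python
# Drop the hard-coded Content-Length and let Scrapy compute it.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Content-Length': '25',  # the offending header
}
headers.pop('Content-Length', None)

# Then build the request as before, e.g. inside start_requests():
# yield scrapy.FormRequest(self.start_urls, headers=headers,
#                          formdata={'first': 'true', 'pn': '1', 'kd': self.keyword},
#                          callback=self.parse, cookies=self.cookie)
print('Content-Length' in headers)  # False
```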

bobby
OK.
2017-09-25

Course: Scrapy打造搜索引擎 (Building a Search Engine with Scrapy), a Python distributed crawler course: master Scrapy and build a search engine with Django + Elasticsearch.