Can't scrape data from JD.com
Source: 2-9 Scraping data from JD.com

rannrann
2018-12-27
import requests
from lxml import html


def spider(sn):
    """Scrape book data from JD.com"""
    url = 'https://search.jd.com/Search?keyword={0}'.format(sn)
    # HTML document
    html_doc = requests.get(url).text
    print(html_doc)


if __name__ == '__main__':
    spider('9787115428028')
Hello teacher, I can no longer scrape any data from JD.com. Is there a way to fix this?
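A quick way to see what is going wrong is to check whether the HTML that came back actually contains the product list. The sketch below assumes, based on the XPath used in the answer, that the results page wraps its products in an element with `id="J_goodsList"`; JD's markup may have changed since 2018, and `has_goods_list` is a hypothetical helper name. The two page strings in the demo are made up for illustration.

```python
def has_goods_list(html_doc: str) -> bool:
    """Return True if the page appears to contain JD's product list.

    Assumption: the results container carries id="J_goodsList"
    (taken from the XPath in the answer); adjust if the markup differs.
    """
    # A plain substring test is enough for a quick diagnosis; it does not
    # parse the HTML, so it can miss variants such as single-quoted ids.
    return 'id="J_goodsList"' in html_doc


if __name__ == '__main__':
    # Made-up stand-ins for "a page without results" and "a results page".
    other_page = '<html><body><p>no product list here</p></body></html>'
    results_page = '<div id="J_goodsList"><ul><li></li></ul></div>'
    print(has_goods_list(other_page))    # False
    print(has_goods_list(results_page))  # True
```

Calling `print(has_goods_list(html_doc))` inside `spider` shows whether the request without headers is being served a different page than a browser would get.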
1 answer
Add a User-Agent request header. Reference code:
import requests
from lxml import html


def spider(sn):
    """Scrape book data from JD.com"""
    url = 'https://search.jd.com/Search?keyword={0}'.format(sn)
    # HTML document, requested with a browser-like User-Agent header
    resp = requests.get(url, headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.6776.400 QQBrowser/10.3.2601.400',
    })
    # Show the encoding requests inferred from the response headers
    print(resp.encoding)
    # Decode the body as UTF-8 instead
    resp.encoding = 'utf8'
    # Old version without headers: html_doc = requests.get(url).text
    html_doc = resp.text
    print(html_doc)
    # Build the XPath selector object
    selector = html.fromstring(html_doc)
    # Find the collection of list items
    ul_list = selector.xpath('//div[@id="J_goodsList"]/ul/li')
    print(len(ul_list))
    # Parse the corresponding content: title, price, purchase link
    for li in ul_list:
        # Title
        title = li.xpath('div/div[@class="p-name"]/a/@title')
        print(title)


if __name__ == '__main__':
    spider('9787115428028')
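The XPath expressions in the answer can be checked offline, independently of whether JD serves the real page. This is a minimal sketch, assuming `lxml` is installed (as in the answer); `SAMPLE` and `parse_titles` are hypothetical names, and the snippet is hand-written to imitate the structure the XPath expects, not copied from a real JD page.

```python
from typing import List

from lxml import html

# Hypothetical, hand-written markup mimicking the structure the answer's
# XPath expressions expect; it is NOT a real JD page.
SAMPLE = """
<div id="J_goodsList">
  <ul>
    <li><div class="gl-i-wrap"><div class="p-name"><a title="Book A" href="#">Book A</a></div></div></li>
    <li><div class="gl-i-wrap"><div class="p-name"><a title="Book B" href="#">Book B</a></div></div></li>
  </ul>
</div>
"""


def parse_titles(html_doc: str) -> List[str]:
    """Extract product titles using the same XPath expressions as the answer."""
    selector = html.fromstring(html_doc)
    titles = []
    for li in selector.xpath('//div[@id="J_goodsList"]/ul/li'):
        # Selecting an attribute returns a list of strings; keep the first match.
        found = li.xpath('div/div[@class="p-name"]/a/@title')
        if found:
            titles.append(str(found[0]))
    return titles


if __name__ == '__main__':
    print(parse_titles(SAMPLE))  # ['Book A', 'Book B']
```

If `parse_titles` works on the sample but returns an empty list for the downloaded page, the problem is more likely in the response itself (headers, a different page being served) than in the XPath.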
2018-12-28