len(ul_list) is 30, but looping over ul_list prints empty lists?
Source: 2-9 Crawling data from JD.com
野生前端菜鸟
2018-07-07
from lxml import html
import requests


def spider_jd(sn):
    url = 'https://search.jd.com/Search?keyword={0}'.format(sn)
    res = requests.get(url)
    res.encoding = 'utf-8'
    html_data = res.text
    selector = html.fromstring(html_data)
    # Find the list of book entries
    ul_list = selector.xpath('//div[@id="J_goodsList"]/ul/li')
    print(len(ul_list))
    for li in ul_list:
        # title
        title = selector.xpath('div/div[@class="p-name"]/a/@title')
        print(title)


if __name__ == '__main__':
    sn = 9787115428028
    spider_jd(sn)

The result is that every title printed is an empty list:
30
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
1 Answer
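Look at this line of your code (line 22 in your post):
    title = selector.xpath('div/div[@class="p-name"]/a/@title')
It runs the query against the whole document instead of the current item. Because the path is relative and selector is the document root, it looks for a div directly under the root element, finds none, and returns an empty list on every iteration. Remember the rule: "grab the big pieces first, then the small ones." Once you have found each item, do the matching inside that item. So the query should be run on the li element you get from the loop. The code is as follows:
    title = li.xpath('div/div[@class="p-name"]/a/@title')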
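To see the difference without hitting the network, here is a minimal sketch of the same mistake and its fix. It makes several assumptions: the markup is a made-up, simplified stand-in for the JD result list, and it uses the standard library's xml.etree.ElementTree in place of lxml so it runs with no extra installs. ElementTree has no @title step, so the attribute is read with .get("title") instead. The context-node behaviour it shows should carry over to lxml's xpath().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the JD search result markup.
SAMPLE = """
<html>
  <body>
    <div id="J_goodsList">
      <ul>
        <li><div><div class="p-name"><a title="Book A">Book A</a></div></div></li>
        <li><div><div class="p-name"><a title="Book B">Book B</a></div></div></li>
      </ul>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# "Grab the big pieces first": collect every <li> in the result list.
items = root.findall('.//div[@id="J_goodsList"]/ul/li')

# The same relative path is used in both queries below.
REL_PATH = 'div/div[@class="p-name"]/a'

# Mistake: evaluate the relative path from the document root.
# <html> has no direct <div> child, so nothing matches.
from_root = [a.get("title") for a in root.findall(REL_PATH)]

# Fix: evaluate the relative path from each <li> ("then the small ones").
from_items = [[a.get("title") for a in li.findall(REL_PATH)] for li in items]

print(len(items))
print(from_root)
print(from_items)
```

The only change between the two queries is the element the path is evaluated from, which is exactly the selector versus li difference in the answer above.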