How to download MP3 and other files with the Scrapy framework

Source: 12-1 Creating a new Scrapy project

两努

2020-02-23

In an ordinary script, downloading an MP3 file is easy.
The function below does it:
send a GET request and write the response's .content to disk.

import os
import requests

def spider_word_mp3(word):
    save_dir = "./sounds/words"
    url = "https://www.gstatic.com/dictionary/static/sounds/oxford/{}--_us_1.mp3".format(word)

    # Create the target folder first if it does not exist yet.
    if not os.path.exists(save_dir):
        print("Folder not found, creating it")
        os.makedirs(save_dir, exist_ok=True)

    r = requests.get(url)
    # An HTML page in the response means no MP3 came back for this word.
    if "<!DOCTYPE html>" in r.text:
        print("Could not fetch MP3: " + word)
    else:
        with open(os.path.join(save_dir, word + ".mp3"), "wb") as f:
            f.write(r.content)
        print("Downloaded " + word)

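But how should this be done with the Scrapy framework?
Scrapy seems to have a better way for this:
the Files Pipeline.
But how do I use it, and how do I set its parameters? I don't understand it and would appreciate some guidance.
The whole course covers extracting text content from web pages. Downloading files, whether audio or images, is never covered, even though it should count as fairly basic knowledge. I hope the instructor can add it.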

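Please explain how to download MP3 and other files with Scrapy. Could you write an example showing what items.py and pipelines.py should look like? Does the saving logic belong in items.py or in pipelines.py, and how should it be written?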


3 answers

bobby

2020-02-25

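Scrapy ships with three related pipelines: FilesPipeline, ImagesPipeline, and their base class MediaPipeline.
Taking file download as an example:
1. In the pipelines module, import FilesPipeline: from scrapy.pipelines.files import FilesPipeline (the import is only needed if you subclass it)
2. Enable it in the settings file: ITEM_PIPELINES = {'scrapy.pipelines.files.FilesPipeline': 1}
3. Set the download directory in the settings file: FILES_STORE = '/user/somepath'
4. In the items module, define two key fields: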

import scrapy

class ExamplesItem(scrapy.Item):
    file_urls = scrapy.Field()  # input: URLs to download
    files = scrapy.Field()      # output: filled in by the pipeline after download

5. In the spider, just assign the URLs to download to the Item instance:

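def parse(self, response):
    # Parse out the URL to download
    download_url = response.css('a::attr(href)').extract_first()
    item = ExamplesItem()
    # file_urls must be a list; urljoin turns a relative href into an absolute URL
    item['file_urls'] = [response.urljoin(download_url)]
    yield item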