Scrapy: Running Multiple Spider Files

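1. Create a new folder named commands inside the project folder.

2. In the commands folder, create a new file named crawlall.py.

3. In crawlall.py, write a Command class that inherits from scrapy.commands.ScrapyCommand.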

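from scrapy.commands import ScrapyCommand


class Command(ScrapyCommand):
    # The command only makes sense inside a Scrapy project.
    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders in the project'

    def run(self, args, opts):
        # spider_loader lists the name of every spider in the project
        # (older posts use the deprecated alias crawler_process.spiders).
        spider_list = self.crawler_process.spider_loader.list()
        print('*' * 100)
        print(spider_list)
        print('*' * 100)
        # Schedule every spider, forwarding the parsed options as spider arguments.
        for name in spider_list:
            self.crawler_process.crawl(name, **opts.__dict__)
        # Start the crawl once, after all spiders have been scheduled.
        self.crawler_process.start()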
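One detail of run() that is easy to miss is that **opts.__dict__ forwards every parsed command-line option to each spider as a keyword argument. The sketch below illustrates only that unpacking step, without Scrapy. It assumes a hypothetical fake_crawl function standing in for crawler_process.crawl, an argparse.Namespace standing in for opts, and two made-up spider names.

```python
from argparse import Namespace


def fake_crawl(spider_name, **spider_kwargs):
    """Hypothetical stand-in for crawler_process.crawl(); returns what it received."""
    return spider_name, spider_kwargs


# Stand-in for the parsed options object that run() receives as `opts`.
opts = Namespace(logfile=None, loglevel="DEBUG")

# vars(opts) returns the same dict as opts.__dict__.
scheduled = [fake_crawl(name, **vars(opts)) for name in ["route", "dining"]]

for spider_name, spider_kwargs in scheduled:
    print(spider_name, spider_kwargs)
```

Every spider receives the same set of options, so anything passed on the command line reaches all of them rather than a single spider.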

That is not all: the settings.py configuration file needs one more line, in the form

COMMANDS_MODULE = '<project name>.<directory name>'

For example:

COMMANDS_MODULE = 'news_spider.commands'
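The dotted value mirrors the folder layout, and the command name comes from the file name inside that folder. As a rough check, the sketch below maps a COMMANDS_MODULE value to the file that should exist on disk. The helper command_file_for is a hypothetical name introduced here, and the sketch assumes it is run from the directory that contains scrapy.cfg.

```python
from pathlib import Path


def command_file_for(project_root: Path, commands_module: str,
                     command: str = "crawlall") -> Path:
    """Map a dotted COMMANDS_MODULE value to the expected file for `command`."""
    # 'news_spider.commands' -> <project_root>/news_spider/commands/crawlall.py
    return project_root.joinpath(*commands_module.split(".")) / f"{command}.py"


if __name__ == "__main__":
    # Assumption: the current directory is the project root (where scrapy.cfg lives).
    path = command_file_for(Path.cwd(), "news_spider.commands")
    print(path, "exists" if path.is_file() else "is missing")
```

If the file is reported missing, the folder name, the file name, or the COMMANDS_MODULE value most likely do not match each other.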

Run it from the command line to start all spiders: scrapy crawlall
