Several ways to stop a Scrapy crawl on purpose

[scrapy] 2025-10-18

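Summary: there are several ways to stop a Scrapy crawl on purpose. A spider callback can raise scrapy.exceptions.CloseSpider(reason='cancelled'), and Scrapy's CloseSpider extension terminates the crawl automatically once a configured condition is met. The conditions are set with CLOSESPIDER_TIMEOUT (in seconds), CLOSESPIDER_ITEMCOUNT, CLOSESPIDER_PAGECOUNT and CLOSESPIDER_ERRORCOUNT.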

The ways to stop a Scrapy crawl are as follows:

Exit the spider by raising an exception

#scrapy.exceptions.CloseSpider(reason='cancelled')

from scrapy.exceptions import CloseSpider

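This exception can be raised from a spider callback to request that the spider be closed/stopped.

Parameters: reason (str) - the reason for closing

For example: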

def parse_page(self, response):
    # Placeholder condition that is always true; replace it with a real stop condition
    if 1:
        raise CloseSpider('close it')

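-Characteristics:

Raising the exception does not stop the crawl immediately: some requests are still in progress at that point, and they have to finish before the spider actually closes.

Pressing Ctrl+C once behaves the same way.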

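Scrapy's CloseSpider extension terminates the crawl automatically when a configured condition is met.

Four settings are available: CLOSESPIDER_TIMEOUT (in seconds), CLOSESPIDER_ITEMCOUNT, CLOSESPIDER_PAGECOUNT and CLOSESPIDER_ERRORCOUNT. They stop the crawl, respectively, after the given amount of time has passed, after the given number of Items has been scraped, after the given number of responses has been received, and after the given number of errors has occurred. They are usually set on the command line: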

$ scrapy crawl fast -s CLOSESPIDER_ITEMCOUNT=10

$ scrapy crawl fast -s CLOSESPIDER_PAGECOUNT=10

$ scrapy crawl fast -s CLOSESPIDER_TIMEOUT=10
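Besides the -s option, the same settings can also be written into the project's settings.py, which is an ordinary Python module. The fragment below is only a sketch: the numeric values are arbitrary examples, not recommendations. Each of these settings defaults to 0, which leaves that condition disabled.

```python
# settings.py (sketch; all values below are arbitrary examples)

# Close the spider after it has been open for this many seconds
CLOSESPIDER_TIMEOUT = 3600

# Close the spider after this many items have been scraped
CLOSESPIDER_ITEMCOUNT = 1000

# Close the spider after this many responses have been received
CLOSESPIDER_PAGECOUNT = 500

# Close the spider after this many errors have occurred
CLOSESPIDER_ERRORCOUNT = 10
```

A value passed with -s on the command line takes precedence over the value in settings.py, so the file can hold a default that is overridden for individual runs.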
