[scrapy] 2025-04-27
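Summary: passing arguments from parse to a callback across multiple requests in Scrapy. By returning a Request from each callback, you can hand data from one request to the next and crawl pages in sequence.
Passing arguments from parse to a callback across multiple requests in Scrapy. Let's start with an example: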
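import scrapy

# Both methods belong to a scrapy.Spider subclass; MyItem is an Item
# class with 'urla' and 'urlb' fields.
def parse(self, response):
    item = MyItem()
    item['urla'] = response.url
    request = scrapy.Request("http://www.xoxxoo.com/article/show/i/277.html",
                             callback=self.parse_a)
    # Attach the partially filled item so the next callback can read it.
    request.meta['item'] = item
    return request

def parse_a(self, response):
    # Retrieve the item that parse() attached to the request.
    item = response.meta['item']
    item['urlb'] = response.url
    return item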
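As shown above, a callback can return a new Request with data attached to its meta, and the next callback can do the same. Repeating this step lets you pass data through several requests and crawl content in a loop.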
Background:
Scrapy Request object parameters:
1. url (string)
2. callback (callable)
3. method (string): defaults to GET
4. meta (dict): initial values for the Request.meta attribute
5. headers (dict)
6. cookies (dict or list)
7. encoding (string)
8. priority (int): priority used by the scheduler; defaults to 0
9. dont_filter (boolean)
10. errback (callable)
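Scrapy Response object parameters:
1. url (string)
2. headers (dict)
3. status (integer), e.g. 200 or 404
4. body (bytes in current Scrapy releases)
5. meta (dict): the meta dict carried over from the originating Request
6. flags (list), e.g. 'cached', 'redirected'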