[scrapy] 2024-04-28
Summary: Learning Scrapy (Part 7): using a random User-Agent, step by step
Learning Scrapy (Part 7): how to use a random User-Agent:
1. Add a random User-Agent class to middlewares.py:
import random

class CustomUserAgentMiddleware:
    def __init__(self, user_agent=''):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        # Pick a random User-Agent for every outgoing request
        ua = random.choice(self.user_agent_list)
        if ua:
            # setdefault only sets the header if the request does not already have one
            request.headers.setdefault('User-Agent', ua)

    # Pool of User-Agent strings to choose from (class attribute)
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/504.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/506.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/526.3 (KHTML, like Gecko) Chrome/19.12.1021.1 Safari/526.3",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/528.4 (KHTML, like Gecko) Chrome/19.14.1031.1 Safari/528.4",
    ]
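2. Register the middleware in settings.py:
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in User-Agent middleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # 'dy' is this project's package name; replace it with your own
    'dy.middlewares.CustomUserAgentMiddleware': 543,
}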