『Scrapy Crawler』07. Using a Proxy in Scrapy (Step-by-Step with Detailed Comments)

发现你走远了 · Published 2024-03-18 10:30:25

Adding a proxy in start_requests

In the spider file, the start_requests method of the corresponding spider

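    from scrapy import Request  # at the top of the spider file

    def start_requests(self):
        for page in range(10):  # 10 pages, 25 entries per page
            yield Request(
                url=f'https://movie.douban.com/top250?start={page*25}&filter=',
                # SOCKS5 proxy; the scheme is spelled "socks5". Scrapy's built-in
                # downloader may not handle SOCKS, in which case use the HTTP form below.
                meta={'proxy': "socks5://127.0.0.1:1086"},
                # meta={'proxy': "http://127.0.0.1:1086"},  # purchased commercial proxies are usually HTTP and come with an API endpoint
            )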
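A typo in the proxy scheme (for example `socket5` instead of `socks5`) usually shows up only when the download fails. The sketch below checks a proxy URL before it goes into `meta`. The helper name `check_proxy_url` and the set of allowed schemes are assumptions made for illustration. The set is limited to HTTP(S) because, as far as I know, Scrapy's built-in proxy handling targets HTTP(S) proxies; adjust it to whatever your setup supports.

```python
from urllib.parse import urlsplit

# Assumption: schemes this project accepts; extend if your downloader supports more.
ALLOWED_PROXY_SCHEMES = {"http", "https"}


def check_proxy_url(proxy: str) -> str:
    """Return the proxy URL unchanged if it looks usable, else raise ValueError."""
    parts = urlsplit(proxy)
    if parts.scheme not in ALLOWED_PROXY_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {proxy!r}")
    return proxy


# The validated URL can then be passed as the 'proxy' key of a request's meta.
meta = {"proxy": check_proxy_url("http://127.0.0.1:1086")}
```

Calling `check_proxy_url("socket5://127.0.0.1:1086")` raises `ValueError`, so the mistake surfaces when the spider starts rather than on the first request.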

Adding a proxy in a middleware

The process_request method of MyscrapyDownloaderMiddleware in middlewares.py

class MyscrapyDownloaderMiddleware:
    # ------ other methods omitted ------
    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
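        # Intercept the request in the middleware before it is sent and add the proxy.
        # Set only the 'proxy' key so the other entries in request.meta are preserved.
        request.meta['proxy'] = "socks5://127.0.0.1:1086"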
        return None
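A minimal sketch of the same idea that can be run without Scrapy, to show how the proxy ends up in `meta`. The class name `ProxyDownloaderMiddleware`, the `PROXY_URL` constant and the `SimpleNamespace` stand-in for a request are all hypothetical. `setdefault` is used here as one possible design: a proxy already chosen per request in the spider is kept, and the other meta keys stay untouched.

```python
from types import SimpleNamespace

# Assumption: a local HTTP proxy at the same address used in the examples above.
PROXY_URL = "http://127.0.0.1:1086"


class ProxyDownloaderMiddleware:
    """Hypothetical middleware that adds a proxy without discarding other meta keys."""

    def process_request(self, request, spider):
        # Only fill in the proxy when the request does not already carry one.
        request.meta.setdefault("proxy", PROXY_URL)
        return None  # returning None lets the request continue through the chain


# Stand-in for a scrapy Request, used only to show the effect on meta.
fake_request = SimpleNamespace(meta={"download_timeout": 10})
ProxyDownloaderMiddleware().process_request(fake_request, spider=None)
print(fake_request.meta)
```

A downloader middleware only runs if it is enabled under `DOWNLOADER_MIDDLEWARES` in settings.py, for example with the entry `"myscrapy.middlewares.MyscrapyDownloaderMiddleware": 543`. The module path `myscrapy` is assumed from the class name; use your own project name.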