Changing Scrapy/Splash user agent

How can I set the user agent for Scrapy with Splash, equivalent to what the following requests code does?

import requests
from bs4 import BeautifulSoup

ua = {"User-Agent":"Mozilla/5.0"}
url = "http://www.example.com"
page = requests.get(url, headers=ua)
soup = BeautifulSoup(page.text, "lxml")

My spider would look similar to this:

import scrapy
from scrapy_splash import SplashRequest


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 0.5}
            )

zinyosrim asked Sep 04 '17 17:09


2 Answers

Set the user_agent attribute on the spider to override the default user agent:

class ExampleSpider(scrapy.Spider):
    name = 'example'
    user_agent = 'Mozilla/5.0'

In this case UserAgentMiddleware (which is enabled by default) will use 'Mozilla/5.0' instead of the USER_AGENT setting value.

You can also override headers per request:

scrapy_splash.SplashRequest(url, headers={'User-Agent': custom_user_agent})

skovorodkin answered Oct 13 '22 19:10


The proper way is to alter the Splash script to include it, rather than adding it to the spider, though adding it to the spider may work as well.

http://splash.readthedocs.io/en/stable/scripting-ref.html?highlight=agent
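
A rough sketch of the script-based approach: the Splash scripting reference documents splash:set_user_agent, and scrapy-splash can send a Lua script to the execute endpoint through the lua_source argument. The script below is an assumption about what this answer has in mind, and LUA_SET_USER_AGENT and splash_kwargs are hypothetical helper names, not part of any library.

```python
# Lua script run by Splash: set the user agent before loading the page.
# user_agent and wait are custom arguments supplied from Python below.
LUA_SET_USER_AGENT = """
function main(splash)
  splash:set_user_agent(splash.args.user_agent)
  assert(splash:go(splash.args.url))
  assert(splash:wait(splash.args.wait))
  return splash:html()
end
"""


def splash_kwargs(user_agent, wait=0.5):
    """Build keyword arguments for a SplashRequest that runs the script."""
    return {
        "endpoint": "execute",
        "args": {
            "lua_source": LUA_SET_USER_AGENT,
            "user_agent": user_agent,
            "wait": wait,
        },
    }
```

Inside the spider, the result could be passed as SplashRequest(url, self.parse, **splash_kwargs("Mozilla/5.0")). The script reads splash.args.url on the assumption that scrapy-splash fills in the url argument from the request, as its README examples do.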

scriptso answered Oct 13 '22 20:10