Changing Scrapy/Splash user agent

Question

How can I set the user agent for Scrapy with Splash, equivalent to what the following requests code does?

import requests
from bs4 import BeautifulSoup

ua = {"User-Agent":"Mozilla/5.0"}
url = "http://www.example.com"
page = requests.get(url, headers=ua)
soup = BeautifulSoup(page.text, "lxml")

My spider would look similar to this:

import scrapy
from scrapy_splash import SplashRequest


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 0.5}
            )

skovorodkin · Accepted Answer

You need to set the user_agent attribute on the spider to override the default user agent:

class ExampleSpider(scrapy.Spider):
    name = 'example'
    user_agent = 'Mozilla/5.0'

In this case UserAgentMiddleware (which is enabled by default) will use 'Mozilla/5.0' in place of the USER_AGENT setting value.

You can also override headers per request:

scrapy_splash.SplashRequest(url, headers={'User-Agent': custom_user_agent})

scriptso · Answer

The proper way is to alter the Splash script to include it, rather than adding it to the spider, though that works as well.

http://splash.readthedocs.io/en/stable/scripting-ref.html?highlight=agent
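The script-based approach could look roughly like the sketch below. It assumes the request is sent to Splash's "execute" endpoint and that the Lua script calls splash:set_user_agent from the scripting reference linked above. The names LUA_SET_USER_AGENT, splash_execute_kwargs and the "user_agent" argument key are made up for illustration; only the Lua calls and the SplashRequest keyword arguments come from Splash and scrapy-splash.

```python
# Lua script for Splash's "execute" endpoint. Values passed in the request's
# args dict are available in Lua as splash.args.<key>.
LUA_SET_USER_AGENT = """
function main(splash)
    splash:set_user_agent(splash.args.user_agent)
    assert(splash:go(splash.args.url))
    assert(splash:wait(splash.args.wait))
    return splash:html()
end
"""


def splash_execute_kwargs(user_agent, wait=0.5):
    """Build keyword arguments for a SplashRequest that runs the script above.

    Hypothetical helper: it only assembles a dict and does not import Scrapy.
    """
    return {
        "endpoint": "execute",
        "args": {
            "lua_source": LUA_SET_USER_AGENT,
            "user_agent": user_agent,  # read in Lua as splash.args.user_agent
            "wait": wait,              # read in Lua as splash.args.wait
        },
    }


# Intended use inside the spider's start_requests (requires scrapy-splash):
#     yield SplashRequest(url, self.parse, **splash_execute_kwargs("Mozilla/5.0"))
```

scrapy-splash fills in the url argument from the request itself, so the script can read splash.args.url without it being listed in args.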

Tags: python-3.x, splash-screen, web-scraping, scrapy

zinyosrim
