I am learning Scrapy, a web crawling framework.
I know I can set USER_AGENT in the settings.py file of the Scrapy project. When I run Scrapy, I can see the USER_AGENT's value in the INFO logs.
This USER_AGENT gets set in every download request to the server I want to crawl.
But I am using multiple USER_AGENT values, chosen randomly with the help of this solution. I guess the randomly chosen USER_AGENT is working, but I want to confirm it. So, how can I make Scrapy show the USER_AGENT per download request, so I can see its value in the logs?
To change the USER_AGENT value for a single fetch request, use the command-line set option (-s). To set the user agent permanently for your Scrapy spider, open your project's settings.py in your preferred text editor, search for the USER_AGENT option, uncomment the line, and set the value to the user-agent string of your choice.
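As an illustration of the permanent setting described above, the relevant line in settings.py might look like the fragment below. The user-agent string is a made-up example value, not one taken from the question.

```python
# settings.py (fragment)
# Hypothetical example value; replace it with the user-agent string you want
# Scrapy to send by default.
USER_AGENT = "my-crawler/1.0 (+https://example.com/contact)"
```

Any middleware that sets the User-Agent header itself, such as a random user-agent middleware, may send a different value than this default.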
The User-Agent request header is a characteristic string that lets servers and network peers identify the application, operating system, vendor, and/or version of the requesting user agent.
Just FYI.
I've implemented a simple RandomUserAgentMiddleware middleware based on fake-useragent.
Thanks to fake-useragent, you don't need to configure the list of User-Agents - it picks them up based on browser usage statistics from a real-world database.
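To confirm which User-Agent actually goes out with each request, one option is a small downloader middleware that logs the header. The following is only a minimal sketch, not part of Scrapy or fake-useragent: the class name LogUserAgentMiddleware and the helper describe_user_agent are hypothetical, and it assumes the request object exposes url and a dict-like headers attribute, as Scrapy's Request does.

```python
import logging

logger = logging.getLogger(__name__)


def describe_user_agent(request):
    """Return the request's User-Agent header as text (hypothetical helper)."""
    ua = request.headers.get("User-Agent")
    if ua is None:
        return "<not set>"
    # Scrapy stores header values as bytes, so decode them for readable logs.
    if isinstance(ua, bytes):
        return ua.decode("latin-1")
    return str(ua)


class LogUserAgentMiddleware:
    """Hypothetical downloader middleware: logs the User-Agent of each request."""

    def process_request(self, request, spider):
        logger.info("User-Agent for %s: %s", request.url, describe_user_agent(request))
        # Returning None tells Scrapy to keep processing the request as usual.
        return None
```

To try it, the class would be added to DOWNLOADER_MIDDLEWARES in settings.py with an order number higher than that of the middleware that picks the random User-Agent, so that it runs after the header has been set. The messages are emitted at INFO level, so they should appear alongside Scrapy's other INFO logs.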