
How to make Scrapy show user agent per download request in log?

I am learning Scrapy, a web crawling framework.

I know I can set USER_AGENT in the settings.py file of a Scrapy project. When I run Scrapy, I can see the USER_AGENT value in the INFO logs.
This USER_AGENT is sent with every download request to the server I want to crawl.

However, I am rotating multiple user agents at random with the help of this solution. I assume the randomly chosen USER_AGENT is being applied, but I want to confirm it. How can I make Scrapy log the User-Agent for each download request, so that I can see its value in the logs?

Alok asked Apr 18 '14 10:04




1 Answer

Just FYI.

I've implemented a simple RandomUserAgentMiddleware based on fake-useragent.

Thanks to fake-useragent, you don't need to configure the list of User-Agents - it picks them up based on browser usage statistics from a real-world database.
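To see which User-Agent each request actually carries, one option is to log it in the same downloader middleware that picks it. The following is only a minimal sketch, not the code of the middleware mentioned above: the class name LoggingRandomUserAgentMiddleware and the USER_AGENTS list are made up for illustration, and the hard-coded list stands in for whatever source you use (with fake-useragent, `UserAgent().random` could supply the value instead).

```python
import logging
import random

logger = logging.getLogger(__name__)

# Hypothetical pool of user agents, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]


class LoggingRandomUserAgentMiddleware:
    """Downloader middleware sketch: pick a random User-Agent and log it."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Choose a user agent for this request and set it on the headers.
        user_agent = random.choice(self.user_agents)
        request.headers["User-Agent"] = user_agent
        # One log line per request, showing the URL and the chosen value.
        logger.debug("User-Agent for %s: %s", request.url, user_agent)
        # Returning None tells Scrapy to keep processing the request.
        return None
```

To try it, the class would need to be listed in DOWNLOADER_MIDDLEWARES in settings.py under your own module path, and the log level must allow DEBUG messages (Scrapy's default LOG_LEVEL is DEBUG). Depending on your Scrapy version, you may also want to disable the built-in UserAgentMiddleware in the same setting. Another way to check, without a custom middleware, is to print `response.request.headers.get('User-Agent')` inside a spider callback.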

alecxe answered Oct 20 '22 22:10