I am having trouble with logging in Scrapy, and most of what I can find is out of date.
I have set LOG_FILE = "log.txt" in the settings.py file, and according to the documentation, this should work:
Scrapy provides a logger within each Spider instance, that can be accessed and used like this:
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'
        start_urls = ['http://scrapinghub.com']

        def parse(self, response):
            self.logger.info('Parse function called on %s', response.url)
But when I do:
    class MySpider(CrawlSpider):
        # other code
        def parse_page(self, response):
            self.logger.info("foobar")
I get nothing. If I set

    logger = logging.basicConfig(filename="log.txt", level=logging.INFO)

at the top of my file, after my imports, it creates a log file and the default output gets logged just fine, but
    class MySpider(CrawlSpider):
        # other code
        def parse_page(self, response):
            logger.info("foobar")
fails to make an appearance. I have also tried putting it in the class __init__, like so:

    def __init__(self, *a, **kw):
        super(FanfictionSpider, self).__init__(*a, **kw)
        logging.basicConfig(filename="log.txt", level=logging.INFO)
I once again get no output to the file, only to the console, and foobar does not show up. Can someone please direct me on how to correctly log in Scrapy?
For logging I just put this in the spider class (note that `import scrapy` is needed for `scrapy.Spider`):

    import logging
    import scrapy
    from scrapy.utils.log import configure_logging

    class SomeSpider(scrapy.Spider):
        configure_logging(install_root_handler=False)
        logging.basicConfig(
            filename='log.txt',
            format='%(levelname)s: %(message)s',
            level=logging.INFO
        )

This will put all Scrapy output into a log.txt file in the project root directory.
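Alternatively, the same result can be had through Scrapy's own logging settings instead of calling logging.basicConfig yourself. A minimal settings.py fragment (the values here are just examples):

```python
# settings.py: Scrapy's built-in logging settings
LOG_FILE = "log.txt"                       # write the log to this file instead of stderr
LOG_LEVEL = "INFO"                         # suppress DEBUG noise
LOG_FORMAT = "%(levelname)s: %(message)s"  # standard logging format string
```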
If you want to log something manually, don't use the old scrapy.log module, it's deprecated. Just use Python's standard logging:

    import logging
    logging.error("Some error")
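As a side note on the question's approach: logging.basicConfig() configures the root logger and returns None, so assigning its result to a variable and calling .info() on it raises an AttributeError. A minimal sketch of the working pattern (the log path and logger name here are just examples):

```python
import logging
import os
import tempfile

# Example log path; in a real project this would simply be "log.txt".
log_path = os.path.join(tempfile.gettempdir(), "spider_log.txt")

# basicConfig() returns None; never use its return value as a logger.
logging.basicConfig(
    filename=log_path,
    format="%(levelname)s: %(name)s: %(message)s",
    level=logging.INFO,
    force=True,  # Python 3.8+: replace any handlers already installed
)

# Get a named logger instead and log through it.
logger = logging.getLogger("myspider")
logger.info("foobar")

logging.shutdown()  # flush and close handlers so the file is complete

with open(log_path) as f:
    print(f.read().strip())  # prints: INFO: myspider: foobar
```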
It seems that you're not calling your parse_page method at any point. Try commenting out your parse method and you'll receive a NotImplementedError, because you're starting the spider while telling it to 'do nothing'. Maybe if you call your parse_page method it'll work:

    def parse(self, response):
        self.logger.info('Parse function called on %s', response.url)
        self.parse_page(response)

Hope it helps you.
I was unable to make @Rafael Almeda's solution work until I added the following to the import section of my spider.py code:

    from scrapy.utils.log import configure_logging