Scrapy - logging to file and stdout simultaneously, with spider names

I've decided to use the Python logging module because the output Twisted generates on standard error is too verbose. I want the meaningful INFO-level messages, such as those generated by the StatsCollector, written to a separate log file while keeping the on-screen messages.

import logging
from twisted.python import log

# Write INFO and above to a file, then forward Twisted's log
# events into the stdlib logging module.
logging.basicConfig(level=logging.INFO, filemode='w', filename='buyerlog.txt')
observer = log.PythonLoggingObserver()
observer.start()

Well, this works and I get my messages, but the downside is that I can't tell which spider generated each message. This is my log file, with "twisted" being displayed by %(name)s:

 INFO:twisted:Log opened.
 INFO:twisted:Scrapy 0.12.0.2543 started (bot: property)
 INFO:twisted:scrapy.telnet.TelnetConsole starting on 6023
 INFO:twisted:scrapy.webservice.WebService starting on 6080
 INFO:twisted:Spider opened
 INFO:twisted:Spider opened
 INFO:twisted:Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
 INFO:twisted:Closing spider (shutdown)
 INFO:twisted:Closing spider (shutdown)
 INFO:twisted:Dumping spider stats:
 {'downloader/exception_count': 3,
  'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 3,
  'downloader/request_bytes': 9973,

Compare this with the messages Twisted generates on standard error:

2011-12-16 17:34:56+0800 [expats] DEBUG: number of rules: 4
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-12-16 17:34:56+0800 [iproperty] INFO: Spider opened
2011-12-16 17:34:56+0800 [iproperty] DEBUG: Redirecting (301) to <GET http://www.iproperty.com.sg/> from <GET http://iproperty.com.sg>
2011-12-16 17:34:57+0800 [iproperty] DEBUG: Crawled (200) <

I've tried %(name)s, %(module)s and others, but I don't seem to be able to show the spider name. Does anyone know the answer?
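
As far as I can tell, the fixed name comes from the observer itself: PythonLoggingObserver forwards every Twisted log event to a single stdlib logger, "twisted" by default. You can pass a different loggerName, but it is still one fixed name rather than the name of the spider that emitted the message:

from twisted.python import log

# The observer accepts a logger name, but it applies to ALL
# forwarded messages, so %(name)s can never vary per spider.
observer = log.PythonLoggingObserver(loggerName='scrapy')
observer.start()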

EDIT: the problem with using LOG_FILE and LOG_LEVEL in the settings is that the lower-level messages will no longer be shown on standard error.
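
For reference, here is roughly what I mean, a sketch of the relevant settings.py entries:

# settings.py -- sketch of the built-in logging options.
# Everything at or above LOG_LEVEL goes to LOG_FILE, but the
# lower-level messages then stop appearing on standard error.
LOG_FILE = 'buyerlog.txt'
LOG_LEVEL = 'INFO'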

asked Dec 16 '11 by goh


3 Answers

You want to use the ScrapyFileLogObserver.

import logging
from scrapy.log import ScrapyFileLogObserver

# Attach an extra observer that writes Scrapy-formatted messages
# (including the [spidername] prefix) to a file.
logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()
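
If you want both destinations at once, which is what the question asks for, I believe you can simply start two observers, since ScrapyFileLogObserver accepts any file-like object:

import logging
import sys
from scrapy.log import ScrapyFileLogObserver

# One observer keeps only INFO and above in the file...
ScrapyFileLogObserver(open('testlog.log', 'w'), level=logging.INFO).start()
# ...while a second one keeps the full DEBUG output on screen.
ScrapyFileLogObserver(sys.stderr, level=logging.DEBUG).start()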

I'm glad you asked this question; I've been wanting to do this myself.

answered by Acorn


It is very easy to redirect output from the shell:

scrapy [args] 2>&1 | tee -a logname

This way, everything Scrapy outputs to stdout and stderr is redirected to the logname file and also printed to the screen (tee -a appends rather than overwriting).

answered by Alexander Artemenko


For all those folks who came here before reading the current version of the documentation:

import logging
from scrapy.utils.log import configure_logging

# Disable Scrapy's default root handler and configure logging
# with the stdlib instead.
configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='log.txt',
    filemode='a',
    format='%(levelname)s: %(message)s',
    level=logging.DEBUG
)
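
That sends everything to log.txt only. If you also want to keep the messages on screen, which is what the original question asks for, one option (assuming Python 3.3+ for the handlers argument) is to register a file handler and a stream handler side by side:

import logging
import sys
from scrapy.utils.log import configure_logging

configure_logging(install_root_handler=False)

# INFO and above is kept in the file...
file_handler = logging.FileHandler('log.txt', mode='a')
file_handler.setLevel(logging.INFO)

# ...while the console still shows everything down to DEBUG.
console_handler = logging.StreamHandler(sys.stderr)
console_handler.setLevel(logging.DEBUG)

logging.basicConfig(
    format='%(asctime)s [%(name)s] %(levelname)s: %(message)s',
    level=logging.DEBUG,
    handlers=[file_handler, console_handler],
)

In recent Scrapy versions, messages logged through spider.logger use the spider's name as the logger name, so %(name)s finally shows which spider a message came from.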

answered by Alex K.