I'd like to implement some unit tests in a Scrapy project (a screen scraper/web crawler). Since a project is run through the "scrapy crawl" command, presumably I can run it through something like nose. Since Scrapy is built on top of Twisted, can I use its unit testing framework, Trial? If so, how? Otherwise I'd like to get nose working.
Update:
I've been talking on the scrapy-users list, and I gather I am supposed to "build the Response in the test code, and then call the method with the response and assert that [I] get the expected items/requests in the output". I can't seem to get this to work, though.
I can build a unit test class and call the spider's parse method with a hand-built response from a test.
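Roughly this shape (a sketch; the spider class, module path, and markup here are placeholders for my real code):

import unittest

from scrapy.http import HtmlResponse
from myproject.spiders.my_spider import MySpider  # placeholder import


class MySpiderTest(unittest.TestCase):

    def test_parse(self):
        # Build the Response by hand instead of fetching it over the network.
        response = HtmlResponse(
            url='http://www.example.com',
            body=b'<html><body><h1>hello</h1></body></html>',
        )
        spider = MySpider()
        # Callbacks typically return an iterable of items/requests.
        results = list(spider.parse(response))
        self.assertTrue(results)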
However, it ends up generating a traceback. Any insight as to why?
The way I've done it is to create fake responses; this way you can test the parse function offline, but you still get the real situation by using real HTML.
A problem with this approach is that your local HTML file may not reflect the latest state online. If the HTML changes online you may have a big bug, but your test cases will still pass, so this on its own may not be enough.
My current workflow is: whenever there is an error, I send an email to the admin with the URL. Then, for that specific error, I create an HTML file with the content that caused it, and write a unittest for it.
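A sketch of that capture step (the spider, selector, and file path are placeholders, and plain logging stands in for the email notification):

import logging

import scrapy


class SketchSpider(scrapy.Spider):
    """Placeholder spider showing the error-capture pattern."""
    name = 'sketch'

    def parse(self, response):
        try:
            yield {'title': response.css('h1::text').get()}
        except Exception:
            # Report the failing URL (the real workflow emails it to an admin).
            logging.exception('parse failed for %s', response.url)
            # Keep the raw HTML so it can become a fixture for a new unittest.
            with open('failed_page.html', 'wb') as f:
                f.write(response.body)
            raise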
This is the code I use to create sample Scrapy HTTP responses for testing from a local HTML file:
# scrapyproject/tests/responses/__init__.py
import os

from scrapy.http import HtmlResponse, Request


def fake_response_from_file(file_name, url=None):
    """
    Create a fake Scrapy HTTP response from an HTML file.

    @param file_name: The relative filename from the responses directory,
                      but absolute paths are also accepted.
    @param url: The URL of the response.
    returns: A Scrapy HTTP response which can be used for unit testing.
    """
    if not url:
        url = 'http://www.example.com'

    request = Request(url=url)
    if not file_name[0] == '/':
        responses_dir = os.path.dirname(os.path.realpath(__file__))
        file_path = os.path.join(responses_dir, file_name)
    else:
        file_path = file_name
    with open(file_path, 'rb') as f:
        file_content = f.read()

    # Use HtmlResponse (not the base Response) so the encoding is honoured
    # and .css()/.xpath() selectors work on the result.
    return HtmlResponse(url=url, request=request, body=file_content,
                        encoding='utf-8')
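As a quick sanity check (assuming scrapyproject and tests are importable packages and the sample file below exists), the helper can be exercised directly:

from scrapyproject.tests.responses import fake_response_from_file

response = fake_response_from_file('osdir/sample.html')
print(response.url)                        # http://www.example.com
print(response.css('title::text').get())   # selectors work: it's an HtmlResponse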
The sample HTML file is located at scrapyproject/tests/responses/osdir/sample.html.
Then the test case could look as follows; it lives in scrapyproject/tests/test_osdir.py:
import unittest

from scrapyproject.spiders import osdir_spider
from .responses import fake_response_from_file


class OsdirSpiderTest(unittest.TestCase):

    def setUp(self):
        self.spider = osdir_spider.DirectorySpider()

    def _test_item_results(self, results, expected_length):
        # Count the items while checking each one, then compare the total.
        count = 0
        for item in results:
            count += 1
            self.assertIsNotNone(item['content'])
            self.assertIsNotNone(item['title'])
        self.assertEqual(count, expected_length)

    def test_parse(self):
        results = self.spider.parse(
            fake_response_from_file('osdir/sample.html'))
        self._test_item_results(results, 10)
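Since these are plain unittest.TestCase classes, no "scrapy crawl" run is involved: from the project root, python -m unittest scrapyproject.tests.test_osdir runs them (the relative import requires the tests directory to be a package), and nose or pytest will collect them just as well, which also answers the original question about runners.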
That's basically how I test my parsing methods, but it's not limited to parsing methods. If it gets more complex, I suggest looking at Mox.