How can I debug Scrapy?

Question

I'm 99% sure something is going wrong with my hxs.select on this website: I cannot extract anything. When I run the following code, I get no error feedback, and neither title nor link gets populated. Any help?

def parse(self, response):
    self.log("\n\n\n We got data! \n\n\n")
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class=\'footer\']')
    items = []
    for site in sites:
        item = CarrierItem()
        item['title'] = site.select('.//a/text()').extract()
        item['link'] = site.select('.//a/@href').extract()
        items.append(item)
    return items

Is there a way I can debug this? I also tried the scrapy shell command with a URL, but when I enter view(response) in the shell it simply returns True and a text file opens instead of my web browser.

>>> response.url
'https://qvpweb01.ciq.labs.att.com:8080/dis/login.jsp'

>>> hxs.select('//div')
Traceback (most recent call last):
    File "", line 1, in 
AttributeError: 'NoneType' object has no attribute 'select'

>>> view(response)
True

>>> hxs.select('//body')
Traceback (most recent call last):
    File "", line 1, in 
AttributeError: 'NoneType' object has no attribute 'select'

deostroll · Accepted Answer

You can run Scrapy under pdb from the command line and set a breakpoint in your spider file. It takes a few steps.

(The steps may differ slightly on Windows.)

Locate your scrapy executable:
```
$ whereis scrapy
/usr/local/bin/scrapy
```

Run it as a Python script under pdb:

$ python -m pdb /usr/local/bin/scrapy crawl quotes
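
The two steps above can also be scripted. This is only a minimal sketch: `build_pdb_command` is a hypothetical helper name, `quotes` is just the spider name from the command above, and using `sys.executable` assumes the current interpreter is the one Scrapy is installed in.

```python
import shutil
import sys


def build_pdb_command(spider_name, scrapy_path=None):
    """Return the argv list for running `scrapy crawl <spider>` under pdb."""
    # Fall back to searching PATH, much like `whereis` or `which` would.
    if scrapy_path is None:
        scrapy_path = shutil.which("scrapy")
    if scrapy_path is None:
        raise FileNotFoundError("scrapy executable not found on PATH")
    # Assumption: the interpreter running this script has Scrapy installed.
    return [sys.executable, "-m", "pdb", scrapy_path, "crawl", spider_name]


if __name__ == "__main__":
    print(" ".join(build_pdb_command("quotes", "/usr/local/bin/scrapy")))
```

The returned list can be passed to `subprocess.run` if you want to launch the debugger from a script rather than typing the command by hand.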

Once the debugger prompt appears, open another shell and find the absolute path to your spider script (inside your Scrapy project):
```
$ realpath path/to/your/spider.py
/absolute/spider/file/path.py
```

This will output the absolute path. Copy it to your clipboard.

In the pdb shell type:

b /absolute/spider/file/path.py:line_number

...where line_number is the line in that file at which you want execution to pause.
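
If you would rather not assemble the breakpoint command by hand, it can be generated. A small sketch, where `pdb_break_command` and the example path `spiders/example.py` are hypothetical names used only for illustration:

```python
import os


def pdb_break_command(spider_file, line_number):
    """Format the pdb `b` command for a spider file and line number."""
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    # realpath resolves relative segments and symlinks, like the `realpath` CLI.
    absolute = os.path.realpath(spider_file)
    return f"b {absolute}:{line_number}"


if __name__ == "__main__":
    # Paste the printed line into the pdb prompt.
    print(pdb_break_command("spiders/example.py", 12))
```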

Enter c in the debugger to continue until the breakpoint is hit.

Now go do some PythonFu :)

Z0B · Answer

Using VSCode:

1. Locate where your scrapy executable is:

$ which scrapy
/Users/whatever/tutorial/tutorial/env/bin/scrapy

For me it was at /Users/whatever/tutorial/tutorial/env/bin/scrapy. Copy that path.

2. Create a launch.json file

Go to the debug tab in VSCode and click "Add configuration".

3. Paste the following template into the launch.json

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "args": ["crawl", "NAME_OF_SPIDER"],
            "type": "python",
            "request": "launch",
            "program": "PATH_TO_SCRAPY_FILE",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}

In that template, replace NAME_OF_SPIDER with the name of your spider (in my case datasets) and PATH_TO_SCRAPY_FILE with the output you got in step 1 (in my case /Users/whatever/tutorial/tutorial/env/bin/scrapy).
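
The substitution can also be done programmatically. The following is a sketch only: `make_launch_config` is a hypothetical helper that rebuilds the template above as a Python dictionary and fills in the two placeholders. The example values are the ones mentioned in this answer.

```python
import json


def make_launch_config(scrapy_path, spider_name):
    """Build the launch.json structure from the template above."""
    return {
        "version": "0.2.0",
        "configurations": [
            {
                "name": "Python: Current File",
                "args": ["crawl", spider_name],
                "type": "python",
                "request": "launch",
                "program": scrapy_path,
                "console": "integratedTerminal",
                "justMyCode": False,
            }
        ],
    }


if __name__ == "__main__":
    config = make_launch_config(
        "/Users/whatever/tutorial/tutorial/env/bin/scrapy", "datasets"
    )
    # Print the JSON so it can be pasted into launch.json.
    print(json.dumps(config, indent=4))
```

The printed JSON omits the explanatory comments from the template, which VSCode does not need.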

4. Check that VSCode was opened at the root of your Scrapy project

5. Set a breakpoint and click debug!