I've just installed Scrapy and followed their simple dmoz tutorial, which works. I then looked up basic file handling in Python and tried to get the crawler to read a list of URLs from a file, but got some errors. This is probably wrong, but I gave it a shot. Would someone please show me an example of reading a list of URLs into Scrapy? Thanks in advance.
from scrapy.spider import BaseSpider

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    f = open("urls.txt")
    start_urls = f

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)
You were pretty close. The problem is that you're assigning the file object itself to `start_urls`, when Scrapy expects a list of URL strings. Read the lines out and strip the trailing newlines instead:

f = open("urls.txt")
start_urls = [url.strip() for url in f.readlines()]
f.close()
...better still would be to use a context manager, which ensures the file is closed even if an exception occurs:

with open("urls.txt", "rt") as f:
    start_urls = [url.strip() for url in f.readlines()]
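For illustration, here's a minimal standalone sketch of that read-and-strip pattern outside of a spider. It writes a throwaway `urls.txt` first so the snippet is self-contained; in your spider you'd just open your real file at class-definition time. The example URLs are placeholders, not from the question:

```python
# Create a sample urls.txt so the example runs on its own.
# Note the stray whitespace and newlines that strip() cleans up.
with open("urls.txt", "wt") as f:
    f.write("http://example.com/page1\n")
    f.write("  http://example.com/page2  \n")

# The same pattern as in the spider: read every line, strip
# surrounding whitespace (including the trailing newline), and
# collect the results into a plain list of URL strings.
with open("urls.txt", "rt") as f:
    start_urls = [url.strip() for url in f.readlines()]

print(start_urls)
```

Without the `strip()`, each entry would keep its trailing `\n`, and Scrapy would try to fetch malformed URLs.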