I am trying to populate start_url with a SELECT from a MYSQL table using spider.py. When i run "scrapy runspider spider.py" i get no output, just that it finished with no error.
I have tested the SELECT query in a python script and start_url get populated with the entrys from the MYSQL table.
spider.py
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
import MySQLdb
class ProductsSpider(BaseSpider):
name = "Products"
allowed_domains = ["test.com"]
start_urls = []
def parse(self, response):
print self.start_urls
def populate_start_urls(self, url):
conn = MySQLdb.connect(
user='user',
passwd='password',
db='scrapy',
host='localhost',
charset="utf8",
use_unicode=True
)
cursor = conn.cursor()
cursor.execute(
'SELECT url FROM links;'
)
rows = cursor.fetchall()
for row in rows:
start_urls.append(row[0])
conn.close()
A better approach is to override the start_requests method.
This can query your database, much like populate_start_urls
, and return a sequence of Request objects.
You would just need to rename your populate_start_urls
method to start_requests
and modify the following lines:
for row in rows:
yield self.make_requests_from_url(row[0])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With