 

How can I obtain a domain name with Scrapy?

I know there is a JavaScript property, document.domain (e.g. var x = document.domain;), that gets the domain of the current page, but how can I do the same in Scrapy so I can obtain domain names?

asked Aug 13 '15 by Prometheus


People also ask

How do you extract data from Scrapy?

When working with Scrapy, you first create a Scrapy project. Data is fetched by spiders, so move to the project's spiders folder and create a Python file there, for example gfgfetch.py, which contains your spider (a minimal sketch follows).
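As a rough sketch of that workflow (the file name gfgfetch.py comes from the text above; the spider name, start URL, and CSS selector are placeholders, not part of the original answer):

import scrapy

# spiders/gfgfetch.py - minimal spider sketch; name, start_urls and the
# selector below are illustrative placeholders.
class GfgSpider(scrapy.Spider):
    name = "gfg"
    start_urls = ["https://example.com/"]

    def parse(self, response):
        # yield one item per <title> text found on the page
        for title in response.css("title::text").getall():
            yield {"title": title}

Run it with scrapy crawl gfg from inside the project directory.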

How does Scrapy Python work?

Scrapy provides Item pipelines that let you write functions in your project to process your data, such as validating it, removing unwanted data, and saving it to a database. It also provides spider contracts for testing your spiders, and it lets you build both generic and deep crawlers. A sketch of an item pipeline follows.
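For illustration, a pipeline is a class with a process_item method that you enable through the ITEM_PIPELINES setting; the class and field names below are made up for this sketch:

from scrapy.exceptions import DropItem

# pipelines.py - illustrative pipeline; enable it via ITEM_PIPELINES in settings.py
class ValidateAndCleanPipeline:
    def process_item(self, item, spider):
        # validation: drop items missing a required field
        if not item.get("title"):
            raise DropItem("missing title")
        # cleaning: normalize whitespace
        item["title"] = item["title"].strip()
        # a real pipeline might also save the item to a database here
        return item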


1 Answer

You can extract the domain name from the response.url:

from urllib.parse import urlparse  # on Python 2 this was: from urlparse import urlparse

def parse(self, response):
    # response.url is the full URL of the page this callback is parsing
    parsed_uri = urlparse(response.url)
    domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    print(domain)
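For reference, here is what that urlparse call produces on its own, outside a spider (the URL is just an arbitrary example):

from urllib.parse import urlparse

parsed_uri = urlparse("https://stackoverflow.com/questions/12345/example")
print(parsed_uri.scheme)                                       # https
print(parsed_uri.netloc)                                       # stackoverflow.com
print('{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri))   # https://stackoverflow.com/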
answered Sep 28 '22 by alecxe