
Scraping an HTML file saved on the local system

Tags:

python

scrapy

For example, I have a site, "www.example.com". I want to scrape the HTML of this site after saving it to my local system, so for testing I saved the page on my desktop as example.html.

I have written the following spider code for it:

    class ExampleSpider(BaseSpider):
        name = "example"
        start_urls = ["example.html"]

        def parse(self, response):
            print response
            hxs = HtmlXPathSelector(response)

But when I run the above code, I get the following error:

ValueError: Missing scheme in request url: example.html

My goal is to scrape the example.html file, which contains the HTML of www.example.com saved on my local system.

Can anyone suggest how to specify the example.html file in start_urls?

Thanks in advance


asked Jun 05 '12 10:06

Shiva Krishna Bavandla

1 Answer

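Scrapy requires every entry in start_urls to be a full URL with a scheme. A bare file name such as example.html has none, which is what triggers the ValueError. You can crawl a local file by using a URL of the following form:

    file:///path/to/file.html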
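As a rough sketch of how such a URL could be built from a relative path instead of being typed by hand, Python's standard pathlib can do the conversion. The helper name `local_file_url` is made up for this illustration and is not part of Scrapy. The example also assumes Python 3, whereas the spider in the question uses Python 2 syntax.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_file_url(path):
    """Return a file:// URL for a local path (hypothetical helper, not a Scrapy API)."""
    # resolve() makes the path absolute; as_uri() only accepts absolute paths.
    return Path(path).expanduser().resolve().as_uri()


if __name__ == "__main__":
    url = local_file_url("example.html")
    # A bare "example.html" parses with an empty scheme,
    # which is what the "Missing scheme" ValueError refers to.
    print(repr(urlparse("example.html").scheme))
    # The converted URL carries the "file" scheme.
    print(urlparse(url).scheme)
    print(url)
```

The returned string could then be used in the spider as `start_urls = [local_file_url("example.html")]`, assuming the spider is run from the directory that contains the file.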


answered Sep 23 '22 21:09

iodbh
