I saw this post about making Scrapy crawl any site without the allowed-domains restriction.
Is there a better way of doing it, such as using a regular expression in the allowed_domains variable, like:
allowed_domains = ["*"]
I hope there is some way to do this other than hacking into the Scrapy framework.
The parse method is in charge of processing the response and returning scraped data and/or more URLs to follow. Other Request callbacks have the same requirements as the Spider class.
They are handled by the default parse() method implemented in that class -- look here to read the source. So, whenever you want to trigger the rules for a URL, you just need to yield a scrapy.Request(url, self.parse), and the Scrapy engine will send a request to that URL and apply the rules to the response.
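As a minimal sketch of that pattern (the spider name, start URL, and parse_item callback are placeholders of my own, not from the original post):

```python
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = "example"                      # placeholder name
    start_urls = ["https://example.com"]  # placeholder start URL

    # Every extracted link is passed to parse_item and followed further.
    rules = (
        Rule(LinkExtractor(), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        # Return scraped data for this page.
        yield {"url": response.url, "title": response.css("title::text").get()}

    def other_callback(self, response):
        # To run the rules against some URL, route it through the default
        # parse() that CrawlSpider implements:
        yield scrapy.Request("https://example.com/more", self.parse)
```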
Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Don't set allowed_domains at all.
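When the attribute is absent, the offsite middleware has nothing to match against and lets every request through. A minimal sketch, with a placeholder spider name and start URL:

```python
import scrapy

class OpenSpider(scrapy.Spider):
    name = "open"                         # placeholder name
    # No allowed_domains attribute: with nothing to match against,
    # the offsite middleware does not filter any request.
    start_urls = ["https://example.com"]  # placeholder start URL

    def parse(self, response):
        yield {"url": response.url}
        # Follow every link on the page, whatever domain it points to.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```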
Look at the get_host_regex() function in this Scrapy file:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/spidermiddleware/offsite.py
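That file has since moved to scrapy/spidermiddlewares/offsite.py in newer Scrapy releases. Roughly, the function builds one regex out of allowed_domains and matches every request's host against it; the following is paraphrased from the source as it looked at the time, not a verbatim copy:

```python
import re

def get_host_regex(self, spider):
    """Override this method to implement a different offsite policy."""
    allowed_domains = getattr(spider, 'allowed_domains', None)
    if not allowed_domains:
        # No allowed_domains: the empty pattern matches every host,
        # so no request is filtered out.
        return re.compile('')
    domains = [re.escape(d) for d in allowed_domains if d is not None]
    regex = r'^(.*\.)?(%s)$' % '|'.join(domains)
    return re.compile(regex)
```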
You should deactivate the OffsiteMiddleware, which is a built-in spider middleware in Scrapy. For more information, see http://doc.scrapy.org/en/latest/topics/spider-middleware.html
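A sketch of disabling it in settings.py; note that the middleware's import path depends on your Scrapy version (older releases used scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware):

```python
# settings.py
SPIDER_MIDDLEWARES = {
    # Setting the value to None disables the middleware entirely.
    # On older Scrapy versions the key is
    # 'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware'.
    'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': None,
}
```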