I'm new to web crawling. I'm going to build a search engine whose crawler saves Rapidshare links, along with the URL of the page where each link was found.
In other words, I'm going to build a website similar to filestube.com.
After some searching, I've found that Scrapy works with Django. I've also tried to find out about integrating Nutch with Django, but found nothing.
I hope you can give me some suggestions for building this kind of website, especially the crawler.
Django is a great choice for just about any web development project. It's particularly good for social media sites or e-commerce sites that require a strong and secure foundation because the Django framework has built-in features that are great for protecting sensitive data, transactions and user authentication.
Create a file urls.py in the engine folder. Append the following lines. Our project is now done. To fire it up, type python3 manage.py runserver, then enter this URL in your browser and you should see this. Now enter your query in the search bar and you should get your results like this.
Django Q is a native Django task queue, scheduler and worker application using Python multiprocessing.
The best-known pluggable app for that is Django-Haystack, which lets you connect to several search backends.
Haystack gives you an API that looks like Django's own QuerySet syntax for working with these search engines directly (each of which happens to have its own API and dialect).
If you're just after scraping tools, then whichever tool you use, BeautifulSoup or Scrapy, you'll be on your own: you write the Python code that parses what you want to parse and then populates your Django models.
This can even be separate Python scripts, exposed as custom management commands (modules under an app's management/commands/ package).
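As a rough illustration of that parsing step, here is a minimal sketch that pulls Rapidshare links out of a page and pairs each one with the URL where it was found. It uses the standard library's html.parser instead of BeautifulSoup or Scrapy so that it runs without extra installs. The names RapidshareLinkParser and extract_rapidshare_links are hypothetical, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class RapidshareLinkParser(HTMLParser):
    """Collects the targets of <a> tags that point at rapidshare.com."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.page_url, href)
        host = urlparse(absolute).hostname or ""
        if host == "rapidshare.com" or host.endswith(".rapidshare.com"):
            self.links.append(absolute)


def extract_rapidshare_links(html, page_url):
    """Return (rapidshare_link, page_url) pairs found in the given HTML."""
    parser = RapidshareLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return [(link, page_url) for link in parser.links]
```

Each returned pair could then be saved as one row of a Django model from inside a management command, so the crawler and the website share the same database.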
If you have a lot of files to search, you will probably need an index that is rebuilt frequently and allows fast searches without hitting the Django ORM.
Using a Solr index (for example) lets you create other fields on the fly, such as virtual fields derived from your real model's fields (e.g. splitting an author's first name and last name, or adding an uppercased file title field).
Of course, if you don't need fast indexing, keyword boosting or semantic analysis, you can still do a classic full-text search over a couple of Django model fields.
Have you checked DjangoItem? It's an experimental Scrapy feature, but it's known to work