Suggestion for building search engine using Django

I'm new to web crawling. I'm going to build a search engine whose crawler saves Rapidshare links, along with the URL of the page where each link was found.

In other words, I'm going to build a website similar to filestube.com

After some searching, I found that Scrapy works with Django. I also tried to find information about integrating Nutch with Django, but found nothing.

I hope you can give me some suggestions for building this kind of website, especially the crawler.

Mbak Kunti asked Jan 07 '11 15:01



2 Answers

The best-known pluggable app for this is Django-Haystack, which lets you connect to several search backends:

  • Solr / Lucene, the buzzword-compliant Apache Foundation project
  • Whoosh, a native Python search library
  • Xapian, another very good semantic search engine

Haystack gives you an API that looks like Django's own QuerySet syntax for querying these search engines directly (each of which has its own API and dialect).

If you're just after scraping tools, then whichever one you use (BeautifulSoup or Scrapy), you'll be on your own: you write the Python code that parses what you want to parse and then populates your Django models.
These can even be separate Python scripts, run as custom management commands (modules under an app's management/commands/ directory).
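
As a rough illustration of the parsing step, here is a minimal sketch that pulls Rapidshare links out of a page and pairs each one with the URL of the page it was found on. It uses the standard library's `html.parser` in place of BeautifulSoup or Scrapy so that it runs without extra dependencies. The names `LinkCollector`, `extract_rapidshare_links` and the `RAPIDSHARE_RE` pattern are assumptions made for this example, not part of any of those tools.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: accepts rapidshare.com and any of its subdomains.
RAPIDSHARE_RE = re.compile(r"^https?://(?:[\w-]+\.)*rapidshare\.com/", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_rapidshare_links(page_url, html):
    """Return (file_url, source_page_url) pairs found in one page."""
    collector = LinkCollector()
    collector.feed(html)
    return [(href, page_url) for href in collector.links if RAPIDSHARE_RE.match(href)]
```

Each returned pair could then be saved through a Django model with two URL fields, one for the file link and one for the page it came from.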

If you have a lot of files to search, you will probably need an index that is rebuilt frequently and allows fast searches without hitting the Django ORM.
Using a Solr index, for example, lets you create additional fields on the fly, such as virtual fields derived from your real model's fields (e.g. splitting the author's first name and last name, adding an uppercased file title field, and so on).
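
To make the idea of an index with derived fields concrete, here is a small pure-Python sketch. It is not Solr and not Haystack: it is a toy inverted index, and the record schema (`id`, `title`, `author`) and the function names are assumptions chosen for illustration only.

```python
from collections import defaultdict


def build_document(record):
    """Derive extra searchable fields from a stored record (hypothetical schema)."""
    first, _, last = record["author"].partition(" ")
    return {
        "id": record["id"],
        "title": record["title"],
        "title_upper": record["title"].upper(),  # derived field
        "author_first": first,                   # derived field
        "author_last": last,                     # derived field
    }


def build_index(records):
    """Map each lowercased title word to the set of record ids containing it."""
    index = defaultdict(set)
    documents = {}
    for record in records:
        doc = build_document(record)
        documents[doc["id"]] = doc
        for word in doc["title"].lower().split():
            index[word].add(doc["id"])
    return index, documents


def search(index, documents, query):
    """Return documents whose title contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    ids = set.intersection(*(index.get(word, set()) for word in words))
    return [documents[i] for i in sorted(ids)]
```

Lookups go through the prebuilt `index` dictionary rather than the database, which is the property that makes a real search backend fast; rebuilding the index amounts to calling `build_index` again on fresh records.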

Of course, if you don't need fast indexing, keyword boosting or semantic analysis, you can still do a classic full-text search over a couple of Django model fields:

  • Django's native QuerySet: see the "__search" field lookup (MySQL only)
  • PostgreSQL-specific full-text search with Django

Dominique Guardiola answered Oct 04 '22 19:10


Have you checked DjangoItem? It's an experimental Scrapy feature, but it's known to work.

Pablo Hoffman answered Oct 04 '22 18:10