I want to create or find an open source web crawler (spider/bot) written in Python. It must find and follow links, collect meta tags and meta descriptions, the titles of web pages, and the URL of each page, and put all of that data into a MySQL database.
Does anyone know of any open source scripts that could help me? Also, if anyone can give me some pointers on what I should do, they are more than welcome.
Yes, I know of a few.
Libraries:
https://github.com/djay/transmogrify.webcrawler
http://code.google.com/p/harvestman-crawler/
http://code.activestate.com/pypm/orchid/
Open source web crawler:
http://scrapy.org/
Tutorials:
http://www.example-code.com/python/pythonspider.asp
P.S. I don't know whether they support MySQL, because Python projects more commonly use SQLite or PostgreSQL. If you want, you could use the libraries I listed, import the python-mysql module, and write to MySQL yourself :D
http://sourceforge.net/projects/mysql-python/
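If you end up rolling your own, here is a rough sketch of the two pieces the question asks for: pulling the title, meta tags, and links out of a page, and writing one row per page through a database connection. It uses only the standard library for parsing. The names `PageParser`, `parse_page`, `save_page`, and the `pages` table with its columns are made up for illustration, and fetching the HTML (for example with `urllib.request.urlopen`) and the loop that follows the collected links are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the title, named meta tags, and outgoing links of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}    # lowercased meta name -> content
        self.links = []   # absolute URLs a crawler could follow next
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the URL of the current page.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Return the fields worth storing for one already-downloaded page."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        "links": parser.links,
    }


def save_page(conn, page, placeholder="%s"):
    """Insert one row via any DB-API 2.0 connection.

    Assumes a hypothetical table: pages(url, title, description, keywords).
    MySQLdb uses "%s" as its parameter marker; sqlite3 uses "?".
    """
    marks = ", ".join([placeholder] * 4)
    sql = "INSERT INTO pages (url, title, description, keywords) VALUES (" + marks + ")"
    cur = conn.cursor()
    cur.execute(sql, (page["url"], page["title"], page["description"], page["keywords"]))
    conn.commit()
```

With the MySQL-python package linked above, `conn` would come from `MySQLdb.connect(...)` and the default `"%s"` placeholder should apply. The same `save_page` also works against `sqlite3` if you pass `placeholder="?"`, which is handy for trying it out without a MySQL server.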