For my link-scraping program (written in Python 3.3) I want to use a database to store around 100,000 websites.
I don't have any knowledge about databases, but I found a few that might fit my purpose.
I'm mainly interested in speed, both for reading (for example: does property y exist for website x, and if so, read it) and for writing.
My question: are there big differences in speed between these options, or does it not matter for my small program? Maybe someone can tell me which database fits my requirements (and is easy to handle from Python).
The size and scale of your database are not particularly large, and well within the scope of almost any off-the-shelf database solution.
Basically, you install the database server on your machine and it listens on a given port. You then install a Python library to access it.
For example, if you want to use PostgreSQL, you'll install it on your machine and it will come up listening on its default port, 5432.
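As a rough illustration, here is a minimal sketch using the psycopg2 driver. It assumes a local PostgreSQL server; the database name, user, and table schema (`linkdb`, `scraper`, `websites`) are made up for the example:

```python
import psycopg2

# Connect to a local PostgreSQL server on its default port; the
# database name and user here are hypothetical.
conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="linkdb", user="scraper")
cur = conn.cursor()

# An illustrative schema: one row per website.
cur.execute("""CREATE TABLE IF NOT EXISTS websites (
                   url   TEXT PRIMARY KEY,
                   title TEXT)""")

# Write: store a scraped site.
cur.execute("INSERT INTO websites (url, title) VALUES (%s, %s)",
            ("http://example.com", "Example Domain"))
conn.commit()

# Read: does property "title" exist for website x, and if so, read it.
cur.execute("SELECT title FROM websites WHERE url = %s",
            ("http://example.com",))
row = cur.fetchone()
if row and row[0] is not None:
    print(row[0])

conn.close()
```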
But if you just need to store and retrieve the kind of information you're describing, you probably want to go with a NoSQL solution, because it's very easy to get started with.
For example, you can install MongoDB on your machine, then install pymongo. The pymongo tutorial will teach you pretty much everything you need for your application.
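For comparison, here is an equivalent sketch with pymongo (using the `insert_one`/`find_one` API of pymongo 3+). It assumes MongoDB is running locally on its default port 27017; the database and collection names (`linkdb`, `websites`) are again hypothetical:

```python
from pymongo import MongoClient

# Connect to a local MongoDB server on its default port.
client = MongoClient("localhost", 27017)
db = client.linkdb

# Write: documents are schemaless, so each site can carry whatever
# properties the scraper happened to find.
db.websites.insert_one({"url": "http://example.com",
                        "title": "Example Domain"})

# Read: does property "title" exist for website x, and if so, read it.
doc = db.websites.find_one({"url": "http://example.com",
                            "title": {"$exists": True}})
if doc:
    print(doc["title"])
```

Note the `$exists` operator, which maps directly onto your "does property y exist for website x" query; with a document store you don't have to declare the properties up front.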