This is a college project:
I have a database ( mysql or postgresql doesn't matter ) with 100 billion of posts and I need to search ( as fast as possible ) a generic keyword.
Every post is 500-1000 keywords.
This isn't only a database issue but also a software ( for indexing or other ) issue.
How can I do that ?
I could use some advanced search engine technology but I don't know which.
Have you considered using Apache Lucene ?
That does not directly work directly with your SQL database, you'll have to write some code to feed the documents into it in order to build and index which you then can query.
I don't know however how much additional space that would require and how much time it will take.
They will index it for you free and you will make money.
Well, there are about 6.8 billion people on the planet, who can read about 1 post per minute (on average).
If everyone contributes, 100 billion divided by 6.8 billion is 14.7 minutes to have all posts read.
So:
1) Conquer Earth.
2) Make everyone your slave.
3) Have posts read.
4) ???
5) Profit!
You might want to check out Sphinx. It's a full text search engine that handles distributed indexes. You can have parts of the data spread over many computers. And querying a single server can send the query to the other servers, and gather the results from each. It has pretty good speed, but you probably can't do 100 billion posts on a single machine.
You probably won't be able to do something like this in MySQL or Postgresql. While you could store all the data, MySQL and Postgres lack the full text indexing and search speed that a real full text index will bring you.
MySQL has a way to compile in support for the Sphinx storage engine, which, although the data will still be stored in Sphinx separate from MySQL, you can still query the Sphinx search engine using anything that connects to MySQL, and also perform joins to other tables that are in your MySQL database. However, if you just want to perform simple searches on the documents, and don't require joining to other data, you can just use the native PHP interface.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With