I'm looking for the fastest/most efficient way to check whether a given username is available against a set of tens of millions of usernames. At the moment I'm using a normal MySQL SELECT query that runs on every key press, but I'm not happy with the performance. I'm using indexing, partitioning, etc., and I know that MySQL can be optimized to be very fast, but I also know that there are better solutions.
So what's the fastest way to do a username search?
For example, how does Gmail search across billions of email addresses when someone registers? How does Facebook do it? I assume they don't just run an SQL query.
I'm looking for a practical solution for a PHP app.
Right now I'm just using a very basic select:
SELECT username FROM users WHERE username = $username LIMIT 1
The username column has a unique index on it.
I agree you should try and stick it all in RAM (e.g. Redis).
But if you don't want to go the whole way, I do the following: store the list somewhere slow (e.g. S3 or a SQL database). Next, build a Bloom filter from that list (there's material on Wikipedia about Bloom filters, and there's a nifty Redis module that you can use - https://oss.redislabs.com/redisbloom).
Now, a Bloom filter (BF) will never give you a false negative: if it says a username is not in the list, that username is definitely available, so you can check availability efficiently. Sometimes, however, the BF will report an available username as unavailable (a false positive), and you have to decide if you can live with that.
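To make the idea concrete, here is a minimal in-memory sketch in Python (the same logic ports directly to PHP). It is not the RedisBloom implementation. The class name `BloomFilter`, the helper `is_available`, and the `confirm_taken` callback are all hypothetical names chosen for illustration, and the sizing uses the standard textbook formulas. One way to live with false positives, assumed here, is to fall back to the slow store only when the filter says "maybe taken".

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; names and sizing are assumptions, not RedisBloom's."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present"
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def is_available(username: str, bloom: BloomFilter, confirm_taken) -> bool:
    """confirm_taken is a hypothetical callback that queries the slow store."""
    if not bloom.might_contain(username):
        return True  # no false negatives, so the name is definitely free
    # Possibly a false positive: ask the authoritative (slow) store
    return not confirm_taken(username)


# Usage: a small set stands in for the SQL table
taken = {"alice", "bob", "carol"}
bloom = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for name in taken:
    bloom.add(name)

print(is_available("alice", bloom, taken.__contains__))  # False
print(is_available("dave", bloom, taken.__contains__))   # True
```

With this arrangement most keystrokes for unused names never touch the database; only names the filter flags as possibly taken trigger the slow lookup. If you skip the fallback, you simply accept that a small fraction of free usernames will be shown as taken.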