Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Querying MongoDB (via pymongo) in case insensitive efficiently

I'm currently creating a website in python (pyramid) which requires users to sign up and log in. The system allows for users to choose a username which can be a mixture of capital letters, lowercase letters, and numbers.

The problem arises when making sure that two users don't accidentally share the same username, i.e. in my system 'randomUser' should be the same as 'RandomUser' or 'randomuser'.

Unfortunately (in this case) because Mongo stores strings as case sensitive, there could potentially be a number of users with the 'same' username.

I am aware of the method of querying mongo for case insensitive strings:

db.stuff.find_one({"foo": /bar/i});

However, this does not seem to work in my query method using pymongo:

username = '/' + str(username) + '/i'
response = request.db['user'].find_one({"username":username},{"username":1})

Is this the correct way of structuring the query for pymongo (I'm assuming not)?

This query will be used whenever a user account is created or logged in to (as it has to check if the username exists in the system). I know it's not the most efficient query, so should it matter if it's only used on log ins or account creation? Is it more desirable to instead do something like forcing users to choose only lowercase usernames (negating the need for case-insensitive queries altogether)?

like image 423
johneth Avatar asked Jun 07 '11 14:06

johneth


People also ask

How do I make a case-insensitive query in MongoDB?

The aggregation framework was introduced in mongodb 2.2 . You can use the string operator "$strcasecmp" to make a case-insensitive comparison between strings. It's more recommended and easier than using regex.

Is MongoDB data case-sensitive?

As mentioned by @natac13 and @007_jb mongo shell is an interactive javascript interpreter and hence it is also case-sensitive.


2 Answers

PyMongo uses native python regular expressions, in the same way as the mongo shell uses native javascript regular expressions. To write the equivalent query of what you had written in the shell above, you would use:

db.stuff.find_one({'name': re.compile(username, re.IGNORECASE)})

Note that this will avoid using any index that may exist on the name field, however. A common pattern for case-insensitive searching or sorting is to have a second field in your document, for instance name_lower, which is always set whenever name changes (to a lower-cased version of name, in this case). You would then query for such a document like:

db.stuff.find_one({'name_lower': username.lower()})
like image 51
dcrosta Avatar answered Oct 20 '22 00:10

dcrosta


Accepted answer is dangerous, it will match any string containing the username! Safe option is to match the exact string:

import re
db.stuff.find_one({'name': re.compile('^' + username + '$', re.IGNORECASE)})

Even safer, escape the variable of any special characters which might affect the regex match:

import re
db.stuff.find_one({'name': re.compile('^' + re.escape(username) + '$', re.IGNORECASE)}) 
like image 9
Mikko Avatar answered Oct 20 '22 02:10

Mikko