Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I verify that a string only contains letters, numbers, underscores and dashes?

I know how to do this if I iterate through all of the characters in the string but I am looking for a more elegant method.

like image 567
Ethan Post Avatar asked Sep 18 '08 04:09

Ethan Post


People also ask

How do you know if a string is only letters and numbers?

Use the test() method on the following regular expression to check if a string contains only letters and numbers - /^[A-Za-z0-9]*$/ . The test method will return true if the regular expression is matched in the string and false otherwise.

How do you check if a string only contains letters and numbers in Python?

isalnum() is a built-in Python function that checks whether all characters in a string are alphanumeric. In other words, isalnum() checks whether a string contains only letters or numbers or both. If all characters are alphanumeric, isalnum() returns the value True ; otherwise, the method returns the value False .

How do you check if a string contains any letters?

To check if a string contains any letter, use the test() method with the following regular expression /[a-zA-Z]/ . The test method will return true if the string contains at least one letter and false otherwise. Copied! We used the RegExp.

How do you check if a string contains only alphabets and spaces in Python?

Method #1 : Using all() + isspace() + isalpha() This is one of the way in which this task can be performed. In this, we compare the string for all elements being alphabets or space only.


2 Answers

A regular expression will do the trick with very little code:

import re  ...  if re.match("^[A-Za-z0-9_-]*$", my_little_string):     # do something here 
like image 177
Thomas Avatar answered Sep 28 '22 06:09

Thomas


[Edit] There's another solution not mentioned yet, and it seems to outperform the others given so far in most cases.

Use string.translate to replace all valid characters in the string, and see if we have any invalid ones left over. This is pretty fast as it uses the underlying C function to do the work, with very little python bytecode involved.

Obviously performance isn't everything - going for the most readable solutions is probably the best approach when not in a performance critical codepath, but just to see how the solutions stack up, here's a performance comparison of all the methods proposed so far. check_trans is the one using the string.translate method.

Test code:

import string, re, timeit  pat = re.compile('[\w-]*$') pat_inv = re.compile ('[^\w-]') allowed_chars=string.ascii_letters + string.digits + '_-' allowed_set = set(allowed_chars) trans_table = string.maketrans('','')  def check_set_diff(s):     return not set(s) - allowed_set  def check_set_all(s):     return all(x in allowed_set for x in s)  def check_set_subset(s):     return set(s).issubset(allowed_set)  def check_re_match(s):     return pat.match(s)  def check_re_inverse(s): # Search for non-matching character.     return not pat_inv.search(s)  def check_trans(s):     return not s.translate(trans_table,allowed_chars)  test_long_almost_valid='a_very_long_string_that_is_mostly_valid_except_for_last_char'*99 + '!' test_long_valid='a_very_long_string_that_is_completely_valid_' * 99 test_short_valid='short_valid_string' test_short_invalid='/$%$%&' test_long_invalid='/$%$%&' * 99 test_empty=''  def main():     funcs = sorted(f for f in globals() if f.startswith('check_'))     tests = sorted(f for f in globals() if f.startswith('test_'))     for test in tests:         print "Test %-15s (length = %d):" % (test, len(globals()[test]))         for func in funcs:             print "  %-20s : %.3f" % (func,                     timeit.Timer('%s(%s)' % (func, test), 'from __main__ import pat,allowed_set,%s' % ','.join(funcs+tests)).timeit(10000))         print  if __name__=='__main__': main() 

The results on my system are:

Test test_empty      (length = 0):   check_re_inverse     : 0.042   check_re_match       : 0.030   check_set_all        : 0.027   check_set_diff       : 0.029   check_set_subset     : 0.029   check_trans          : 0.014  Test test_long_almost_valid (length = 5941):   check_re_inverse     : 2.690   check_re_match       : 3.037   check_set_all        : 18.860   check_set_diff       : 2.905   check_set_subset     : 2.903   check_trans          : 0.182  Test test_long_invalid (length = 594):   check_re_inverse     : 0.017   check_re_match       : 0.015   check_set_all        : 0.044   check_set_diff       : 0.311   check_set_subset     : 0.308   check_trans          : 0.034  Test test_long_valid (length = 4356):   check_re_inverse     : 1.890   check_re_match       : 1.010   check_set_all        : 14.411   check_set_diff       : 2.101   check_set_subset     : 2.333   check_trans          : 0.140  Test test_short_invalid (length = 6):   check_re_inverse     : 0.017   check_re_match       : 0.019   check_set_all        : 0.044   check_set_diff       : 0.032   check_set_subset     : 0.037   check_trans          : 0.015  Test test_short_valid (length = 18):   check_re_inverse     : 0.125   check_re_match       : 0.066   check_set_all        : 0.104   check_set_diff       : 0.051   check_set_subset     : 0.046   check_trans          : 0.017 

The translate approach seems best in most cases, dramatically so with long valid strings, but is beaten out by regexes in test_long_invalid (Presumably because the regex can bail out immediately, but translate always has to scan the whole string). The set approaches are usually worst, beating regexes only for the empty string case.

Using all(x in allowed_set for x in s) performs well if it bails out early, but can be bad if it has to iterate through every character. isSubSet and set difference are comparable, and are consistently proportional to the length of the string regardless of the data.

There's a similar difference between the regex methods matching all valid characters and searching for invalid characters. Matching performs a little better when checking for a long, but fully valid string, but worse for invalid characters near the end of the string.

like image 20
9 revs Avatar answered Sep 28 '22 04:09

9 revs