I know how to do this if I iterate through all of the characters in the string but I am looking for a more elegant method.
Use the test() method on the following regular expression to check if a string contains only letters and numbers - /^[A-Za-z0-9]*$/ . The test method will return true if the regular expression is matched in the string and false otherwise.
isalnum() is a built-in Python function that checks whether all characters in a string are alphanumeric. In other words, isalnum() checks whether a string contains only letters or numbers or both. If all characters are alphanumeric, isalnum() returns the value True ; otherwise, the method returns the value False .
To check if a string contains any letter, use the test() method with the following regular expression /[a-zA-Z]/ . The test method will return true if the string contains at least one letter and false otherwise. Copied! We used the RegExp.
Method #1 : Using all() + isspace() + isalpha() This is one of the way in which this task can be performed. In this, we compare the string for all elements being alphabets or space only.
A regular expression will do the trick with very little code:
import re ... if re.match("^[A-Za-z0-9_-]*$", my_little_string): # do something here
[Edit] There's another solution not mentioned yet, and it seems to outperform the others given so far in most cases.
Use string.translate to replace all valid characters in the string, and see if we have any invalid ones left over. This is pretty fast as it uses the underlying C function to do the work, with very little python bytecode involved.
Obviously performance isn't everything - going for the most readable solutions is probably the best approach when not in a performance critical codepath, but just to see how the solutions stack up, here's a performance comparison of all the methods proposed so far. check_trans is the one using the string.translate method.
Test code:
import string, re, timeit pat = re.compile('[\w-]*$') pat_inv = re.compile ('[^\w-]') allowed_chars=string.ascii_letters + string.digits + '_-' allowed_set = set(allowed_chars) trans_table = string.maketrans('','') def check_set_diff(s): return not set(s) - allowed_set def check_set_all(s): return all(x in allowed_set for x in s) def check_set_subset(s): return set(s).issubset(allowed_set) def check_re_match(s): return pat.match(s) def check_re_inverse(s): # Search for non-matching character. return not pat_inv.search(s) def check_trans(s): return not s.translate(trans_table,allowed_chars) test_long_almost_valid='a_very_long_string_that_is_mostly_valid_except_for_last_char'*99 + '!' test_long_valid='a_very_long_string_that_is_completely_valid_' * 99 test_short_valid='short_valid_string' test_short_invalid='/$%$%&' test_long_invalid='/$%$%&' * 99 test_empty='' def main(): funcs = sorted(f for f in globals() if f.startswith('check_')) tests = sorted(f for f in globals() if f.startswith('test_')) for test in tests: print "Test %-15s (length = %d):" % (test, len(globals()[test])) for func in funcs: print " %-20s : %.3f" % (func, timeit.Timer('%s(%s)' % (func, test), 'from __main__ import pat,allowed_set,%s' % ','.join(funcs+tests)).timeit(10000)) print if __name__=='__main__': main()
The results on my system are:
Test test_empty (length = 0): check_re_inverse : 0.042 check_re_match : 0.030 check_set_all : 0.027 check_set_diff : 0.029 check_set_subset : 0.029 check_trans : 0.014 Test test_long_almost_valid (length = 5941): check_re_inverse : 2.690 check_re_match : 3.037 check_set_all : 18.860 check_set_diff : 2.905 check_set_subset : 2.903 check_trans : 0.182 Test test_long_invalid (length = 594): check_re_inverse : 0.017 check_re_match : 0.015 check_set_all : 0.044 check_set_diff : 0.311 check_set_subset : 0.308 check_trans : 0.034 Test test_long_valid (length = 4356): check_re_inverse : 1.890 check_re_match : 1.010 check_set_all : 14.411 check_set_diff : 2.101 check_set_subset : 2.333 check_trans : 0.140 Test test_short_invalid (length = 6): check_re_inverse : 0.017 check_re_match : 0.019 check_set_all : 0.044 check_set_diff : 0.032 check_set_subset : 0.037 check_trans : 0.015 Test test_short_valid (length = 18): check_re_inverse : 0.125 check_re_match : 0.066 check_set_all : 0.104 check_set_diff : 0.051 check_set_subset : 0.046 check_trans : 0.017
The translate approach seems best in most cases, dramatically so with long valid strings, but is beaten out by regexes in test_long_invalid (Presumably because the regex can bail out immediately, but translate always has to scan the whole string). The set approaches are usually worst, beating regexes only for the empty string case.
Using all(x in allowed_set for x in s) performs well if it bails out early, but can be bad if it has to iterate through every character. isSubSet and set difference are comparable, and are consistently proportional to the length of the string regardless of the data.
There's a similar difference between the regex methods matching all valid characters and searching for invalid characters. Matching performs a little better when checking for a long, but fully valid string, but worse for invalid characters near the end of the string.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With