
Regex to extract ONLY alphanumeric words

I am looking for a regex that extracts the words that contain ONLY alphanumeric characters:

string = 'This is a $dollar sign !!'
matches = re.findall(regex, string)
matches = ['This', 'is', 'sign']

This can be done by tokenizing the string and evaluating each token individually with the following regex:

^[a-zA-Z0-9]+$
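For reference, the tokenize-then-test approach described above might look like the sketch below. It assumes tokens are separated by whitespace; the names `TOKEN_RE` and `alnum_tokens` are made up for illustration. Note that a single-character token such as 'a' also passes the test.

```python
import re

# Anchored pattern: the whole token must consist of ASCII letters or digits.
TOKEN_RE = re.compile(r'^[a-zA-Z0-9]+$')

def alnum_tokens(text):
    """Split on whitespace and keep the tokens that are entirely alphanumeric."""
    return [tok for tok in text.split() if TOKEN_RE.match(tok)]

print(alnum_tokens('This is a $dollar sign !!'))
```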

Due to performance issues, I want to be able to extract the alphanumeric tokens without tokenizing the whole string. The closest I got was

regex = r'\b[a-zA-Z0-9]+\b'

but it still extracts alphanumeric substrings of tokens that contain other characters, such as 'dollar' from '$dollar':

string = 'This is a $dollar sign !!'
matches = re.findall(regex, string)
matches = ['This', 'is', 'dollar', 'sign']

Is there a regex able to pull this off? I've tried different things but can't come up with a solution.

GRoutar asked Jan 05 '19 22:01




2 Answers

Instead of word boundaries, use a lookbehind and a lookahead for spaces (or the beginning/end of the string):

(?:^|(?<= ))[a-zA-Z0-9]+(?= |$)

https://regex101.com/r/TZ7q1c/1

Note that "a" is a standalone alphanumeric word, so it's included too.

['This', 'is', 'a', 'sign']
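A minimal sketch of how this pattern could be used with `re.findall` in Python. The names `WORD_RE`, `WS_WORD_RE`, and `alnum_words` are hypothetical. The second pattern is not part of this answer: it is a commonly used variant, shown here as an assumption, that treats any whitespace (tabs, newlines) as a separator rather than only the space character.

```python
import re

# Pattern from the answer: a run of ASCII alphanumerics that starts at the
# beginning of the string or right after a space, and ends right before a
# space or at the end of the string.
WORD_RE = re.compile(r'(?:^|(?<= ))[a-zA-Z0-9]+(?= |$)')

# Variant (an assumption, not from the answer): "not preceded by a
# non-whitespace character" and "not followed by a non-whitespace character",
# so any whitespace separates tokens.
WS_WORD_RE = re.compile(r'(?<!\S)[a-zA-Z0-9]+(?!\S)')

def alnum_words(text, pattern=WORD_RE):
    """Return every token that consists only of ASCII letters and digits."""
    return pattern.findall(text)

print(alnum_words('This is a $dollar sign !!'))
print(alnum_words('tab\tseparated $x words', WS_WORD_RE))
```

Because the lookarounds only inspect the neighbouring characters and consume nothing, the string is scanned once and no intermediate token list is built.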
CertainPerformance answered Sep 24 '22 23:09


There is no need to use regexes for this; Python has a built-in string method, isalnum. See below:

string = 'This is a $dollar sign !!'

matches = [word for word in string.split(' ') if word.isalnum()]
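One caveat worth checking: `str.isalnum()` also returns True for non-ASCII letters and digits, so it is not strictly equivalent to `[a-zA-Z0-9]+`. If only ASCII should count, a sketch along these lines may help. The function name `ascii_alnum_words` is hypothetical, and `str.isascii()` assumes Python 3.7 or later.

```python
def ascii_alnum_words(text):
    """Keep whitespace-separated tokens made only of ASCII letters and digits."""
    # str.split() with no argument splits on any run of whitespace,
    # so tabs, newlines, and repeated spaces are handled too.
    return [w for w in text.split() if w.isascii() and w.isalnum()]

print(ascii_alnum_words('This is a $dollar sign !!'))

# Without the isascii() check, accented words pass the isalnum() test:
print([w for w in 'café au lait'.split() if w.isalnum()])
print(ascii_alnum_words('café au lait'))
```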
hegash answered Sep 22 '22 23:09