I have a list of about 120 thousand English words (basically every word in the language).
I need a regular expression that would allow searching through these words using wildcard characters, namely * and ?.
A few examples:
m?st* would match, for example, master, mister or mystery.
*ind (any word ending in ind) would match wind, bind, blind or grind.
Now, most users (especially the ones who are not familiar with regular expressions) know that ? is a replacement for exactly 1 character, while * is a replacement for 0, 1 or more characters. I absolutely want to build my search feature based on this.
My question is: How do I convert what the user types (m?st* for example) to a regular expression?
I searched the web (obviously including this website) and all I could find were tutorials that tried to teach me too much or questions that were somewhat similar, but not enough as to provide an answer to my own problem.
All I could figure out was that I have to replace ? with . (a dot), so m?st* becomes m.st*. However, I have no idea what to replace * with.
Any help would be greatly appreciated. Thank you.
PS: I'm totally new to regular expressions. I know how powerful they can be, but I also know they can be very hard to learn. So I just never took the time to do it...
Unless you want some funny behaviour, I would recommend using \w instead of . because . matches whitespace and other non-word symbols, which you probably don't want.
So I would replace ? with \w and replace * with \w*, which turns m?st* into m\wst\w*.
Also if you want * to match at least one character, replace it with \w+ instead. This would mean that ben* would match bend and bending but not ben - it's up to you, just depends what your requirements are.
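To make that concrete, here is a minimal Java sketch of the replacement described above. Java is my assumption, since the question does not name a language, and the class and method names (WordWildcard, toRegex, search) are made up for illustration. It also quotes every non-wildcard character so that something like a literal dot in the input cannot be read as regex syntax, and it uses matches() so the pattern has to cover the whole word.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
final class WordWildcard {

    // Converts a user wildcard such as "m?st*" into a regex string:
    // '?' becomes \w (exactly one word character) and '*' becomes \w*
    // (zero or more). Note that \w also accepts digits and underscore.
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append("\\w");
            } else if (c == '*') {
                regex.append("\\w*");   // use "\\w+" to require at least one character
            } else {
                // Quote everything else so it is always treated literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    // Returns the words that match the wildcard in full, in their original order.
    static List<String> search(List<String> words, String wildcard) {
        Pattern pattern = Pattern.compile(toRegex(wildcard));
        return words.stream()
                .filter(word -> pattern.matcher(word).matches())
                .collect(Collectors.toList());
    }
}
```

Calling WordWildcard.search(words, "*ind") on a list containing wind, blind and india should keep wind and blind and drop india, because the whole word has to match. For 120 thousand words, compiling the pattern once per search (as above) rather than once per word is probably enough.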
Take a look at this library: https://github.com/alenon/JWildcard
It wraps every part of the pattern that is not a wildcard in regex quotes (\Q...\E), so special characters need no extra processing. This wildcard:
    "mywil?card*"
will be converted to this regex string:
    "\Qmywil\E.\Qcard\E.*"
If you wish to convert a wildcard to a regex string, use:
    JWildcard.wildcardToRegex("mywil?card*");
If you wish to check the matching directly, you can use this:
    JWildcard.matches("mywild*", "mywildcard");
The default wildcard rules are "?" -> "." and "*" -> ".*", but you can change the default behaviour if you wish, by simply defining new rules:
    JWildcard.wildcardToRegex(wildcard, rules, strict);
You can use the sources or download it directly using Maven or Gradle from Bintray JCenter: https://bintray.com/yevdo/jwildcard/jwildcard
Gradle way:
    compile 'com.yevdo:jwildcard:1.4'
Maven way:
    <dependency>
      <groupId>com.yevdo</groupId>
      <artifactId>jwildcard</artifactId>
      <version>1.4</version>
    </dependency>
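If you would rather not add a dependency, the quoting idea is easy to sketch with java.util.regex alone. The code below is my own illustration of the technique, not JWildcard's actual source, and the names QuotedWildcard, toRegex and matches are hypothetical. It collects each run of literal characters, wraps it with Pattern.quote, and applies the default rules quoted above ("?" -> ".", "*" -> ".*").

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the quoting approach; not the library's code.
final class QuotedWildcard {

    // Builds a regex in which every literal run is wrapped in \Q...\E
    // and only '?' and '*' keep a special meaning.
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '?' || c == '*') {
                flush(literal, regex);
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        return regex.toString();
    }

    // Appends the pending literal run (if any) in quoted form and clears it.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    // True when the whole text matches the wildcard.
    static boolean matches(String wildcard, String text) {
        return Pattern.matches(toRegex(wildcard), text);
    }
}
```

With this sketch, QuotedWildcard.toRegex("mywil?card*") produces the same string as the example above, \Qmywil\E.\Qcard\E.*, and QuotedWildcard.matches("mywild*", "mywildcard") returns true.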