How do I sanitize input before making a regex out of it?

I'm making a Java program that parses the user's input with a regex. For instance, if the user inputs /me eats, it should match the /me and replace it with <move>. However, Java isn't matching it properly, presumably because / is a special character in regexes. How do I automatically escape all of the special characters in Java's regex syntax?

For instance:

  • /me becomes \/me
  • * becomes \*
  • [ becomes \[
  • and so on...

before it's put into Pattern.compile.

This is not a command system. I am allowing users to specify how to denote a roleplaying move. If it helps, here is a mockup of how the user specifies what they consider a roleplay move:

[Image: A mockup of the preferences pane that would control this system]

asked May 17 '14 04:05 by Ky.


1 Answer

Supuhstar, I believe this is the one-liner you're looking for (see online demo):

String sanitized = subject.replaceAll("[-.\\+*?\\[^\\]$(){}=!<>|:\\\\]", "\\\\$0");

This adds a backslash to all of the following characters:

. + * ? [ ^ ] $ ( ) { } = ! < > | : - \

Test Input: String subject = ".+*?[^]$(){}=!<>|:-\\";

Output: \.\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:\-\\
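A self-contained version of the one-liner, for reproducing the test run. The class name SanitizeDemo and the method name sanitize are only illustrative. The last line compiles the escaped string and checks that it matches the raw input literally.

```java
import java.util.regex.Pattern;

public class SanitizeDemo {
    // Puts a backslash in front of every character listed in the class:
    // - . + * ? [ ^ ] $ ( ) { } = ! < > | : and the backslash itself
    static String sanitize(String subject) {
        return subject.replaceAll("[-.\\+*?\\[^\\]$(){}=!<>|:\\\\]", "\\\\$0");
    }

    public static void main(String[] args) {
        String subject = ".+*?[^]$(){}=!<>|:-\\";
        String sanitized = sanitize(subject);
        System.out.println(sanitized);
        // prints \.\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:\-\\

        // The escaped string, compiled as a regex, matches the raw input literally.
        System.out.println(Pattern.compile(sanitized).matcher(subject).matches()); // true
    }
}
```

Note that the leading - inside the character class is a literal hyphen, so the hyphen in the input is escaped as well.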

Next, as you wanted, you can proceed with:

Pattern regex = Pattern.compile(sanitized);
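A minimal sketch that puts the escaping and compiling steps together for the /me example from the question. The helper name replaceMarker is hypothetical. Note that / is not in the character list: it has no special meaning in Java regex syntax, so /me passes through the escaping unchanged and still matches literally.

```java
import java.util.regex.Pattern;

public class MoveDemo {
    // Same one-liner: backslash-escape each listed metacharacter.
    static String sanitize(String subject) {
        return subject.replaceAll("[-.\\+*?\\[^\\]$(){}=!<>|:\\\\]", "\\\\$0");
    }

    // Hypothetical helper: replaces every literal occurrence of the
    // user-chosen marker in the input with "<move>".
    static String replaceMarker(String marker, String input) {
        Pattern regex = Pattern.compile(sanitize(marker));
        return regex.matcher(input).replaceAll("<move>");
    }

    public static void main(String[] args) {
        System.out.println(replaceMarker("/me", "/me eats")); // <move> eats
        System.out.println(replaceMarker("*", "*eats*"));     // <move>eats<move>
    }
}
```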

Notes:

  1. Like Perl and PHP, Java also has a syntax for escaping an entire string: place it between \Q and \E. That is what Pattern.quote does for you, but it quotes the whole string, which may be more than you want in your situation.

  2. This is only one possible solution answering your specific requirement of adding a backslash. For more options, also see Does using Pattern.LITERAL mean the same as Pattern.quote?
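A rough sketch of the two alternatives mentioned in the notes, assuming the user supplies an opening and a closing delimiter (as in the preferences mockup). The helpers movePattern and containsLiteral are hypothetical names for illustration. Pattern.quote wraps text in \Q...\E, so you can quote just the user-supplied parts and keep regex syntax such as a capturing group live between them; the Pattern.LITERAL flag instead makes the entire pattern literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: only the user-supplied delimiters are quoted,
    // so (.+?) remains a live, lazily matching capturing group.
    static Pattern movePattern(String open, String close) {
        return Pattern.compile(Pattern.quote(open) + "(.+?)" + Pattern.quote(close));
    }

    // Hypothetical helper: with LITERAL, every character of text is matched as-is.
    static boolean containsLiteral(String text, String input) {
        return Pattern.compile(text, Pattern.LITERAL).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("a*b")); // \Qa*b\E

        Matcher m = movePattern("*", "*").matcher("hi *waves* there");
        if (m.find()) {
            System.out.println(m.group(1)); // waves
        }

        System.out.println(containsLiteral("[^]*", "a [^]* b")); // true
    }
}
```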

answered Sep 28 '22 07:09 by zx81