
Removing nonnumeric and nonalpha characters from a string?

What is the best way to remove all the special characters from a string - like these:

!@#$%^&*(){}|:"?><,./;'[]\=-

The strings these characters would be removed from are rather short, so would it be better to use a regex on each, or just plain string manipulation?

Thx

Environment == C#/.NET

Adron asked Feb 09 '09 14:02



2 Answers

It's generally better to have a whitelist than a blacklist.

Regex has a convenient \w that effectively means alphanumeric plus underscore (some variants also add accented characters such as á, é, ô to the list; others don't).

You can invert that by using \W to mean everything that's not alphanumeric or underscore.

So replacing \W with an empty string will remove all 'special' characters.
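As a rough illustration of the \W replacement described above, here is a minimal Python sketch. The function name strip_non_word is made up for this example. The same pattern should carry over to other regex engines, including .NET's, although which characters count as "word" characters varies by engine.

```python
import re


def strip_non_word(text: str) -> str:
    """Remove every character that \\W matches (hypothetical helper)."""
    # \W matches anything that is NOT a letter, digit, or underscore,
    # so substituting it with "" keeps only the "word" characters.
    return re.sub(r"\W", "", text)


print(strip_non_word("hello, world!"))      # helloworld
print(strip_non_word("snake_case & more"))  # snake_casemore
```

Note that the underscore survives, because it is part of \w.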


Alternatively, if you need a different set of characters than alphanumerics, you can use a negated character class: [^abc] will match everything that is not a or b or c, and [^a-z] will match everything that is not in the range a, b, c, d ... x, y, z.

In variants where \w is ASCII-only, the equivalent to \w is [A-Za-z0-9_], and thus \W is [^A-Za-z0-9_].
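To make the difference between a negated character class and \W concrete, here is a small Python sketch. The helper keep_only and its allowed parameter are hypothetical names invented for this example. Python's \w is Unicode-aware by default for text strings, which makes it one of the variants that also accepts accented characters; the re.ASCII flag restricts it to [A-Za-z0-9_].

```python
import re


def keep_only(text: str, allowed: str = "A-Za-z0-9") -> str:
    """Strip everything outside the given character-class body.

    `allowed` is inserted verbatim into a negated class, so it must
    already be valid character-class syntax (an assumption of this sketch).
    """
    return re.sub(f"[^{allowed}]", "", text)


sample = "caf\u00e9_42!"  # "café_42!" with a precomposed é

print(keep_only(sample))                          # caf42
print(re.sub(r"\W", "", sample))                  # café_42
print(re.sub(r"\W", "", sample, flags=re.ASCII))  # caf_42
```

The explicit class drops both the accented letter and the underscore, so spelling out the whitelist is the safer choice when only plain ASCII letters and digits are wanted.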

Peter Boughton answered Oct 09 '22 05:10


In PHP:

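// Sample inputs containing punctuation, spaces and digits.
$tests = array(
    'hello, world!',
    'this is a test',
    'and so is this',
    'another test with /slashes/ & (parenthesis)',
    'l3375p34k stinks',
);

// Remove every character that is not an ASCII letter or digit.
// The /i modifier makes the a-z range case-insensitive.
function strip_non_alphanumerics( $subject )
{
    return preg_replace( '/[^a-z0-9]/i', '', $subject );
}

foreach( $tests as $test )
{
    printf( "%s\n", strip_non_alphanumerics( $test ) );
}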

The output would be:

helloworld
thisisatest
andsoisthis
anothertestwithslashesparenthesis
l3375p34kstinks
Kris answered Oct 09 '22 03:10