 

Replace all alphanumeric characters in a string except pattern

Tags:

c#

regex

I'm trying to obfuscate a string, but need to preserve a couple of patterns. Basically, every alphanumeric character needs to be replaced with a single character (say 'X'), but the following (example) patterns need to be preserved. Note that each pattern begins with a single space and ends with a double quote:

  • ' QQQ"'
  • ' RRR"'

I've looked through a few examples of negative lookahead/lookbehind, but still haven't had any luck with this (so far I'm only testing QQQ).

var test = @"""SOME TEXT       AB123 12XYZ QQQ""""empty""""empty""1A2BCDEF";
var regex = new Regex(@"((?!QQQ)(?<!\sQ{1,3}))[0-9a-zA-Z]");            
var result = regex.Replace(test, "X");  

The correct result should be:

"XXXX XXXX       XXXXX XXXXX QQQ""XXXXX""XXXXX"XXXXXXXX

This works for an exact match, but fails when the input contains something like ' QQR"' instead, in which case it returns

"XXXX XXXX       XXXXX XXXXX XQR""XXXXX""XXXXX"XXXXXXXX
asked Dec 18 '13 at 19:12 by Matt




2 Answers

You can use this:

var regex = new Regex(@"((?> QQQ|[^A-Za-z0-9]+)*)[A-Za-z0-9]");            
var result = regex.Replace(test, "$1X");

The idea is to first match everything that must be preserved and put it in a capturing group.

Since each target character is always preceded by zero or more things that must be preserved, you only need to put this capturing group before [A-Za-z0-9] and write it back with $1 in the replacement.
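The regex above protects only ' QQQ' and needs a letter or digit after the protected run to complete a match, so a pattern with no alphanumeric character after it (for example at the very end of the input) can still be masked. A variation that is not from this answer, assuming the question's two example patterns ' QQQ"' and ' RRR"' and ASCII-only alphanumerics, matches either a whole protected pattern or a single alphanumeric character and decides per match with a MatchEvaluator. PatternMasker and Mask are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): keeps ' QQQ"' and ' RRR"' intact
// and replaces every other ASCII letter or digit with a mask character.
public static class PatternMasker
{
    // At each position the alternatives are tried in order, so a protected
    // pattern starting there is consumed as one 5-character match before any
    // of its letters can be matched on their own by [A-Za-z0-9].
    private static readonly Regex Tokens = new Regex(@" QQQ""| RRR""|[A-Za-z0-9]");

    // A 1-character match is a lone letter or digit, so it is masked;
    // longer matches are protected patterns and are returned unchanged.
    public static string Mask(string input, char mask = 'X') =>
        Tokens.Replace(input, m => m.Length == 1 ? mask.ToString() : m.Value);
}
```

For example, PatternMasker.Mask("ab QQQ\"cd RRR\" QQR\"") returns "XX QQQ\"XX RRR\" XXX\"": both protected patterns survive, while ' QQR"' is masked. Adding another pattern means adding another alternative before [A-Za-z0-9].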

answered Nov 09 '22 at 23:11 by Casimir et Hippolyte


Here's a non-regex solution. It works quite nicely, although it fails when a pattern occurs more than once in the input; that would need a better algorithm for finding occurrences. You can compare it with a regex solution on large strings.

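using System;
using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    public static string ReplaceWithPatterns(this string input, IEnumerable<string> patterns, char replacement)
    {
        // Mask letters and digits only; spaces, quotes etc. keep their original characters
        var result = input.Select(c => char.IsLetterOrDigit(c) ? replacement : c).ToArray();

        // First occurrence of each pattern; IndexOf returns -1 when absent, and 0 is a valid position
        var patternsPositions = patterns
            .Select(p => new { Pattern = p, Index = input.IndexOf(p, StringComparison.Ordinal) })
            .Where(i => i.Index >= 0);

        foreach (var p in patternsPositions)
            p.Pattern.CopyTo(0, result, p.Index, p.Pattern.Length); // overwrite (not insert) the masked characters with the pattern

        return new string(result);
    }
}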
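The answer notes that it needs a better way of finding occurrences. One possible sketch keeps the same mask-then-restore idea but calls IndexOf repeatedly with a start index, so every occurrence of every pattern is restored. It assumes, as in the question's [0-9a-zA-Z], that only ASCII letters and digits are masked; PatternMaskingSketch and MaskExceptAllOccurrences are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: masks ASCII letters/digits, then copies back every
// occurrence of each protected pattern found in the original input.
public static class PatternMaskingSketch
{
    public static string MaskExceptAllOccurrences(string input, IEnumerable<string> patterns, char replacement)
    {
        var result = input.ToCharArray();

        // Step 1: mask ASCII letters and digits, leave everything else untouched.
        for (int i = 0; i < result.Length; i++)
        {
            char c = result[i];
            bool isAsciiAlnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (isAsciiAlnum) result[i] = replacement;
        }

        // Step 2: restore every occurrence of every pattern.
        foreach (var pattern in patterns)
        {
            if (string.IsNullOrEmpty(pattern)) continue; // an empty pattern would never advance the search

            int index = input.IndexOf(pattern, 0, StringComparison.Ordinal);
            while (index >= 0)
            {
                pattern.CopyTo(0, result, index, pattern.Length); // write the original characters back
                index = input.IndexOf(pattern, index + pattern.Length, StringComparison.Ordinal);
            }
        }

        return new string(result);
    }
}
```

Because the restored characters always come from the original input, overlapping occurrences of different patterns cannot corrupt the result; the cost is one IndexOf scan per pattern occurrence, which is the kind of trade-off worth measuring against a regex on large strings.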
answered Nov 09 '22 at 23:11 by Ondrej Janacek