Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Testing for repeated characters in a string

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.

`ABCDE`  // does not contain repeats 
`AABCD`  // does contain repeats, ie A is repeated

I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?

EDIT:

The string will be sorted, so order is not important so ABCDA => AABCD

The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.

like image 851
Ian G Avatar asked May 06 '09 13:05

Ian G


3 Answers

If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.

Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)

Bonus: here's a slick LINQ solution to implement Jon's functionality:

int longestRun =
    s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();

So, OK, it's not very fast! You got a problem with that?!

:-)

like image 108
mqp Avatar answered Nov 10 '22 00:11

mqp


If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.

EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:

public static bool IsSortedNoRepeats(string text)
{
    if (text.Length == 0)
    {
        return true;
    }
    char current = text[0];
    for (int i=1; i < text.Length; i++)
    {
        char next = text[i];
        if (next <= current)
        {
            return false;
        }
        current = next;
    }
    return true;
}

A shorter alternative if you don't mind repeating the indexer use:

public static bool IsSortedNoRepeats(string text)
{
    for (int i=1; i < text.Length; i++)
    {
        if (text[i] <= text[i-1])
        {
            return false;
        }
    }
    return true;
}

EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.

First a string-specific version:

public static int LongestRun(string text)
{
    if (text.Length == 0)
    {
        return 0;
    }
    char current = text[0];
    int currentRun = 1;
    int bestRun = 0;

    for (int i=1; i < text.Length; i++)
    {
        if (current != text[i])
        {
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = text[i];
        }
        currentRun++;
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}

Now we can also do this as a general extension method on IEnumerable<T>:

public static int LongestRun(this IEnumerable<T> source)
{
    bool first = true;
    T current = default(T);
    int currentRun = 0;
    int bestRun = 0;

    foreach (T element in source)
    {
        if (first || !EqualityComparer<T>.Default(element, current))
        {
            first = false;
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = element;
        }
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}

Then you can call "AABCD".LongestRun() for example.

like image 20
Jon Skeet Avatar answered Nov 09 '22 22:11

Jon Skeet


This will tell you very quickly if a string contains duplicates:

bool containsDups = "ABCDEA".Length != s.Distinct().Count();

It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...

Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.

like image 44
BenAlabaster Avatar answered Nov 10 '22 00:11

BenAlabaster