Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting string without string.Split

Tags:

string

c#

I am doing a home work question to split a string without using the framework method.

Following is the working code I came up with.

I would like to know how can I improve the running time to O(n)?

Also any suggestions on improvement are welcome.

public static string[] split(string txt, char[] delim)
{
    char[] text = txt.ToCharArray();
    string[] result = new string[0];
    int count = 0;
    int i = 0;
    StringBuilder buff = new StringBuilder(); 
    while(i < text.Length)
    {
        bool found = false;
        foreach(char del in delim)
        {
            if(del == txt[i])
            {
                found = true;
                break;
            }
        }
        if(found)
        {
            count++;
            Array.Resize(ref result, count);
            result[count - 1] = buff.ToString();
            buff = new StringBuilder();                 
        }
        else
        {
            buff.Append(txt[i]);
        }

        i++;
    }

    if(buff.Length != 0)
    {
        count++;
        Array.Resize(ref result, count);
        result[count - 1] = buff.ToString();
    }

    return(result);
}
like image 932
Nemo Avatar asked Jan 30 '12 16:01

Nemo


2 Answers

I have a few changes that will simultaneously make this function more C like and reduce the run time to O(n):

1) Instead of dynamically resizing your result array lots of times, you should create it with enough spots to hold the maximum number of strings (you know that number is less than txt.Length) and then resize it just once at the very end before returning it.

2) Instead of assembling your results with a StringBuilder, make a char[] buff of length txt.Length and an index variable j and do buff[j++] = txt[i].

I think your function should be O(N) after you do that. Well technically it will be O(N*M) where M is the number of delimiters.

EDIT 1:

Here is a change that will make it be O(N)+O(M) instead of O(N*M):

Instead of looping through the delimiters for each character in the string, you should loop through the delimiters in ADVANCE and set up an array like this:

bool[] isDelimiter = new bool[128];  // increase size if you are allowing non-ascii
foreach(char delim in isDelimiter)
{
    isDelimiter[(int)char] = true;
}

Then you can just use this array to test each character of string in constant time.

like image 65
David Grayson Avatar answered Nov 12 '22 14:11

David Grayson


I think your professor is looking for an API that takes only a single character, not an array. Not an array of characters. What I mean by this is if your delimiting string is "abcd", you will not be splitting on all instances of 'a','b','c','d'. You will only split if you find the whole string.

Your current algorithm is not O(n) because for each element in the input array you are comparing it against each element of the delimiting array. This leads to an O(n*m) execution time.

I do not think it is possible to convert this down to O(n) because every element on the input needs to be compared to every element of the delimiter array. I think it's probably more likely your professor is asking a different question in regards to the delimiter array.

public static String[] Split(String input, String delimiter)
{
    List<String> parts = new List<String>();
    StringBuilder buff = new StringBuilder();
    if (delimiter.Length > 1) //you are splitting on a string not a character
    {
       //perform string searching algorithm here
    }
    else if(delimiter.Length == 0)
    {
       throw new InvalidOperationException("Invalid delimiter.");
    }
    else //you are splitting on a character
    {
       char delimChar = delimiter[0];
       for (int i = 0; i < input.Length; i++)
       {
           if (input[i] == delimChar)
           {
               parts.Add(buff.ToString());
               buff.Clear();
           }
           else
           {
               buff.Append(input[i]);
           }
       }
    }
    return parts.ToArray();
}

C#'s String.Split() does take in an array of delimiters, but I do not believe that it does the splitting in O(n) time.

If you are looking into String search algorithms, these may be of assistance. http://en.wikipedia.org/wiki/String_searching_algorithm

edit: I incorrectly alluded to that fact that C#'s String.Split() API did not take an array of delimiters.

like image 2
Localghost Avatar answered Nov 12 '22 12:11

Localghost