
How to split string preserving whole words?

I need to split a long sentence into parts while preserving whole words. Each part should contain at most a given number of characters (including spaces, dots, etc.). For example:

int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";

Output:

1 part: "Silver badges are awarded for"
2 part: "longer term goals. Silver badges"
3 part: "are uncommon."
jlp asked Dec 09 '10 12:12



2 Answers

It seems like everyone is using some form of "Split then rebuild the sentence"...

I thought I would take a stab at this the way I would do it by hand, which is:

  • Split on length
  • Go backwards to the nearest space and use that chunk
  • Remove the used chunk and start over

The code ended up a little more complex than I was hoping for, but I believe it handles most (all?) edge cases, including words that are longer than maxLength and words that end exactly on maxLength.

Here's my function:

// Requires: using System; using System.Collections.Generic;
private static List<string> SplitWordsByLength(string str, int maxLength)
{
    if (maxLength <= 0)                                 // Guard: with a non-positive length the loop below would never terminate
        throw new ArgumentOutOfRangeException(nameof(maxLength));

    List<string> chunks = new List<string>();
    while (str.Length > 0)
    {
        if (str.Length <= maxLength)                    // Remainder fits in one chunk: add it and stop
        {
            chunks.Add(str);
            break;
        }

        string chunk = str.Substring(0, maxLength);     // Take the first maxLength characters

        if (char.IsWhiteSpace(str[maxLength]))          // Next char is whitespace, so the chunk ends on a word boundary
        {
            chunks.Add(chunk);
            str = str.Substring(chunk.Length + 1);      // Drop the chunk plus the separating whitespace
        }
        else
        {
            int splitIndex = chunk.LastIndexOf(' ');    // Find the last space inside the chunk
            if (splitIndex != -1)                       // Space found:
                chunk = chunk.Substring(0, splitIndex); //   cut the chunk back to the last whole word
            str = str.Substring(chunk.Length + (splitIndex == -1 ? 0 : 1));      // Drop the chunk (plus the space, if one was found)
            chunks.Add(chunk);                          // A word longer than maxLength has no space, so it is cut mid-word
        }
    }
    return chunks;
}

Test usage:

string testString = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
int length = 35;

List<string> test = SplitWordsByLength(testString, length);

foreach (string chunk in test)
{
    Console.WriteLine(chunk);  
}

Console.ReadLine();
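
The edge cases mentioned above can be checked with a small console program. What follows is only a sketch: the class name EdgeCaseDemo, the Show helper and the short test strings are made up for this illustration, and the splitting logic is repeated inside the block so that it compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the name is not part of the original answer.
public static class EdgeCaseDemo
{
    // Same splitting logic as the function in this answer, repeated here
    // so the snippet is self-contained.
    public static List<string> SplitWordsByLength(string str, int maxLength)
    {
        var chunks = new List<string>();
        while (str.Length > 0)
        {
            if (str.Length <= maxLength)
            {
                chunks.Add(str);                        // remainder fits: done
                break;
            }

            string chunk = str.Substring(0, maxLength);
            if (char.IsWhiteSpace(str[maxLength]))
            {
                chunks.Add(chunk);                      // chunk ends exactly on a word boundary
                str = str.Substring(chunk.Length + 1);
            }
            else
            {
                int splitIndex = chunk.LastIndexOf(' ');
                if (splitIndex != -1)
                    chunk = chunk.Substring(0, splitIndex);
                str = str.Substring(chunk.Length + (splitIndex == -1 ? 0 : 1));
                chunks.Add(chunk);
            }
        }
        return chunks;
    }

    // Hypothetical helper: prints the chunks on one line, separated by " | ".
    private static void Show(string input, int maxLength)
    {
        Console.WriteLine(string.Join(" | ", SplitWordsByLength(input, maxLength)));
    }

    public static void Main()
    {
        // The sentence from the question, limit 35:
        // Silver badges are awarded for | longer term goals. Silver badges | are uncommon.
        Show("Silver badges are awarded for longer term goals. Silver badges are uncommon.", 35);

        // A word longer than the limit is cut mid-word:
        // abcd | efgh | ij | kl
        Show("abcdefghij kl", 4);

        // Words that end exactly on the limit are kept whole:
        // abcd | efgh | ij
        Show("abcd efgh ij", 4);
    }
}
```

Note that with a limit of 35 the second chunk stops after "badges", because appending " are" would make it 36 characters long.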
sǝɯɐſ answered Sep 30 '22 01:09



Try this:

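    // Requires: using System; using System.Collections.Generic;
    static void Main(string[] args)
    {
        int partLength = 35;
        string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
        string[] words = sentence.Split(' ');
        var parts = new Dictionary<int, string>();
        string part = string.Empty;
        int partCounter = 0;
        foreach (var word in words)
        {
            // Append the word if the part is still empty, or if the part plus
            // a separating space plus the word stays within partLength.
            // (A single word longer than partLength still becomes its own, over-long part.)
            if (part.Length == 0 || part.Length + 1 + word.Length <= partLength)
            {
                part += part.Length == 0 ? word : " " + word;
            }
            else
            {
                // The word does not fit: store the finished part and start a new one.
                parts.Add(partCounter, part);
                part = word;
                partCounter++;
            }
        }
        parts.Add(partCounter, part);   // store the last part
        foreach (var item in parts)
        {
            Console.WriteLine("Part {0} (length = {2}): {1}", item.Key, item.Value, item.Value.Length);
        }
        Console.ReadLine();
    }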
Tomas Jansson answered Sep 30 '22 01:09
