Algorithm: Split a string into N parts using whitespaces so all parts have nearly the same length

I am looking for an algorithm that takes a string and splits it into a given number of parts. Each part must contain only complete words (so the string is split at whitespace), and the parts should be of nearly the same length, or be as long as possible.

I know it is not that hard to write a function that does what I want, but I wonder whether there is a well-proven, fast algorithm for this purpose.

edit: To clarify my question, I'll describe the problem I am trying to solve.

I generate images with a fixed width. I write user names into these images using GD and FreeType in PHP. Since the width is fixed, I want to split a name into 2 or 3 lines if it doesn't fit on one.

To fill as much space as possible, I want to split the names so that each line contains as many words as possible. By this I mean that each line should hold as many words as necessary to keep its length close to the average line length of the whole text block. So if there are one long word and two short words, the two short words should share a line if that makes all lines about equally long.

(Then I compute the text block width using 1, 2 or 3 lines, and if it fits into my image I render it. Only if there are 3 lines and it still doesn't fit do I decrease the font size until everything fits.)

Example: "This is a long text" should be displayed something like this:

This is a
long text

or:

This is
a long
text

but not:

This
is a long
text

and also not:

This is a long
text

I hope this explains more clearly what I am looking for.

asked Mar 04 '10 17:03 by ralle


2 Answers

If you're talking about line-breaking, take a look at Dynamic Line Breaking, which gives a Dynamic Programming solution to divide words into lines.
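
As a rough illustration of the dynamic-programming idea (not necessarily the exact method of the linked article), here is a minimal Python sketch. It assumes the goal is to minimize the length of the longest line, and it measures length in characters rather than rendered pixels. The name `split_balanced` is made up for this example.

```python
from typing import List


def split_balanced(text: str, parts: int) -> List[str]:
    """Split text at whitespace into `parts` lines so the longest line is as short as possible."""
    words = text.split()
    n = len(words)
    if n == 0:
        return []
    # Never ask for more lines than there are words.
    parts = max(1, min(parts, n))

    # prefix[i] = total characters in words[:i], not counting spaces.
    prefix = [0]
    for w in words:
        prefix.append(prefix[-1] + len(w))

    def line_len(i: int, j: int) -> int:
        # Length of words[i:j] joined by single spaces.
        return prefix[j] - prefix[i] + (j - i - 1)

    inf = float("inf")
    # best[k][j] = smallest achievable longest-line length when words[:j] fill exactly k lines.
    best = [[inf] * (n + 1) for _ in range(parts + 1)]
    # cut[k][j] = index where the last of those k lines starts.
    cut = [[0] * (n + 1) for _ in range(parts + 1)]
    best[0][0] = 0
    for k in range(1, parts + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = max(best[k - 1][i], line_len(i, j))
                if cost < best[k][j]:
                    best[k][j] = cost
                    cut[k][j] = i

    # Walk the cut table backwards to recover the lines.
    lines = []
    j = n
    for k in range(parts, 0, -1):
        i = cut[k][j]
        lines.append(" ".join(words[i:j]))
        j = i
    return lines[::-1]


print(split_balanced("This is a long text", 2))  # ['This is a', 'long text']
print(split_balanced("This is a long text", 3))  # ['This is', 'a long', 'text']
```

For rendered text, `line_len` could be replaced by a pixel-width measurement of the joined words (for example via a bounding-box call in the graphics library), at the cost of one measurement per candidate line. The sketch only minimizes the longest line, so when several splits tie it does not try to balance the shorter lines further.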

answered Sep 26 '22 02:09 by Larry


I don't know about proven, but it seems like the simplest and most efficient solution would be to divide the length of the string by N, then find the whitespace closest to each split location (you'll want to search both forward and backward).

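The code below seems to work, though there are plenty of error conditions it doesn't handle. It performs one boundary search per split point, so the number of searches grows linearly with the number of parts you want; each search scans outward from its target index until it hits a space.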

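using System;

class Program
{
    static void Main(string[] args)
    {
        var s = "This is a string for testing purposes. It will be split into 3 parts";
        var p = s.Length / 3; // ideal length of each part
        var w1 = 0;
        var w2 = FindClosestWordIndex(s, p);
        var w3 = FindClosestWordIndex(s, p * 2);
        Console.WriteLine("1: {0}", s.Substring(w1, w2 - w1).Trim());
        Console.WriteLine("2: {0}", s.Substring(w2, w3 - w2).Trim());
        Console.WriteLine("3: {0}", s.Substring(w3).Trim());
        Console.ReadKey();
    }

    // Returns the index of the space closest to startIndex,
    // or s.Length if the string contains no space at all.
    public static int FindClosestWordIndex(string s, int startIndex)
    {
        int wordAfterIndex = -1;
        int wordBeforeIndex = -1;
        // Scan forward for the next space.
        for (int i = startIndex; i < s.Length; i++)
        {
            if (s[i] == ' ')
            {
                wordAfterIndex = i;
                break;
            }
        }
        // Scan backward for the previous space.
        for (int i = Math.Min(startIndex, s.Length - 1); i >= 0; i--)
        {
            if (s[i] == ' ')
            {
                wordBeforeIndex = i;
                break;
            }
        }

        // If a space exists on only one side (or on neither), fall back accordingly.
        if (wordAfterIndex == -1 && wordBeforeIndex == -1)
            return s.Length;
        if (wordAfterIndex == -1)
            return wordBeforeIndex;
        if (wordBeforeIndex == -1)
            return wordAfterIndex;

        // Otherwise pick the nearer space; ties go to the later one.
        if (wordAfterIndex - startIndex <= startIndex - wordBeforeIndex)
            return wordAfterIndex;
        else
            return wordBeforeIndex;
    }
}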

The output for this is:

1: This is a string for
2: testing purposes. It will
3: be split into 3 parts
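
The same idea can be generalized from 3 parts to any number of parts. The following is a minimal Python sketch under a few assumptions: lengths are counted in characters, ties go to the later space (as in the comparison above), and the name `split_near_targets` is made up for this example.

```python
from typing import List


def split_near_targets(text: str, parts: int) -> List[str]:
    """Cut text at the space nearest to each ideal boundary (len/parts, 2*len/parts, ...)."""
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if parts <= 1 or not spaces:
        return [text.strip()]

    cuts = []
    for k in range(1, parts):
        target = len(text) * k // parts
        # Nearest space to the target; on a tie, prefer the later space.
        nearest = min(spaces, key=lambda i: (abs(i - target), -i))
        # Skip a cut that would repeat the previous one.
        if not cuts or nearest > cuts[-1]:
            cuts.append(nearest)

    bounds = [0] + cuts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


s = "This is a string for testing purposes. It will be split into 3 parts"
for number, part in enumerate(split_near_targets(s, 3), start=1):
    print(f"{number}: {part}")
```

When two targets resolve to the same space, the sketch returns fewer parts than requested instead of emitting an empty one.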

answered Sep 23 '22 02:09 by Brian