I am looking for a algorithm that takes a string and splits it into a certain number of parts. These parts shall contain complete words (so whitespaces are used to split the string) and the parts shall be of nearly the same length, or contain the longest possible parts.
I know it is not that hard to code a function that can do what I want but I wonder whether there is a well-proven and fast algorithm for that purpose?
edit: To clarify my question I'll describe you the problem I am trying to solve.
I generate images with a fixed width. Into these images I write user names using GD and Freetype in PHP. Since I have a fixed width I want to split the names into 2 or 3 lines if they don't fit into one.
In order to fill as much space as possible I want to split the names in a way that each line contains as much words as possible. With this I mean that in one line should be as much words as neccessary in order to keep each line's length near to an average line length of the whole text block. So if there are one long word and two short words the two short words should stand on one line if it makes all lines about equal long.
(Then I compute the text block width using 1, 2 or 3 lines and if it fits into my image I render it. Just if there are 3 lines and it won't fit I decrease the font size until everything is fine.)
Example:
This is a long text
should be display something like that:
This is a
long text
or:
This is
a long
text
but not:
This
is a long
text
and also not:
This is a long
text
Hope I could explain clearer what I am looking for.
To split a string into substring on N characters, call the match() method on the string, passing it the following regular expression /. {1, N}g/ . The match method will return an array containing substrings with length of N characters.
As the name suggests, a Java String Split() method is used to decompose or split the invoking Java String into parts and return the Array. Each part or item of an Array is delimited by the delimiters(“”, “ ”, \\) or regular expression that we have passed. The return type of Split is an Array of type Strings.
split() method accepts two arguments. The first optional argument is separator , which specifies what kind of separator to use for splitting the string. If this argument is not provided, the default value is any whitespace, meaning the string will split whenever .
If you're talking about line-breaking, take a look at Dynamic Line Breaking, which gives a Dynamic Programming solution to divide words into lines.
I don't know about proven, but it seems like the simplest and most efficient solution would be to divide the length of the string by N then find the closest white space to the split locations (you'll want to search both forward and back).
The below code seems to work though there are plenty of error conditions that it doesn't handle. It seems like it would run in O(n) where n is the number of strings you want.
class Program
{
static void Main(string[] args)
{
var s = "This is a string for testing purposes. It will be split into 3 parts";
var p = s.Length / 3;
var w1 = 0;
var w2 = FindClosestWordIndex(s, p);
var w3 = FindClosestWordIndex(s, p * 2);
Console.WriteLine(string.Format("1: {0}", s.Substring(w1, w2 - w1).Trim()));
Console.WriteLine(string.Format("2: {0}", s.Substring(w2, w3 - w2).Trim()));
Console.WriteLine(string.Format("3: {0}", s.Substring(w3).Trim()));
Console.ReadKey();
}
public static int FindClosestWordIndex(string s, int startIndex)
{
int wordAfterIndex = -1;
int wordBeforeIndex = -1;
for (int i = startIndex; i < s.Length; i++)
{
if (s[i] == ' ')
{
wordAfterIndex = i;
break;
}
}
for (int i = startIndex; i >= 0; i--)
{
if (s[i] == ' ')
{
wordBeforeIndex = i;
break;
}
}
if (wordAfterIndex - startIndex <= startIndex - wordBeforeIndex)
return wordAfterIndex;
else
return wordBeforeIndex;
}
}
The output for this is:
1: This is a string for
2: testing purposes. It will
3: be split into 3 parts
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With