Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting long string without breaking words fulfilling lines

Before you think that it's duplicated (there are many question asking how to split long strings without breaking words) take in mind that my problem is a bit different: order is not important and I've to fit the words in order to use every line as much as possible.

I've a unordered set of words and I want to combine them without using more than 253 characters.

def compose(words):
    result = " ".join(words)
    if len(result) > 253:
        pass # this should not happen!
    return result

My problem is that I want to try to fill the line as much as possible. For example:

words = "a bc def ghil mno pq r st uv"
limit = 5 # max 5 characters

# This is good because it's the shortest possible list,
#   but I don't know how could I get it
# Note: order is not important
good = ["a def", "bc pq", "ghil", "mno r", "st uv"]

# This is bad because len(bad) > len(good)
#   even if the limit of 5 characters is respected
# This is equivalent to:
#   bad  = ["a bc", "def", "ghil", "mno", "pq r", "st uv"]
import textwrap
bad = textwrap.wrap(words, limit)

How could I do?

like image 836
Francesco Frassinelli Avatar asked May 07 '13 08:05

Francesco Frassinelli


People also ask

How do you split a long string?

Use triple quotes to create a multiline string It is the simplest method to let a long string split into different lines. You will need to enclose it with a pair of Triple quotes, one at the start and second in the end. Anything inside the enclosing Triple quotes will become part of one multiline string.

How do you split a string by line?

Split String at Newline Then use splitlines to split the string at the newline character. Create a string in which two lines of text are separated by \n . You can use + to concatenate text onto the end of a string. Convert \n into an actual newline character.

How do I split a string into a list of words?

To convert a string in a list of words, you just need to split it on whitespace. You can use split() from the string class. The default delimiter for this method is whitespace, i.e., when called on a string, it'll split that string at whitespace characters.


2 Answers

This is the bin packing problem; the solution is NP-hard, although there exist non-optimal heuristic algorithms, principally first fit decreasing and best fit decreasing. See https://github.com/type/Bin-Packing for implementations.

like image 80
ecatmur Avatar answered Oct 10 '22 23:10

ecatmur


Non-optimal offline fast 1D bin packing Python algorithm

def binPackingFast(words, limit, sep=" "):
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    words.sort(key=len, reverse=True)
    res, part, others = [], words[0], words[1:]
    for word in others:
        if len(sep)+len(word) > limit-len(part):
            res.append(part)
            part = word
        else:
            part += sep+word
    if part:
        res.append(part)
    return res

Performance

Tested over /usr/share/dict/words (provided by words-3.0-20.fc18.noarch) it can do half million words in a second on my slow dual core laptop, with an efficiency of at least 90% with those parameters:

limit = max(map(len, words))
sep = ""

With limit *= 1.5 I get 92%, with limit *= 2 I get 96% (same execution time).

Optimal (theoretical) value is calculated with: math.ceil(len(sep.join(words))/limit)

no efficient bin-packing algorithm can be guaranteed to do better

Source: http://mathworld.wolfram.com/Bin-PackingProblem.html

Moral of the story

While it's interesting to find the best solution, I think that for the most cases it would be much better to use this algorithm for 1D offline bin packing problems.

Resources

  • http://mathworld.wolfram.com/Bin-PackingProblem.html
  • https://github.com/hudora/pyShipping/

Notes

  • I didn't use textwrap for my implementation because it's slower than my simple Python code. Maybe it's related with: Why are textwrap.wrap() and textwrap.fill() so slow?
  • It seems to work perfectly even if the sorting is not reversed.
like image 37
Francesco Frassinelli Avatar answered Oct 11 '22 00:10

Francesco Frassinelli