Before you think this is a duplicate (there are many questions asking how to split long strings without breaking words), keep in mind that my problem is a bit different: order is not important, and I have to fit the words so that each line is used as much as possible.
I have an unordered set of words and I want to combine them without using more than 253 characters:
def compose(words):
    result = " ".join(words)
    if len(result) > 253:
        pass  # this should not happen!
    return result
My problem is that I want to fill each line as much as possible. For example:
words = "a bc def ghil mno pq r st uv"
limit = 5 # max 5 characters
# This is good because it's the shortest possible list,
# but I don't know how could I get it
# Note: order is not important
good = ["a def", "bc pq", "ghil", "mno r", "st uv"]
# This is bad because len(bad) > len(good)
# even if the limit of 5 characters is respected
# This is equivalent to:
# bad = ["a bc", "def", "ghil", "mno", "pq r", "st uv"]
import textwrap
bad = textwrap.wrap(words, limit)
How could I do this?
Use triple quotes to create a multiline string. This is the simplest way to spread a long string over several lines: enclose it with a pair of triple quotes, one at the start and one at the end. Anything inside the enclosing triple quotes becomes part of a single multiline string.
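For example, a minimal sketch of what that looks like:

# A triple-quoted literal spanning several lines is still one string object.
text = """a bc def
ghil mno
pq r st uv"""
print(text)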
Split the string at newlines. Use splitlines() to split a string at the newline character. Create a string in which two lines of text are separated by \n; you can use + to concatenate text onto the end of a string, and \n inside the string is an actual newline character.
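As a rough sketch of that idea (the variable names are just for illustration):

# Build a string with an embedded newline, then split it back into lines.
first = "a bc def"
second = "ghil mno"
combined = first + "\n" + second   # "\n" is a real newline character
print(combined.splitlines())       # ['a bc def', 'ghil mno']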
To convert a string into a list of words, you just need to split it on whitespace with str.split(). The default delimiter for this method is whitespace, i.e. when called with no arguments it splits the string at any run of whitespace characters.
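Applied to the string from the question, that could look like:

words = "a bc def ghil mno pq r st uv"
word_list = words.split()  # splits on any run of whitespace by default
print(word_list)           # ['a', 'bc', 'def', 'ghil', 'mno', 'pq', 'r', 'st', 'uv']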
This is the bin packing problem; finding an optimal solution is NP-hard, but there are non-optimal heuristic algorithms, principally first fit decreasing and best fit decreasing. See https://github.com/type/Bin-Packing for implementations.
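As an illustration of the first-fit-decreasing idea (this sketch is not from the linked repository; it assumes no single word is longer than the limit and uses the same separator convention as the question):

def first_fit_decreasing(words, limit, sep=" "):
    # Longest words first, then put each word on the first line it fits on.
    lines = []
    for word in sorted(words, key=len, reverse=True):
        for i, line in enumerate(lines):
            if len(line) + len(sep) + len(word) <= limit:
                lines[i] = line + sep + word
                break
        else:
            lines.append(word)  # no existing line had room: open a new one
    return lines

On the question's example (limit = 5) this happens to produce five lines, the same count as the good list, but in general it is still only a heuristic.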
A fast, non-optimal, offline 1D bin packing algorithm in Python
def binPackingFast(words, limit, sep=" "):
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    # Longest words first, then greedily fill one line at a time.
    words.sort(key=len, reverse=True)
    res, part, others = [], words[0], words[1:]
    for word in others:
        if len(sep) + len(word) > limit - len(part):
            # The word doesn't fit: close the current line and start a new one.
            res.append(part)
            part = word
        else:
            part += sep + word
    if part:
        res.append(part)
    return res
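Called on the example from the question (after splitting the string into a list of words), usage would look like this; note that on this tiny input it gives six lines rather than the optimal five, which is expected from a non-optimal heuristic:

words = "a bc def ghil mno pq r st uv".split()
print(binPackingFast(words, limit=5))
# ['ghil', 'def', 'mno', 'bc pq', 'st uv', 'a r']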
Performance
Tested over /usr/share/dict/words (provided by words-3.0-20.fc18.noarch), it can pack half a million words in a second on my slow dual-core laptop, with an efficiency of at least 90% with these parameters:
limit = max(map(len, words))
sep = ""
With limit *= 1.5 I get 92%, and with limit *= 2 I get 96% (same execution time).
The optimal (theoretical) number of lines is calculated with: math.ceil(len(sep.join(words)) / limit)
No efficient bin-packing algorithm can be guaranteed to do better.
Source: http://mathworld.wolfram.com/Bin-PackingProblem.html
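As a sketch of how that efficiency figure can be reproduced (the dictionary path and the optimal-value formula are the ones above; the timing harness is just an assumed illustration):

import math
import time

with open("/usr/share/dict/words") as f:
    words = f.read().split()

sep = ""
limit = max(map(len, words))

start = time.time()
res = binPackingFast(words, limit, sep)
elapsed = time.time() - start

# Efficiency: theoretical minimum number of lines vs. lines actually produced.
optimal = math.ceil(len(sep.join(words)) / limit)
print("lines: %d, optimal: %d, efficiency: %.0f%%, time: %.2fs"
      % (len(res), optimal, 100.0 * optimal / len(res), elapsed))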
Moral of the story
While it's interesting to find the best solution, I think that in most cases it would be much better to use this algorithm for 1D offline bin packing problems.