OVERVIEW
I have a set of valid chunks that I can use to split a text (if possible).
How can I split a given text using these chunks so that the result is optimized (minimized) in terms of the number of resulting chunks?
TEST SUITE
if __name__ == "__main__":
    import random
    import sys

    # get_it_done(choices, text) is the function being asked for: it should
    # return a list of chunks taken from `choices` whose concatenation is `text`.

    random.seed(1)

    # 1) Testing robustness
    examples = []
    sys.stdout.write("Testing robustness...")
    N = 50
    large_number = "3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481"
    for i in range(100):
        for j in range(i):
            # Chunks must be strings, so convert the sampled integers.
            choices = [str(c) for c in random.sample(range(i), j)]
            examples.append((choices, large_number))

    for (choices, large_number) in examples:
        get_it_done(choices, large_number)
    sys.stdout.write("OK\n")

    # 2) Testing correctness
    examples = [
        # Example 1
        # Solution: ['012345678910203040506070', '80', '90', '100', '200', '300', '400', '500', '600', '700', '800', '900']
        (
            [
                "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                "10", "20", "30", "40", "50", "60", "70", "80", "90",
                "100", "200", "300", "400", "500", "600", "700", "800", "900",
                "012345678910203040506070"
            ],
            "0123456789102030405060708090100200300400500600700800900"
        ),
        # Example 2
        # Solution: ['100']
        (
            ["0", "1", "10", "100"],
            "100"
        ),
        # Example 3
        # Solution: ['101234567891020304050', '6070809010020030040050', '0600700800900']
        (
            [
                "10", "20", "30", "40", "50", "60", "70", "80", "90",
                "012345678910203040506070",
                "101234567891020304050",
                "6070809010020030040050",
                "0600700800900"
            ],
            "10123456789102030405060708090100200300400500600700800900"
        ),
        # Example 4
        # Solution: ['12', '34', '56', '78', '90']
        (
            [
                "12", "34", "56", "78", "90",
                "890",
            ],
            "1234567890"
        ),
        # Example 5
        # Solution: ['12', '34']
        (
            [
                "1", "2", "3",
                "12", "23", "34"
            ],
            "1234"
        ),
        # Example 6
        # Solution: ['100', '10']
        (
            ["0", "1", "10", "100"],
            "10010"
        )
    ]

    score = 0
    for (choices, large_number) in examples:
        res = get_it_done(choices, large_number)
        flag = "".join(res) == large_number
        print("{0}\n{1}\n{2} --> {3}".format(
            large_number, "".join(res), res, flag))
        print('-' * 80)
        score += flag

    print(
        "Score: {0}/{1} = {2:.2f}%".format(score, len(examples), score / len(examples) * 100))

    # 3) TODO: Testing optimization, it should provide (if possible)
    # minimal cases
QUESTION
How can I solve this problem in Python without using a brute-force approach?
Using dynamic programming, you can construct a list $(l_0, l_1, l_2, \dots, l_{n-1})$, where $n$ is the number of characters in your input string and $l_i$ is the minimum number of chunks you need to cover the input up to and including character $i$ (that is, input[0..i]). The overall structure would look as follows:
minValues := list with n infinity entries
for i from 0 to n-1
    for every choice c that is a suffix of input[0..i]
        if i - len(c) < 0
            newVal = 1
        else
            newVal = minValues[i - len(c)] + 1
        end if
        if (newVal < minValues[i])
            minValues[i] = newVal
            //optionally record the used chunk
        end if
    next
next
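As a rough sketch, the pseudocode above translates almost line for line into Python. The function name `min_chunk_counts` is hypothetical, suffixes are tested with `str.endswith` rather than a trie, and positions that cannot be reached keep the value `math.inf`:

```python
import math


def min_chunk_counts(choices, text):
    """Return minValues, where minValues[i] is the fewest chunks from
    `choices` that exactly cover text[0..i] (math.inf if impossible)."""
    n = len(text)
    min_values = [math.inf] * n
    for i in range(n):
        for c in choices:
            # Skip empty choices and choices that do not end exactly at i.
            if not c or not text.endswith(c, 0, i + 1):
                continue
            start = i - len(c)  # index of the last character before c
            new_val = 1 if start < 0 else min_values[start] + 1
            if new_val < min_values[i]:
                min_values[i] = new_val
    return min_values


print(min_chunk_counts(["0", "1", "10", "100"], "10010"))  # [1, 1, 1, 2, 2]
```

Because `math.inf + 1` is still `math.inf`, an unreachable predecessor never produces a finite value, so no extra check is needed for it.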
The minimum number of chunks for your entire string is then $l_{n-1}$. You can get the actual chunks by tracing back through the list, which requires recording the chunk used at each position.
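Here is one possible full `get_it_done(choices, text)` for the question's test suite, sketched under the assumption that it should return `None` when no split exists (the question does not say). It records the chunk that produced each improvement and then walks back from the last character:

```python
import math


def get_it_done(choices, text):
    """Split `text` into the fewest chunks from `choices`.

    Returns the list of chunks, or None if no split exists
    (the None convention is an assumption).
    """
    n = len(text)
    if n == 0:
        return []
    # Drop empty strings and duplicates while keeping the given order.
    unique = list(dict.fromkeys(c for c in choices if c))
    min_values = [math.inf] * n
    used = [None] * n  # used[i]: chunk ending at i on an optimal path
    for i in range(n):
        for c in unique:
            if not text.endswith(c, 0, i + 1):
                continue
            start = i - len(c)
            new_val = 1 if start < 0 else min_values[start] + 1
            if new_val < min_values[i]:
                min_values[i] = new_val
                used[i] = c
    if min_values[-1] == math.inf:
        return None
    # Trace back from the last character; chunks come out in reverse order.
    chunks = []
    i = n - 1
    while i >= 0:
        chunks.append(used[i])
        i -= len(used[i])
    chunks.reverse()
    return chunks


print(get_it_done(["0", "1", "10", "100"], "10010"))  # ['100', '10']
```

The traceback always terminates at index -1: whenever `used[i]` is set, the position just before that chunk has a finite value and therefore a recorded chunk of its own.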
Retrieving the choices that are suffixes can be sped up using a trie built from the reversed choice strings. The worst-case complexity will still be $O(n \cdot c \cdot l_c)$, where $n$ is the length of the input string, $c$ is the number of choices, and $l_c$ is the maximum length of the choices. However, this complexity only occurs for choices that are nested suffixes of each other (e.g. 0, 10, 010, 0010, ...). In this case, the trie degenerates to a list. On average, the running time should be much lower. Under the assumption that the number of choices retrieved from the trie is always a small constant, it is $O(n \cdot l_c)$ (in practice, the $l_c$ factor is probably also smaller).
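One way to build the reversed-choice trie is with nested dictionaries. This is only a sketch: the names `build_reverse_trie`, `suffix_choices` and `min_chunks_with_trie`, and the use of the key `None` as an end-of-choice marker, are my own choices. For each position `i`, the walk goes backwards from `text[i]` and stops as soon as no trie edge matches, so it inspects at most $l_c$ characters (the maximum choice length):

```python
import math


def build_reverse_trie(choices):
    """Build a trie (nested dicts) over the reversed choices.

    The key None marks that the path so far spells a complete reversed
    choice and stores the original (unreversed) chunk.
    """
    root = {}
    for c in choices:
        if not c:
            continue
        node = root
        for ch in reversed(c):
            node = node.setdefault(ch, {})
        node[None] = c
    return root


def suffix_choices(trie, text, i):
    """Yield every choice that ends exactly at text[i], shortest first."""
    node = trie
    j = i
    while j >= 0 and text[j] in node:
        node = node[text[j]]
        if None in node:
            yield node[None]
        j -= 1


def min_chunks_with_trie(choices, text):
    """Same DP as the pseudocode, with suffix lookup done through the trie."""
    trie = build_reverse_trie(choices)
    min_values = [math.inf] * len(text)
    for i in range(len(text)):
        for c in suffix_choices(trie, text, i):
            start = i - len(c)
            new_val = 1 if start < 0 else min_values[start] + 1
            min_values[i] = min(min_values[i], new_val)
    return min_values


print(min_chunks_with_trie(["0", "1", "10", "100"], "10010"))  # [1, 1, 1, 2, 2]
```

With nested suffixes such as 0, 10, 010, 0010, the reversed strings form a single chain in this trie, which is the degenerate case described above.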
Here is an example:

choices = ["0", "1", "10", "100"]
text    = "10010"

Each entry shows the current value of minValues and the chunk recorded for that position.

                      content of minValues
algorithm step        0        1        2        3        4
----------------------------------------------------------------
initialize            ∞        ∞        ∞        ∞        ∞
i = 0, c = "1"        1 "1"    ∞        ∞        ∞        ∞
i = 1, c = "0"        1 "1"    2 "0"    ∞        ∞        ∞
i = 1, c = "10"       1 "1"    1 "10"   ∞        ∞        ∞
i = 2, c = "0"        1 "1"    1 "10"   2 "0"    ∞        ∞
i = 2, c = "100"      1 "1"    1 "10"   1 "100"  ∞        ∞
i = 3, c = "1"        1 "1"    1 "10"   1 "100"  2 "1"    ∞
i = 4, c = "0"        1 "1"    1 "10"   1 "100"  2 "1"    3 "0"
i = 4, c = "10"       1 "1"    1 "10"   1 "100"  2 "1"    2 "10"
Meaning: the string can be composed of 2 chunks. Tracing back from index 4 gives the chunks in reverse order: "10" (ending at index 4), then "100" (ending at index 2), so the split is ["100", "10"].
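To double-check a trace like this one, a hypothetical helper `trace_min_values` can re-run the same loop, trying the choices in the order they are listed, and log every improvement together with a snapshot of minValues. The traceback then reads the recorded chunks from the end:

```python
import math


def trace_min_values(choices, text):
    """Return (steps, reversed_chunks) for the DP on `text`.

    steps holds (i, chunk, snapshot of minValues) for every improvement;
    reversed_chunks is the traceback from the last character (empty if
    no split exists).
    """
    n = len(text)
    min_values = [math.inf] * n
    used = [None] * n
    steps = []
    for i in range(n):
        for c in choices:  # tried in the listed order
            if not c or not text.endswith(c, 0, i + 1):
                continue
            start = i - len(c)
            new_val = 1 if start < 0 else min_values[start] + 1
            if new_val < min_values[i]:
                min_values[i] = new_val
                used[i] = c
                steps.append((i, c, list(min_values)))
    reversed_chunks = []
    if n and used[-1] is not None:
        i = n - 1
        while i >= 0:
            reversed_chunks.append(used[i])
            i -= len(used[i])
    return steps, reversed_chunks


steps, reversed_chunks = trace_min_values(["0", "1", "10", "100"], "10010")
for i, c, snapshot in steps:
    print(f"i = {i}, c = {c!r}: {snapshot}")
print(reversed_chunks)  # ['10', '100']
```

Each printed line corresponds to one row of the trace, with `inf` standing for ∞.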