Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting numbers from 1 to 999,999,999 in words as strings

Tags:

algorithm

People also ask

How do I sort numbers in a string?

Convert each element in the String array obtained in the previous step into an integer and store in into the integer array. The sort() method of the Arrays class accepts an array, sorts the contents of it in ascending order. Sort the integer array using this method.

How are numbers sorted alphabetically?

Thus numbers always sort before uppercase letters and upper case before lower case letters. Accented characters come after that. This unusual order reflects the numerical sequence of codes used indicate different letters symbols and numbers.


Edit: Solved!

You can create a generator that outputs the numbers in sorted order. There are a few rules for comparing concatenated strings that I think most of us know implicitly:

  • a < a+b, where b is non-null.
  • a+b < a+c, where b < c.
  • a+b < c+d, where a < c, and a is not a subset of c.

If you start with a sorted list of the first 1000 numbers, you can easily generate the rest by appending "thousand" or "million" and concatenating another group of 1000.

Here's the full code, in Python:

import heapq

first_thousand=[('', 0), ('one', 1), ('two', 2), ('three', 3), ('four', 4),
                ('five', 5), ('six', 6), ('seven', 7), ('eight', 8),
                ('nine', 9), ('ten', 10), ('eleven', 11), ('twelve', 12),
                ('thirteen', 13), ('fourteen', 14), ('fifteen', 15),
                ('sixteen', 16), ('seventeen', 17), ('eighteen', 18),
                ('nineteen', 19)]
tens_name = (None, 'ten', 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
             'seventy','eighty','ninety')
for number in range(20, 100):
    name = tens_name[number/10] + first_thousand[number%10][0]
    first_thousand.append((name, number))
for number in range(100, 1000):
    name = first_thousand[number/100][0] + 'hundred' + first_thousand[number%100][0]
    first_thousand.append((name, number))

first_thousand.sort()

def make_sequence(base_generator, suffix, multiplier):
    prefix_list = [(name+suffix, number*multiplier)
                   for name, number in first_thousand[1:]]
    prefix_list.sort()
    for prefix_name, base_number in prefix_list:
        for name, number in base_generator():
            yield prefix_name + name, base_number + number
    return

def thousand_sequence():
    for name, number in first_thousand:
        yield name, number
    return

def million_sequence():
    return heapq.merge(first_thousand,
                       make_sequence(thousand_sequence, 'thousand', 1000))

def billion_sequence():
    return heapq.merge(million_sequence(),
                       make_sequence(million_sequence, 'million', 1000000))

def solve(stopping_size = 51000000000):
    total_chars = 0
    total_sum = 0
    for name, number in billion_sequence():
        total_chars += len(name)
        total_sum += number
        if total_chars >= stopping_size:
            break
    return total_chars, total_sum, name, number

It took a while to run, about an hour. The 51 billionth character is the last character of sixhundredseventysixmillionsevenhundredfortysixthousandfivehundredseventyfive, and the sum of the integers to that point is 413,540,008,163,475,743.


I'd sort the names of the first 20 integers and the names of the tens, hundreds and thousands, work out how many numbers start with each of those, and go from there.

For example, the first few are [ eight, eighteen, eighthundred, eightmillion, eightthousand, eighty, eleven, ....

The numbers starting with "eight" are 8. With "eighthundred", 800-899, 800,000-899,999, 800,000,000-899,999,999. And so on.

The number of letters in the concatenation of words for 0 ( represented by the empty string ) to 99 can be found and totalled; this can be multiplied with "thousand"=8 or "million"=7 added for higher ranges. The value for 800-899 will be 100 times the length of "eighthundred" plus the length of 0-99. And so on.


This guy has a solution to the puzzle written in Haskell. Apparently Michael Borgwardt was right about using a Trie for finding the solution.


Those strings are going to have lots and lots of common prefixes - perfect use case for a trie, which would drastically reduce memory usage and probably also running time.


(The first attempt at this is wrong, but I will leave it up since it's more useful to see mistakes on the way to solving something rather than just the final answer.)

I would first generate the strings from 0 to 999 and store them into an array called thousandsStrings. The 0 element is "", and "" represents a blank in the lists below.

The thousandsString setup uses the following:

Units: "" one two three ... nine

Teens: ten eleven twelve ... nineteen

Tens: "" "" twenty thirty forty ... ninety

The thousandsString setup is something like this:

thousandsString[0] = ""

for (i in 1..10)
   thousandsString[i] = Units[i]
end

for (i in 10..19)
   thousandsString[i] = Teens[i]
end

for (i in 20..99)
   thousandsString[i] = Tens[i/10] + Units[i%10]
end

for (i in 100..999)
   thousandsString[i] = Units[i/100] + "hundred" + thousandsString[i%100]
end

Then, I would sort that array alphabetically.

Then, assuming t1 t2 t3 are strings taken from thousandsString, all of the strings have the form

t1

OR

t1 + million + t2 + thousand + t3

OR

t1 + thousand + t2

To output them in the proper order, I would process the individual strings, followed by the millions strings followed by the string + thousands strings.

foreach (t1 in thousandsStrings)

   if (t1 == "")
     continue;

   process(t1)

   foreach (t2 in thousandsStrings)
      foreach (t3 in thousandsStrings)
         process (t1 + "million" + t2 + "thousand" + t3)
      end
   end

   foreach (t2 in thousandsStrings)
       process (t1 + "thousand" + t2)
   end
end

where process means store the previous sum length and then add the new string length to the sum and if the new sum is >= your target sum, you spit out the results, and maybe return or break out of the loops, whatever makes you happy.

=====================================================================

Second attempt, the other answers were right that you need to use 3k strings instead of 1k strings as a base.

Start with the thousandsString from above, but drop the blank "" for zero. That leaves 999 elements and call this uStr (units string).

Create two more sets:

tStr = the set of all uStr + "thousand"

mStr = the set of all uStr + "million"

Now create two more set unions:

mtuStr = mStr union tStr union uStr

tuStr = tStr union uStr

Order uStr, tuStr, mtuStr

Now the looping and logic here are a bit different than before.

foreach (s1 in mtuStr)
   process(s1)

   // If this is a millions or thousands string, add the extra strings that can
   // go after the millions or thousands parts.

   if (s1.contains("million"))
      foreach (s2 in tuStr)
         process (s1+s2)          

         if (s2.contains("thousand"))
            foreach (s3 in uStr)
               process (s1+s2+s3)
            end
         end
      end
   end

   if (s1.contains("thousand"))
      foreach (s2 in uStr)
         process (s1+s2)
      end
   end
end

Here's my python solution that prints out the correct answer in a fraction of a second. I'm not a python programmer generally, so apologies for any egregious code style errors.

#!/usr/bin/env python

import sys

ONES=[
    "",         "one",      "two",      "three",        "four", 
    "five",     "six",      "seven",    "eight",        "nine",
    "ten",      "eleven",   "twelve",   "thirteen",     "fourteen",
    "fifteen",  "sixteen",  "seventeen","eighteen",     "nineteen",
]
TENS=[
    "zero",     "ten",      "twenty",   "thirty",       "forty",
    "fifty",    "sixty",    "seventy",  "eighty",       "ninety",
]

def to_s_h(i):
    if(i<20):
        return(ONES[i])
    return(TENS[i/10] + ONES[i%10]) 

def to_s_t(i):
    if(i<100):
        return(to_s_h(i))
    return(ONES[i/100] + "hundred" + to_s_h(i%100))

def to_s_m(i):
    if(i<1000):
        return(to_s_t(i))
    return(to_s_t(i/1000) + "thousand" + to_s_t(i%1000))

def to_s_b(i):
    if(i<1000000):
        return(to_s_m(i))
    return(to_s_m(i/1000000) + "million" + to_s_m(i%1000000))


def try_string(s,t):
    global letters_to_go,word_sum
    l=len(s)
    letters_to_go -= l
    word_sum += t
    if(letters_to_go == 0):
        print "solved: " + s
        print "sum is: " + str(word_sum)
        sys.exit(0)
    elif(letters_to_go < 0):
        print "failed: " + s + " " + str(letters_to_go)
        sys.exit(-1)

def solve(depth,prefix,prefix_num):
    global millions,thousands,ones,letters_to_go,onelen,thousandlen,word_sum
    src=[ millions,thousands,ones ][depth]
    for x in src:
        num=prefix + x[2]
        nn=prefix_num+x[1]
        try_string(num,nn)
        if(x[0] == 0):
            continue
        if(x[0] == 1):
            stl=(len(num) * 999) + onelen
            ss=(nn*999) + onesum
        else:
            stl=(len(num) * 999999) + thousandlen + onelen*999
            ss=(nn*999999) + thousandsum
        if(stl < letters_to_go):
            letters_to_go -= stl
            word_sum += ss
        else:
            solve(depth+1,num,nn)


ones=[]
thousands=[]
millions=[]
onelen=0
thousandlen=0
onesum=(999*1000)/2
thousandsum=(999999*1000000)/2



for x in range(1,1000):
    s=to_s_b(x)
    l=len(s)
    ones.append( (0,x,s) )
    onelen += l
    thousands.append( (0,x,s) )
    thousands.append( (1,x*1000,s + "thousand") )
    thousandlen += l + (l+len("thousand"))*1000
    millions.append( (0,x,s) )
    millions.append( (1,x*1000,s + "thousand") )
    millions.append( (2,x*1000000,s + "million") )

ones.sort(key=lambda x: x[2])
thousands.sort(key=lambda x: x[2])
millions.sort(key=lambda x: x[2])

letters_to_go=51000000000
word_sum=0

solve(0,"",0)

It works by precomputing the length of the numbers from 1..999 and 1..999999 so that it can skip entire subtrees unless it knows that the answer lies somewhere within them.