Is it possible to handle an arbitrarily large string in Python? (created via the * operator)

Tags: python, string

We can build strings of fixed structure but arbitrary length using the * operator, for example:

length = 10
print "0" * length

This prints what is expected, 0000000000. The problem arises when length is excessively large, which results in an overflow error:

length = 10000000000000000000000000000000000000000000000
print "0" * length

This results in OverflowError: cannot fit 'long' into an index-sized integer.

I am curious: can this approach somehow be used for strings of arbitrary length? If not, what is the correct way to handle a scenario where length is unknown and may take on such a large value?

Chris asked Dec 27 '12 02:12



2 Answers

No, you cannot create a string as large as the one in your example in any programming language, because a string stores every one of its characters individually. 10^46 bytes is most likely far more data than anyone will ever store. You could take well over a trillion Google datacenters (assuming Google has 1 YiB of storage, which is surely not the case yet) and still have much less disk space, let alone RAM, which is what you'd need for such a string.
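
A rough back-of-the-envelope check of those figures, sketched in Python 3 (the question itself uses Python 2 syntax). The 1 YiB capacity is the hypothetical from the paragraph above, and one byte per character is an assumption:

```python
import sys

length = 10 ** 46      # requested number of characters (assumed one byte each)
yib = 2 ** 80          # 1 YiB in bytes; the hypothetical capacity used above

# How many 1 YiB datacenters it would take to hold the string.
datacenters_needed = length // yib
print(datacenters_needed > 10 ** 12)   # compare against one trillion

# CPython rejects the request before allocating anything, because the
# length does not fit in an index-sized integer (sys.maxsize).
print(length > sys.maxsize)

try:
    "0" * length
except (OverflowError, MemoryError) as exc:
    print(type(exc).__name__, exc)
```

The exception is raised while the repeat count is being converted to an index, so no memory is requested at all.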

To represent a huge string like the one in your example, you'd have to create your own str-like class whose __mul__ stores the number of repetitions without ever holding the whole string in memory. Obviously such an implementation becomes extremely complex as soon as you allow modifications to that string.
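
A minimal sketch of that idea in Python 3, covering the read-only case only. The class name RepeatedString and its methods length and char_at are hypothetical names chosen for this illustration, not part of any library:

```python
class RepeatedString(object):
    """Hypothetical read-only stand-in for `unit * count`.

    Only the unit and the repetition count are stored.
    """

    def __init__(self, unit, count=1):
        if not unit:
            raise ValueError("unit must be a non-empty string")
        self.unit = unit
        self.count = count

    def __mul__(self, times):
        # Multiplying only updates the counter; no characters are copied.
        return RepeatedString(self.unit, self.count * times)

    __rmul__ = __mul__

    def length(self):
        # Deliberately not __len__: len() must return an index-sized integer.
        return len(self.unit) * self.count

    def char_at(self, index):
        # Random access by computing the position inside the repeated unit.
        if not 0 <= index < self.length():
            raise IndexError("index out of range")
        return self.unit[index % len(self.unit)]


zeros = RepeatedString("0") * 10 ** 46
print(zeros.length() == 10 ** 46)
print(zeros.char_at(10 ** 45))
```

Anything that modifies the string (replacing a character, slicing, concatenating a different pattern) would need a richer representation, which is where the complexity mentioned above comes in.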

ThiefMaster answered Nov 16 '22 03:11



You could write something like a string generator in Python. For example:

import sys

def stringWithArbitraryLength(stringLength):
    n = 0
    while n < stringLength:
        # pattern here
        if n % 2 == 0:
            yield "0"
        else:
            yield "1"
        n += 1

Infinity = float('inf')

# Usage 1: print the long string
# for c in stringWithArbitraryLength(Infinity):
#   sys.stdout.write(c)

# Usage 2: instantiate the long string
soLong = stringWithArbitraryLength(100000)  # output 01010101....
print(''.join(soLong))  # parenthesized form runs on both Python 2 and 3

# Usage 3: transform the long string
def transformString(longLongString):
    for c in longLongString:
        if c == "1":
            yield "X"
        else:
            yield c
soLong2 = stringWithArbitraryLength(100000)  # output 0X0X0X0X....
print(''.join(transformString(soLong2)))  # parenthesized form runs on both Python 2 and 3

This approach has several limitations:

  1. It allows only sequential access, not random access, so you have to use a for loop to walk through the string.
  2. A character cannot depend on characters at larger indices.
  3. Instantiating the string is difficult if the length is large, although you get random access once it is instantiated.

In many cases you don't have to instantiate the whole string: you can use I/O streams for input and output and generators to process the string, so that you only ever handle part of the data at a time.
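
One way this could look, as a sketch in Python 3. The helper name write_repeated and the chunk_units parameter are made up for this example, and io.StringIO merely stands in for a real file or sys.stdout:

```python
import io


def write_repeated(stream, unit, count, chunk_units=65536):
    """Write `unit` repeated `count` times without building it all in memory."""
    chunk = unit * chunk_units                   # one reusable, bounded buffer
    full_chunks, remainder = divmod(count, chunk_units)
    for _ in range(full_chunks):
        stream.write(chunk)
    stream.write(unit * remainder)               # whatever is left over


buffer = io.StringIO()                           # stand-in for a file or stdout
write_repeated(buffer, "0", 200000)
print(len(buffer.getvalue()))
```

Memory use is bounded by the chunk size rather than by count, although the running time and the size of the output still grow with count.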

If you want to understand more about long, or infinitely long, strings, you could look at a non-strict functional language such as Haskell, which evaluates expressions lazily. Infinite lists and strings are used often in such languages.
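
Python offers a limited form of the same idea through iterators. A rough analogue of a lazy infinite string, sketched in Python 3 with the standard itertools module (this is an illustration of laziness in Python, not a translation of any particular Haskell program):

```python
from itertools import cycle, islice

# cycle() yields "0", "1", "0", "1", ... forever; only the two-character
# pattern itself is kept in memory.
infinite_pattern = cycle("01")

# islice() lazily takes a finite prefix of the infinite sequence.
first_ten = "".join(islice(infinite_pattern, 10))
print(first_ten)
```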

HKTonyLee answered Nov 16 '22 03:11
