How are integer operations implemented in Python?

Tags:

This might be a very broad question. I wanted to create a way to represent strings that would support O(1) append, O(1) append to the left, and O(1) comparison while maintaining O(N) slicing and O(1) indexing. My idea is that I would store unicode characters as their unicode number, and use mathematical operations to add and delete characters from either end. I called it NumString:

class Numstring:
    def __init__(self, init_str=""):
        self.num = 0
        self.length = 0

        for char in init_str:
            self.append(char)

    def toString(self, curr=None):
        if curr is None:
            curr = self.num

        retlst = []

        while curr:
            char = chr(curr % 10000)
            retlst.append(char)
            curr = curr // 10000

        return "".join(retlst)

    def appendleft(self, char):
        assert len(char) == 1
        self.num *= 10000
        self.num += ord(char)
        self.length += 1

    def append(self, char):
        self.num += ord(char)*(10000**self.length)
        self.length += 1    

    def pop(self):
        # self.num = self.num % (10000**self.length-1)
        self.num = self.num % 10000**(self.length-1)
        self.length -= 1

    def popleft(self):
        self.num = self.num // 10000
        self.length -= 1

    def compare(self, numstring2):
        return self.num == numstring2.num

    def size(self):
        return self.length

    def get(self, start, end=None):
        if start >= self.length:
            raise IndexError("index out of bounds")
        if end and end > self.length + 1:
            raise IndexError("index out of bounds")

        if end is not None:
            if start == end:
                return ""

            curr = self.num

            curr = curr // (10000 ** start)

            curr = curr % 10000**(end)

            return self.toString(curr)
        else:
            curr = self.num

            curr = curr // (10000 ** start)

            char = chr(curr % 10000)
            return char

Here is the profiling code:

import string
import time
import random
from NumString import Numstring
import numpy as np
import matplotlib.pyplot as plt
if __name__ == '__main__':
    

    numstring_times = []
    regstring_times = []

    numstring = Numstring("abc")
    regstring = "abc"

    for i in range(0, 10000):
        char_to_add = random.choice(string.ascii_letters)
        
        start_time = time.time()
        numstring.append(char_to_add)
        end_time = time.time()

        numstring_times.append(end_time-start_time)

        start_time = time.time()
        regstring += char_to_add
        end_time = time.time()

        regstring_times.append(end_time-start_time)

    plt.plot(numstring_times, label="Numstring")
    plt.plot(regstring_times, label="Regular strings")
    plt.legend()
    plt.show()

It works the way I want it to, but when I tested the time it takes to append to the string and my Numstring class performed way worse. I understood that there would be overhead, but I didn't expect the overall trend to be way worse than concatenating strings in python.

Time it takes to append a character to

Curiously, comparison times are actually better for Numstrings than regular strings:

Comparison times for Numstrings vs Regular strings

I've realized I don't really understand how integer operations in Python are implemented and what the limitations of infinite integer precision are. Could somebody help me understand what I'm missing?

987

asked Sep 24 '20 23:09

Andrew Ho

1 Answers

ints are essentially represented as strings of digits (in a higher base than 10), and most operations on them, including addition and multiplication, take time proportional to the number of digits they contain or worse. Because the base you’re using (10000) doesn’t match the base ints use, operations like multiplying or dividing by the base become a complex operation instead of a simple copying of bytes of memory. So it’s pretty much a less efficient reimplementation of the operations strings already do (which is what you found by benchmarking) and it doesn’t support all of Unicode (which goes up to 0x10FFFF, not 10000).

Hint for a data structure that actually has the properties you’re looking for, though: a deque based on a circular buffer.

140

answered Sep 30 '22 20:09

2 revs

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How are integer operations implemented in Python?

Tags:

Andrew Ho

People also ask

1 Answers

2 revs

Recent Activity

Donate For Us

How are integer operations implemented in Python?

Tags:

Andrew Ho

People also ask

1 Answers

2 revs

Related questions

Recent Activity

Donate For Us