Time complexity calculation for my algorithm

Tags:

algorithm

Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1. You may assume the string contain only lowercase letters.

I'm going to define a hash that tracks the occurrence of characters. Traverse the string from left to right, check if the current character is in the hash, continue if yes, otherwise in another loop traverse the rest of the string to see if the current character exists. Return the index if not and update the hash if it exists.

def firstUniqChar(s):

    track = {}
    for index, i in enumerate(s):
        if i in track:
            continue
        elif i in s[index+1:]: # For the last element, i in [] holds False
            track[i] = 1
            continue
        else:
            return index
    return -1

firstUniqChar('timecomplexity')

What's the time complexity (average and worst) of my algorithm?

899

asked Aug 23 '16 03:08

Nicholas

1 Answers

Your algorithm has time complexity of O(kn) where k is the number of unique characters in the string. If k is a constant then it is O(n). As the problem description clearly bounds the number of alternatives for elements ("assume lower-case (ASCII) letters"), thus k is constant and your algorithm runs in O(n) time on this problem. Even though n will grow to infinite, you will only make O(1) slices of the string and your algorithm will remain O(n). If you removed track, then it would be O(n²):

In [36]: s = 'abcdefghijklmnopqrstuvwxyz' * 10000

In [37]: %timeit firstUniqChar(s)
100 loops, best of 3: 18.2 ms per loop

In [38]: s = 'abcdefghijklmnopqrstuvwxyz' * 20000

In [37]: %timeit firstUniqChar(s)
10 loops, best of 3: 36.3 ms per loop

In [38]: s = 'timecomplexity' * 40000 + 'a'

In [39]: %timeit firstUniqChar(s)
10 loops, best of 3: 73.3 ms per loop

It pretty much holds there that the T(n) is still of O(n) complexity - it scales exactly linearly with number of characters in the string, even though this is the worst-case scenario for your algorithm - there is no single character that is be unique.

I will present a not-that efficient, but simple and smart method here; count the character histogram first with collections.Counter; then iterate over the characters finding the one

from collections import Counter
def first_uniq_char_ultra_smart(s):
    counts = Counter(s)
    for i, c in enumerate(s):
        if counts[c] == 1:
            return i

    return -1

first_uniq_char('timecomplexity')

This has time complexity of O(n); Counter counts the histogram in O(n) time and we need to enumerate the string again for O(n) characters. However in practice I believe my algorithm has low constants, because it uses a standard dictionary for Counter.

And lets make a very stupid brute-force algorithm. Since you can assume that the string contains only lower-case letters, then use that assumption:

import string
def first_uniq_char_very_stupid(s):
    indexes = []
    for c in string.ascii_lowercase:
        if s.count(c) == 1:
            indexes.append(s.find(c))

    # default=-1 is Python 3 only
    return min(indexes, default=-1)

Let's test my algorithm and some algorithms found in the other answers, on Python 3.5. I've chosen a case that is pathologically bad for my algorithm:

In [30]: s = 'timecomplexity' * 10000 + 'a'

In [31]: %timeit first_uniq_char_ultra_smart(s)
10 loops, best of 3: 35 ms per loop

In [32]: %timeit karin(s)
100 loops, best of 3: 11.7 ms per loop

In [33]: %timeit john(s)
100 loops, best of 3: 9.92 ms per loop

In [34]: %timeit nicholas(s)
100 loops, best of 3: 10.4 ms per loop

In [35]: %timeit first_uniq_char_very_stupid(s)
1000 loops, best of 3: 1.55 ms per loop

So, my stupid algorithm is the fastest, because it finds the a at the end and bails out. And my smart algorithm is slowest, One more reason for bad performance of my algorithm besides this being worst case is that OrderedDict is written in C on Python 3.5, and Counter is in Python.

Let's make a better test here:

In [60]: s = string.ascii_lowercase * 10000

In [61]: %timeit nicholas(s)
100 loops, best of 3: 18.3 ms per loop

In [62]: %timeit karin(s)
100 loops, best of 3: 19.6 ms per loop

In [63]: %timeit john(s)
100 loops, best of 3: 18.2 ms per loop

In [64]: %timeit first_uniq_char_very_stupid(s)
100 loops, best of 3: 2.89 ms per loop

So it appears that the "stupid" algorithm of mine isn't all that stupid at all, it exploits the speed of C while minimizing the number of iterations of Python code being run, and wins clearly in this problem.

answered Sep 23 '22 16:09

Antti Haapala -- Слава Україні

Related questions
                            
                                how to Pass variable in link to the django view
                            
                                plot in Pandas immediately closes
                            
                                Kivy error with text input
                            
                                How to use uuid lib with mod_wsgi?
                            
                                Efficiently processing data in text file
                            
                                Computing mean square displacement using python and FFT
                            
                                python move all elements of a list to the right by one [duplicate]
                            
                                Java equivalent to python's "with"
                            
                                how to upload multiple files using flask in python
                            
                                How to draw ellipsoid with plotly
                            
                                How to round Avg() to nearest integer? django
                            
                                Password Protecting Excel file using Python
                            
                                How to import own module in PyCharm console
                            
                                How to convert .dat to .csv using python?
                            
                                find the index of a boolean array whose values are true
                            
                                How to convert a list of tuples to a dictionary of dictionaries in Python?
                            
                                Pandas concatenate alternating columns
                            
                                Python scipy module import error due to missing ._ufuncs dll
                            
                                How can I use urllib.request.urlretrieve with python 2.7
                            
                                django.db.utils.OperationalError: no such table: auth_user

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With