Given a string of a million numbers, return all repeating 3 digit numbers

Tags:

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)

I pretty much screwed up on the first interview problem...

Question: Given a string of a million numbers (Pi for example), write a function/program that returns all repeating 3 digit numbers and number of repetition greater than 1

For example: if the string was: 123412345123456 then the function/program would return:

123 - 3 times
234 - 3 times
345 - 2 times

They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:

000 --> 999

Now that I'm thinking about it, I don't think it's possible to come up with a constant time algorithm. Is it?

420

asked Oct 03 '22 10:10

its.david

2 Answers

You got off lightly, you probably don't want to be working for a hedge fund where the quants don't understand basic algorithms :-)

There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.

Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that's not usually how people use complexity analysis.

It appears to me you could have impressed them in a number of ways.

First, by informing them that it's not possible to do it in O(1), unless you use the "suspect" reasoning given above.

Second, by showing your elite skills by providing Pythonic code such as:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

This outputs:

[(123, 3), (234, 3), (345, 2)]

though you could, of course, modify the output format to anything you desire.

And, finally, by telling them there's almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.

And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.

Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):

You can farm these out to separate workers and combine the results afterwards.

The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of "measure, don't guess" applies here, of course.

This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.

For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

169

answered Oct 23 '22 17:10

paxdiablo

Constant time isn't possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.

For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.

answered Oct 23 '22 16:10

rcgldr

Related questions
                            
                                Hide tick label values but keep axis labels
                            
                                How to write DataFrame to postgres table?
                            
                                Evaluating a mathematical expression in a string
                            
                                How do you calculate program run time in python? [duplicate]
                            
                                Convert spark DataFrame column to python list
                            
                                Compact way of writing (a + b == c or a + c == b or b + c == a)
                            
                                Python Logging (function name, file name, line number) using a single file
                            
                                ImportError: No module named 'encodings'
                            
                                Right way to initialize an OrderedDict using its constructor such that it retains order of initial data?
                            
                                Should all Python classes extend object? [duplicate]
                            
                                Is there a way to pass optional parameters to a function?
                            
                                Emacs bulk indent for Python
                            
                                How do I compile a Visual Studio project from the command-line?
                            
                                Representing graphs (data structure) in Python
                            
                                Open files in 'rt' and 'wt' modes
                            
                                What is the right way to debug in iPython notebook?
                            
                                In-place type conversion of a NumPy array
                            
                                Django Rest Framework - Could not resolve URL for hyperlinked relationship using view name "user-detail"
                            
                                How to write a multidimensional array to a text file?
                            
                                What does the caret operator (^) in Python do?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Given a string of a million numbers, return all repeating 3 digit numbers

Tags:

python

algorithm

data-structures

number-theory

its.david

People also ask

2 Answers

paxdiablo

rcgldr

Recent Activity

Donate For Us