Finding index of pairwise elements

Tags:

Given the target ('b', 'a') and the inputs:

x0 = ('b', 'a', 'z', 'z')
x1 = ('b', 'a', 'z', 'z')
x2 = ('z', 'z', 'a', 'a')
x3 = ('z', 'b', 'a', 'a')

The aim is to find the location of the continuous ('b', 'a') element and get the output:

>>> find_ba(x0)
0
>>> find_ba(x1)
0
>>> find_ba(x2)
None
>>> find_ba(x3)
1

Using the pairwise recipe:

from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

I could do this to get the desired output:

def find_ba(x, target=('b', 'a')):
    try:
        return next(i for i, pair in enumerate(pairwise(x)) if pair == target)
    except StopIteration:
        return None

But that would require me to loop through all pairs of characters till I find the first instance. Is there a way to find index of pairwise elements without looping all the characters?

Answering @MatthiasFripp's question in the comments:

Are your elements in lists or types (as shown) or in a generator (e.g. reading from a file handle)?

The x* are all tuples of strings. So they can be access through the index. But if the answer/solution can work for tuples and generator, that'll be great!

Can you say about how many lists you have to search and about how long they are? That would help for suggesting a search strategy.

The lengths of the tuples are not fixed. They can be of size > 2.

203

asked Apr 26 '17 09:04

alvas

Video Answer

2 Answers

The fastest general search algorithm will have O(n) average performance (called linear search), that means you have no alternative (except maybe for a constant factor) than to process each element.

Given your question:

Is there a way to finding index of pairwise elements without looping all the characters?

That's possible (it's still O(n) though) by only looking at each second item:

from itertools import count

def find_ab(tup):
    for idx in count(start=1, step=2):
        try:
            if tup[idx] == 'b':
                if tup[idx+1] == 'a':
                    return idx
            elif tup[idx] == 'a':
                if tup[idx-1] == 'b':
                    return idx-1
        except IndexError:
            break

In the worst case it will still compare all items but it will skip one item for every odd-indexed item that isn't 'b' or 'a'.

That's a bit like cheating so let me explain why common alternatives aren't possible in your case:

Binary search

Binary search only needs to compare log(n) items but it requires the sequence to be sorted. Your examples aren't sorted so sorting them would require O(n*log(n)) operations - which wouldn't just process each item once, it would process some of them several times. Not that I know a sensible way to sort adjacent elements anyway.

Bucket search (or hashtables)

You have tuples so creating a hashtable (a dict) doesn't make sense because in order to create that structure you would need to process each element.

But if you plan to do several of these searches for pairs you could create the dictionary once (O(n)) and do many searches afterwards in O(1):

d = {}
for idx, pair in enumerate(pairwise(x0)):
    if pair not in d:    # keep only the first index for each pair
        d[pair] = idx

>>> d.get(('b', 'a'), None)
0

However that approach is far slower if you only want to search for one pair, because you lose the "short-circuit behaviour" (stops as soon as a match is found) and you process all elements when creating the dictionary.

Other approaches

Besides the general approaches:

O(n) linear search
O(log(n)) binary search (for sorted data)
O(1) lookups (for hashable lookups or other search problems that just need to search in some "bucket"s)

You can generally exploit any structure or knowledge about your data to reduce the amount of items you need to process. The problem is mostly that there are (probably) no already existing data structures for these and homemade implementations often end up orders of magnitude slower than naive "process all elements" approaches. But if you have any meta-informations about your sequences then you can exploit it.

Final remarks

The recipe for pairwise is actually quite nice, but you could also use iteration_utilities.successive¹. Last I checked it was roughly 1.5 to 2 times faster than the recipe. Even if you don't change the approach and accept that you need to process all (or almost all) elements in the worst case it may be faster!

That data was probably generated. Maybe it's worthwhile to actually "search" for the elements during creation. That way you don't need to do the extra pass over the data at all. Or you could create the dict while creating the dataset (that allows for the O(1) lookups afterwards). Sometimes it's a good idea to look at the process that generated/downloaded/fetched the dataset if there is some way the information can be extracted.

Now, after writing all this text I need to state the obvious:

Your approach is really good. Even if it needs to process all elements in the worst case it uses a perfect fit (pairwise-recipe) for the problem at hand and it should actually work really fast even for long inputs. For a tuple containing 1 million 'z' it only needs 200ms on my computer. So you can process several million elements per second (even on old and slow computers like mine). That's probably not fast enough for big-data but then pure-python isn't a good language for processing big-data (typically you would need to write a C extension, use Cython or some NumPy, Pandas or derivative approach). Also the next function on a generator is lazy (assuming you use itertools.izip on python2 instead of zip) so you only process each tuple until you find a match.

Personally, I would simply use your original approach. Or if I had to find several pairs then I would just create the dictionary I mentioned earlier (maybe even serialize it) and do the lookups therein.

The bounty reason explicitly wants "credible and/or official sources". Fortunatly "search algorithms" are well studied so you can find explanations for each of the mentioned approaches in basic text books on algorithms. For example:

Cormen, et. al - Introduction to Algorithms
Sedgewick & Wayne - Algorithms
Wikipedia: "Linear search"
Wikipedia: "Binary search"
Wikipedia: "Hashtable" (that's essentially a dict).

There's also a small overview of time complexities of python types in the python wiki: "TimeComplexity". For lookups you have to check "Get Item " or "in".

¹ Disclosure: I'm the author of that 3rd party library.

answered Oct 12 '22 16:10

MSeifert

Not much impressive though its working in your case check it out.

we are simply extracting index of matched item in sample and checking if it is consecutive or not.

def consecutive_index(src,sample):
    result = None
    il = [src.index(a) for a in sample if a in src]
    if len(il) == len(sample) and len(range(il[0],il[-1]))==1:
        result = il[0]
    return result



x0 = ('b', 'a', 'z', 'z')
x1 = ('b', 'a', 'z', 'z')
x2 = ('z', 'z', 'a', 'a')
x3 = ('z', 'b', 'a', 'a')
sample = ('b', 'a')

##TEST your given combinations.
print consecutive_index(x0,sample) #expected 0
print consecutive_index(x1,sample) #expected 0
print consecutive_index(x2,sample) #expected None
print consecutive_index(x3,sample) #expected 1

answered Oct 12 '22 18:10

DexJ

Related questions
                            
                                How does one use the VSCode debugger to debug a Gunicorn worker process?
                            
                                Serve Static files from Google Cloud Storage Bucket (for Django App hosted on GCE)
                            
                                Python multithreading max_workers
                            
                                Python requests gives me "bad handshake" error
                            
                                PySpark: compute row maximum of the subset of columns and add to an exisiting dataframe
                            
                                Is it possible to effectively initialize bytearray with non-zero value?
                            
                                Changing phrases to vectors with while function in Python
                            
                                Django how to see generated SQL query?
                            
                                TensorFlow - import meta graph and use variables from it
                            
                                python - opencv - convert pixel from bgr to hsv
                            
                                Why does "[] is [ ]" evaluate to False in python
                            
                                Flask POSTs with Trailing Slash
                            
                                How to remove gaps between *images* in matplotlib?
                            
                                How to replace non integer values in a pandas Dataframe?
                            
                                python matplotlib make everything bold
                            
                                While else statement equivalent for Java?
                            
                                500 internal server error mod_wsgi apache "importerror: No Module named 'django'
                            
                                python pandas rename multiple column headers the same way
                            
                                How do I download images with an https URL in Python 3?
                            
                                How to stream the logs in docker python API?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Finding index of pairwise elements

Tags:

python

indexing

tuples

pairwise