
Remove letter duplicates that are in a row

Looking for a fast way to limit duplicates to a max of 2 when they occur next to each other.

For example: jeeeeeeeep => ['jep','jeep']

Looking for suggestions in Python, but happy to see an example in anything; it's not difficult to switch.

Thanks for any assistance!

EDIT: English doesn't have any (or many) words with the same consonant twice in a row, right? Let's limit this to no duplicate consonants in a row and up to two vowels in a row.

EDIT2: I'm silly (hey that word has two consonants), just checking all letters, limiting duplicate letters that are next to each other to two.

Jon Phenow asked Oct 11 '22 12:10


1 Answer

Here's a recursive solution using itertools.groupby. I've left it up to you which characters you want to be able to repeat: rec_dubz defaults to vowels only, while find_dub_strs as written passes 'aeioupt', so 'p' and 't' may also be doubled:

from itertools import groupby

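def find_dub_strs(mystring):
    # Collapse each run of identical letters to (letter, run_length >= 2).
    grp = groupby(mystring)
    seq = [(k, len(list(g)) >= 2) for k, g in grp]
    # Letters that may appear doubled in the output; adjust to taste.
    allowed = 'aeioupt'
    return rec_dubz('', seq, allowed=allowed)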

def rec_dubz(prev, seq, allowed='aeiou'):
    if not seq:
        return [prev]
    solutions = rec_dubz(prev + seq[0][0], seq[1:], allowed=allowed)
    if seq[0][0] in allowed and seq[0][1]:
        solutions += rec_dubz(prev + seq[0][0] * 2, seq[1:], allowed=allowed)
    return solutions

This is really just a heuristically pruned depth-first search into your "solution space" of possible words. The heuristic is that each run of letters is kept either once or twice, and twice only if the letter is an allowed repeatable letter that actually occurred more than once in a row. You should end up with 2**n words at the end, where n is the number of runs in your string in which an "allowed" character was repeated.
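The 2**n count is easier to see in an iterative form: every run contributes either one form (single) or two forms (single and doubled), and the result is the Cartesian product of those choices. The following is only a sketch of that idea, not part of the original answer; the name find_dub_strs_iter is made up for illustration, and the default for allowed is assumed to match the 'aeioupt' used above.

```python
from itertools import groupby, product

def find_dub_strs_iter(mystring, allowed='aeioupt'):
    # Hypothetical iterative variant of find_dub_strs.
    # For each run of identical letters, list the forms it may take:
    # the single letter always, the doubled letter only when the run
    # was actually repeated and the letter is in `allowed`.
    choices = []
    for letter, run in groupby(mystring):
        forms = [letter]
        if letter in allowed and len(list(run)) >= 2:
            forms.append(letter * 2)
        choices.append(forms)
    # Picking one form per run gives every candidate word.
    return [''.join(combo) for combo in product(*choices)]
```

With n runs that offer two forms, the product has 2**n elements, which matches the count stated above.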

>>> find_dub_strs('jeeeeeep')
['jep', 'jeep']
>>> find_dub_strs('jeeeeeeppp')
['jep', 'jepp', 'jeep', 'jeepp']
>>> find_dub_strs('jeeeeeeppphhhht')
['jepht', 'jeppht', 'jeepht', 'jeeppht']
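If you only need the EDIT2 reading of the question (cap every run of the same letter at two, without listing the alternatives), a regular expression with a back-reference may be enough. This is a minimal sketch under that assumption; cap_runs and max_run are names invented here, not something from the question or the answer above.

```python
import re

def cap_runs(text, max_run=2):
    # (.) captures one character; \1{max_run,} then requires at least
    # max_run further copies of it, so the whole match is a run that is
    # longer than max_run. Shorter runs are left untouched.
    pattern = re.compile(r'(.)\1{%d,}' % max_run)
    # Replace each over-long run with exactly max_run copies.
    return pattern.sub(lambda m: m.group(1) * max_run, text)
```

For instance, cap_runs('jeeeeeeeep') gives 'jeep', and cap_runs('silly') leaves the word unchanged.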
machine yearning answered Oct 13 '22 11:10