I have a large number of strings. For my purposes, two strings are equivalent if one is a rotation of the other (e.g. '1234' is equivalent to '3412').
What is an efficient way to process every string exactly once (up to rotation) in Python?
A naive implementation of what I want might look like:
class DuplicateException(Exception): pass
seen = set()
for s in my_strings:
try:
s2 = s+s
for t in seen:
# Slick method I picked up here in SO
# for checking whether one string is
# a rotation of another
if len(s) == len(t) and t in s2:
raise DuplicateException()
seen.add(s)
process(s)
except DuplicateException: pass
Pick a canonical way to represent a class of rotated strings (e.g. the lexicographically least rotation among all possible rotations of a string), and work only with canonical representations (canonicalization).
For example:
def canonicalize(s):
return min(s[i:]+s[:i] for i in xrange(len(s)))
canonical_strings = {canonicalize(s) for s in my_strings}
for cs in canonical_strings:
process(cs)
Maybe it makes sense to rotate your string
to a specific value, for example smallest possible rotation, than those smallest rotations are unique, and could be easily put in a set.
Here is a an example implementation, and "rotate_to_smallest" can probably be improved.
my_strings = ['1234', '123', '2341', '4312', '312', '56', '65', '1236']
def rotate_to_smallest(x):
smallest = x
for i in xrange(1, len(x)):
rotation = x[i :] + x[: i]
if rotation < smallest:
smallest = rotation
return smallest
def unique_rotations(my_strings):
uniques = set(())
for s in my_strings:
smallest_rotation = rotate_to_smallest(s)
if smallest_rotation not in uniques:
uniques.add(smallest_rotation)
return uniques
Result:
>>> unique_rotations(my_strings)
set(['1234', '56', '1243', '123', '1236'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With