Can python dictionary comprehension be used to create a dictionary of substrings and their locations?

Tags:

Given a string of characters, I want to create a dictionary of all the n-character substrings contained in the string, where the dictionary key is the substring and the value is a list. The first element of the list is the number of occurrences of the substring, and the second element of the list is a list of starting locations for those occurrences.

For example, with n=3, the string 'abcdabcxdabc' results in this dictionary:

{'abc': [3, [0, 4, 9]], 
 'cda': [1, [2]], 
 'dab': [2, [3, 8]], 
 'bcd': [1, [1]], 
 'cxd': [1, [6]], 
 'bcx': [1, [5]], 
 'xda': [1, [7]]}

The code below works and is efficient since it only walks through the string once, but I'm wondering if there is a more elegant and/or more pythonic way to do this, maybe using a dictionary comprehension. I'm pretty new to python and still trying to figure out when it makes sense (or is even possible) to use comprehensions, etc.

text = 'abcdabcxdabc'
n = 3
d = {}
for i in range(len(text) - n + 1):
    sub = text[i:i + n]
    if sub in d:
        d[sub][0] += 1
        d[sub][1].append(i)
    else:
        d[sub] = [1, [i]]
print(d)

Update: Thanks for all the replies. They generally confirm my suspicion that this is too complex to be efficiently implemented in a single comprehension (but thanks to volcano for showing that it is possible if efficiency isn't a concern). Thanks to RemcoGerlich and Ignacio Vazquez-Abrams for pointing me to defaultdict. I'll have to dig into that more. My timing results suggest that there's a little more overhead in initializing a defaultdict compared to a dict, but that the running time might be slightly faster, at least for this example. (Timing results posted in a separate comment below.) Now I'm wondering if there are any situations where dict would be preferred over defaultdict. Also, thanks to Narcolei for pointing me to the timeit functions.

811

asked Dec 04 '13 12:12

user3065699

2 Answers

The problem is that v[0] depends on the length or v[1], which means that either the operation to generate v[1] would have to operate twice, or that the dictionary would have to be iterated over in order to fill in v[0] to replace the dummy value included the first time.

Another problem is that dict comprehensions expect the entire key and value to be available immediately, which means that you would have to run a list comprehension to get all the indexes of the character, which means that the entire operation becomes O(n²).

The only optimization I would make would be to replace the creation of d so that you don't need to check for key containment.

d = collections.defaultdict(lambda: [0, []])

105

answered Oct 24 '22 10:10

Ignacio Vazquez-Abrams

It is scary, but (I added just offsets, number of occurrences you may get from list of offsets). Yes, it may be done

In [83]: my_str = 'abcdabcxdabc'

In [84]: n=3

In [85]: {substr: [my_str.replace(substr, ' '*n, c).index(substr) 
                   for c in xrange(my_str.count(substr))]
   ....: for substr in set(my_str[idx:idx+n] for idx in xrange(len(my_str)-n))}
Out[85]: 
{'abc': [0, 4, 9],
 'bcd': [1],
 'bcx': [5],
 'cda': [2],
 'cxd': [6],
 'dab': [3, 8],
 'xda': [7]}

answered Oct 24 '22 11:10

volcano

Related questions
                            
                                Create a nested dictionary for a word python
                            
                                pandas / matplotlib: faceting bar plots
                            
                                Using python and mayavi to create a 3D streamplot
                            
                                Is it possible to "rumble" a Xbox 360 controller with Python?
                            
                                Garbage-collect a lock once no threads are asking for it
                            
                                Running system commands in Python using curses and panel, and come back to previous menu
                            
                                alternative to recursion based merge sort logic
                            
                                Why does django paginator.num_pages returns one even when no objects are in db?
                            
                                Writting a file with multiple images in a grid
                            
                                Derivative of an array in python?
                            
                                how to use enum in swig with python?
                            
                                Python serialize objects list to JSON
                            
                                Reindex sublevel of pandas dataframe multiindex
                            
                                Get current celery task id anywhere in the thread
                            
                                Pandas dataframe computations
                            
                                Python to Java code conversion
                            
                                How to run Python scripts on a web server (e.g localhost)
                            
                                Python: numpy.insert NaN value
                            
                                Visual Python working very slowly
                            
                                How to find next day's Unix timestamp for same hour, including DST, in Python?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can python dictionary comprehension be used to create a dictionary of substrings and their locations?

Tags:

python

dictionary

dictionary-comprehension

user3065699

People also ask

2 Answers

Ignacio Vazquez-Abrams

volcano

Recent Activity

Donate For Us