Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the fastest way to split dictionary keys into a string-type tuples and append another string to last items in the tuples?

Given a dictionary of string key and integer values, what's the fastest way to

  1. split each key into a string-type key tuple
  2. then append a special substring </w> to the last item in the tuple

Given:

counter = {'The': 6149,
     'Project': 205,
     'Gutenberg': 78,
     'EBook': 5,
     'of': 39169,
     'Adventures': 2,
     'Sherlock': 95,
     'Holmes': 198,
     'by': 6384,
     'Sir': 30,
     'Arthur': 18,
     'Conan': 3,
     'Doyle': 2,}

The goal is to achieve:

counter = {('T', 'h', 'e</w>'): 6149,
 ('P', 'r', 'o', 'j', 'e', 'c', 't</w>'): 205,
 ('G', 'u', 't', 'e', 'n', 'b', 'e', 'r', 'g</w>'): 78,
 ('E', 'B', 'o', 'o', 'k</w>'): 5,
 ('o', 'f</w>'): 39169,
 ('A', 'd', 'v', 'e', 'n', 't', 'u', 'r', 'e', 's</w>'): 2,
 ('S', 'h', 'e', 'r', 'l', 'o', 'c', 'k</w>'): 95,
 ('H', 'o', 'l', 'm', 'e', 's</w>'): 198,
 ('b', 'y</w>'): 6384,
 ('S', 'i', 'r</w>'): 30,
 ('A', 'r', 't', 'h', 'u', 'r</w>'): 18,
 ('C', 'o', 'n', 'a', 'n</w>'): 3,
 ('D', 'o', 'y', 'l', 'e</w>'): 2,}

One way to do it is to

  • iterate through the counter and
  • converting all but the last character to the tuple
  • add to the tuple and create an outer tuple
  • and assign the tuple key to the count

I've tried

{(tuple(k[:-1])+(k[-1]+'</w>',) ,v) for k,v in counter.items()}

In more verbose form:

new_counter = {}
for k, v in counter.items():
    left = tuple(k[:-1])
    right = tuple(k[-1]+'w',)
    new_k = (left + right,)
    new_counter[new_k] = v

Is there a better way to do this?

Regarding the adding tuple and casting it to an outer tuple. Why is this allowed? Isn't tuple supposed to be immutable?

like image 928
alvas Avatar asked Jan 03 '19 06:01

alvas


People also ask

How do you split a dictionary value in Python?

Method 1: Split dictionary keys and values using inbuilt functions. Here, we will use the inbuilt function of Python that is . keys() function in Python, and . values() function in Python to get the keys and values into separate lists.

How do you split words in tuple?

When it is required to convert a string into a tuple, the 'map' method, the 'tuple' method, the 'int' method, and the 'split' method can be used. The map function applies a given function/operation to every item in an iterable (such as list, tuple). It returns a list as the result.

How do you turn a dictionary into a tuple?

Use the items() Function to Convert a Dictionary to a List of Tuples in Python. The items() function returns a view object with the dictionary's key-value pairs as tuples in a list. We can use it with the list() function to get the final result as a list.


2 Answers

You are close making a little changes to your code using tuple. You cannot modify the elements of a tuple, but you can replace one tuple with another::

{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}

{('T', 'h', 'e</w>'): 6149,
 ('P', 'r', 'o', 'j', 'e', 'c', 't</w>'): 205,
 ('G', 'u', 't', 'e', 'n', 'b', 'e', 'r', 'g</w>'): 78,
 ('E', 'B', 'o', 'o', 'k</w>'): 5,
 ('o', 'f</w>'): 39169,
 ('A', 'd', 'v', 'e', 'n', 't', 'u', 'r', 'e', 's</w>'): 2,
 ('S', 'h', 'e', 'r', 'l', 'o', 'c', 'k</w>'): 95,
 ('H', 'o', 'l', 'm', 'e', 's</w>'): 198,
 ('b', 'y</w>'): 6384,
 ('S', 'i', 'r</w>'): 30,
 ('A', 'r', 't', 'h', 'u', 'r</w>'): 18,
 ('C', 'o', 'n', 'a', 'n</w>'): 3,
 ('D', 'o', 'y', 'l', 'e</w>'): 2}
like image 29
Space Impact Avatar answered Oct 05 '22 19:10

Space Impact


I would propose a slightly modified version of your solution. Instead of using tuple constructor you can use tuple unpacking:

>>> {(*a[:-1],f'a[-1]</w>',):b for a,b in counter.items()}

The benefit of using tuple unpacking is you will get better performance as compared to tuple constructor. I will shed some more light on this by using timeit. I will be using randomly generated dict. Each key in the dict will have 2 randomly chosen characters from lower case alphabets and each value will be an integer in range 0-100. For all these benchmarks I am using Python 3.7.0

Benchmark with 100 elements in dict

$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(100)}" "{(*a[:-1],f'a[-1]</w>',):b for a,b in counter.items()}
$ 10000 loops, best of 5: 36.6 usec per loop

$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(100)}" "{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}"
$ 5000 loops, best of 5: 59.7 usec per loop

Benchmark with 1000 elements in dict

$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(1000)}" "{(*a[:-1],f'a[-1]</w>',):b for a,b in counter.items()}"
$ 1000 loops, best of 5: 192 usec per loop

$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(1000)}" "{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}"
$ 1000 loops, best of 5: 321 usec per loop

Benchmark with dict posted in question

$ python -m timeit -s "import random" -s "import string" -s "counter = counter = {'The': 6149, 'Project': 205, 'Gutenberg': 78, 'EBook': 5, 'of': 39169, 'Adventures': 2, 'Sherlock': 95, 'Holmes': 198, 'by': 6384, 'Sir': 30, 'Arthur': 18, 'Conan': 3,'Doyle': 2}" "{(*a[:-1],f'a[-1]</w>',):b for a,b in counter.items()}"
$ 50000 loops, best of 5: 7.28 usec per loop

$ python -m timeit -s "import random" -s "import string" -s "counter = counter = {'The': 6149, 'Project': 205, 'Gutenberg': 78, 'EBook': 5, 'of': 39169, 'Adventures': 2, 'Sherlock': 95, 'Holmes': 198, 'by': 6384, 'Sir': 30, 'Arthur': 18, 'Conan': 3,'Doyle': 2}" "{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}"
$ 20000 loops, best of 5: 11 usec per loop
like image 144
Sohaib Farooqi Avatar answered Oct 05 '22 21:10

Sohaib Farooqi