Given a dictionary with string keys and integer values, what's the fastest way to split each key into a tuple of characters and append </w> to the last item in the tuple?
Given:
counter = {'The': 6149,
'Project': 205,
'Gutenberg': 78,
'EBook': 5,
'of': 39169,
'Adventures': 2,
'Sherlock': 95,
'Holmes': 198,
'by': 6384,
'Sir': 30,
'Arthur': 18,
'Conan': 3,
'Doyle': 2,}
The goal is to achieve:
counter = {('T', 'h', 'e</w>'): 6149,
('P', 'r', 'o', 'j', 'e', 'c', 't</w>'): 205,
('G', 'u', 't', 'e', 'n', 'b', 'e', 'r', 'g</w>'): 78,
('E', 'B', 'o', 'o', 'k</w>'): 5,
('o', 'f</w>'): 39169,
('A', 'd', 'v', 'e', 'n', 't', 'u', 'r', 'e', 's</w>'): 2,
('S', 'h', 'e', 'r', 'l', 'o', 'c', 'k</w>'): 95,
('H', 'o', 'l', 'm', 'e', 's</w>'): 198,
('b', 'y</w>'): 6384,
('S', 'i', 'r</w>'): 30,
('A', 'r', 't', 'h', 'u', 'r</w>'): 18,
('C', 'o', 'n', 'a', 'n</w>'): 3,
('D', 'o', 'y', 'l', 'e</w>'): 2,}
One way to do it is to iterate through the counter and rebuild each key as a tuple.
I've tried:
{(tuple(k[:-1])+(k[-1]+'</w>',) ,v) for k,v in counter.items()}
In more verbose form:
new_counter = {}
for k, v in counter.items():
    left = tuple(k[:-1])
    right = tuple(k[-1]+'w',)
    new_k = (left + right,)
    new_counter[new_k] = v
Is there a better way to do this?
Regarding adding the tuples and wrapping the result in an outer tuple: why is this allowed? Aren't tuples supposed to be immutable?
Method 1: Split dictionary keys and values using built-in methods. Here we use the dictionary methods keys() and values() to get the keys and the values; wrapping each result in list() gives two separate lists.
To convert a string into a tuple, the map, tuple, int, and split functions can be used together. The map function applies a given function to every item of an iterable (such as a list or tuple). In Python 3 it returns a lazy iterator rather than a list, so the result is passed to tuple() to materialize it.
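As a minimal sketch of that combination, the snippet below turns a whitespace-separated string of digits into a tuple of integers. The input string is a made-up example, not data from the question.

```python
# Hypothetical input: digits separated by spaces.
text = "1 2 3 4"

# split() breaks the string into a list of substrings.
parts = text.split()

# map() applies int to each substring lazily; tuple() consumes the iterator.
numbers = tuple(map(int, parts))

print(numbers)
```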
Use the items() method to convert a dictionary to a list of tuples in Python. items() returns a view object over the dictionary's key-value pairs, each pair being a tuple. Passing the view to list() gives the final result as a list.
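A short sketch of the dictionary view methods mentioned above, using a three-entry subset of the question's counter (the subset is chosen here only to keep the example small):

```python
# Small subset of the dictionary from the question.
counter = {'The': 6149, 'of': 39169, 'by': 6384}

# Each method returns a view; list() copies the view into a list.
keys = list(counter.keys())
values = list(counter.values())
pairs = list(counter.items())

print(keys)
print(values)
print(pairs)
```

Dictionaries preserve insertion order from Python 3.7 on, so the three lists line up index by index.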
You are close. Write the comprehension as key: value rather than as a (key, value) tuple, so that it builds a dict instead of a set of pairs. As for immutability: you cannot modify the elements of a tuple, but you can replace one tuple with another:
{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}
{('T', 'h', 'e</w>'): 6149,
('P', 'r', 'o', 'j', 'e', 'c', 't</w>'): 205,
('G', 'u', 't', 'e', 'n', 'b', 'e', 'r', 'g</w>'): 78,
('E', 'B', 'o', 'o', 'k</w>'): 5,
('o', 'f</w>'): 39169,
('A', 'd', 'v', 'e', 'n', 't', 'u', 'r', 'e', 's</w>'): 2,
('S', 'h', 'e', 'r', 'l', 'o', 'c', 'k</w>'): 95,
('H', 'o', 'l', 'm', 'e', 's</w>'): 198,
('b', 'y</w>'): 6384,
('S', 'i', 'r</w>'): 30,
('A', 'r', 't', 'h', 'u', 'r</w>'): 18,
('C', 'o', 'n', 'a', 'n</w>'): 3,
('D', 'o', 'y', 'l', 'e</w>'): 2}
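To illustrate why concatenation does not conflict with immutability, here is a small sketch. The names left, right, combined, and mutation_rejected are chosen for this example only.

```python
left = ('T', 'h')
right = ('e</w>',)

# The + operator builds a brand-new tuple; neither operand is changed.
combined = left + right
print(combined)
print(left)

# Changing an element in place is what immutability forbids.
mutation_rejected = False
try:
    left[0] = 't'
except TypeError as exc:
    mutation_rejected = True
    print("in-place change rejected:", exc)
```

The same holds inside the comprehension: each key is a freshly built tuple, and no existing tuple is altered.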
I would propose a slightly modified version of your solution. Instead of using the tuple constructor, you can use tuple unpacking:
>>> {(*a[:-1], f'{a[-1]}</w>'): b for a, b in counter.items()}
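The unpacking approach can also be wrapped in a small helper. This is only a sketch: the function name to_symbols, the end parameter, and the choice to map an empty key to an empty tuple are assumptions made for this example. Note that the last character has to be interpolated with braces inside the f-string.

```python
def to_symbols(word, end='</w>'):
    """Split word into a tuple of characters, tagging the last one with end."""
    if not word:
        # Assumption: an empty key becomes an empty tuple instead of
        # raising IndexError on word[-1].
        return ()
    # *word[:-1] unpacks every character except the last into the tuple.
    return (*word[:-1], f'{word[-1]}{end}')


# Small subset of the dictionary from the question.
counter = {'The': 6149, 'of': 39169, 'by': 6384}
new_counter = {to_symbols(k): v for k, v in counter.items()}
print(new_counter)
```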
The benefit of tuple unpacking is better performance compared with the tuple constructor. I will shed some more light on this using timeit. I will be using a randomly generated dict. Each key in the dict consists of 2 randomly chosen lowercase letters, and each value is an integer in the range 0-100. For all these benchmarks I am using Python 3.7.0.
Benchmark with 100 generated keys in dict
$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(100)}" "{(*a[:-1],f'{a[-1]}</w>',):b for a,b in counter.items()}"
10000 loops, best of 5: 36.6 usec per loop
$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(100)}" "{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}"
5000 loops, best of 5: 59.7 usec per loop
Benchmark with 1000 generated keys in dict (only 650 distinct two-letter keys exist, so duplicates collapse)
$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(1000)}" "{(*a[:-1],f'{a[-1]}</w>',):b for a,b in counter.items()}"
1000 loops, best of 5: 192 usec per loop
$ python -m timeit -s "import random" -s "import string" -s "counter = {''.join(random.sample(string.ascii_lowercase,2)): random.randint(0,100) for _ in range(1000)}" "{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}"
1000 loops, best of 5: 321 usec per loop
Benchmark with the dict posted in the question
$ python -m timeit -s "import random" -s "import string" -s "counter = {'The': 6149, 'Project': 205, 'Gutenberg': 78, 'EBook': 5, 'of': 39169, 'Adventures': 2, 'Sherlock': 95, 'Holmes': 198, 'by': 6384, 'Sir': 30, 'Arthur': 18, 'Conan': 3,'Doyle': 2}" "{(*a[:-1],f'{a[-1]}</w>',):b for a,b in counter.items()}"
50000 loops, best of 5: 7.28 usec per loop
$ python -m timeit -s "import random" -s "import string" -s "counter = {'The': 6149, 'Project': 205, 'Gutenberg': 78, 'EBook': 5, 'of': 39169, 'Adventures': 2, 'Sherlock': 95, 'Holmes': 198, 'by': 6384, 'Sir': 30, 'Arthur': 18, 'Conan': 3,'Doyle': 2}" "{tuple(key[:-1])+(key[-1]+'</w>',):value for key,value in counter.items()}"
20000 loops, best of 5: 11 usec per loop
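For anyone who wants to repeat the comparison on their own machine, the same measurement can be sketched with the timeit module from a script. The function names, the fixed seed, and the loop counts below are assumptions made for this sketch, and the timings it prints will differ from the figures above depending on hardware and Python version.

```python
import random
import string
import timeit

# Assumption: a fixed seed so that repeated runs build the same dict.
random.seed(0)
counter = {
    ''.join(random.sample(string.ascii_lowercase, 2)): random.randint(0, 100)
    for _ in range(100)
}


def with_unpacking():
    # Tuple unpacking plus an f-string for the last character.
    return {(*a[:-1], f'{a[-1]}</w>'): b for a, b in counter.items()}


def with_constructor():
    # tuple() constructor plus tuple concatenation.
    return {tuple(k[:-1]) + (k[-1] + '</w>',): v for k, v in counter.items()}


# Both variants should produce identical dictionaries before being timed.
assert with_unpacking() == with_constructor()

loops = 1000
for fn in (with_unpacking, with_constructor):
    # Take the best of 5 repeats, as the timeit command line does.
    best = min(timeit.repeat(fn, number=loops, repeat=5))
    print(f'{fn.__name__}: {best / loops * 1e6:.2f} usec per loop')
```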