I'm working with a big dictionary and for some reason I also need to work on small random samples from that dictionary. How can I get this small sample (for example of length 2)?
Here is a toy-model:
dy={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
I need to perform some task on dy which involves all the entries. Let us say, to simplify, I need to sum together all the values:
s = 0
for key in dy.keys():
    s = s + dy[key]
Now, I also need to perform the same task on a random sample of dy; for that I need a random sample of the keys of dy. The simple solution I can imagine is
sam=list(dy.keys())[:2]
In that way I have a list of two keys of the dictionary which are, in some sense, random. So, going back to my task, the only change I need in the code is:
s = 0
for key in sam:
    s = s + dy[key]
The point is that I do not fully understand how dy.keys() is constructed, so I can't foresee any future issues.
If you want to get a random key from a dictionary, you can pick one from the dictionary's keys() view. If you want to get a random key/value pair, you can pick one from its items() view.
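As a minimal sketch of picking a single random key or pair, assuming the toy dictionary dy from the question: random.choice needs a sequence, so the view is copied into a list first. The variable names are only illustrative.

```python
import random

# Toy dictionary from the question.
dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# random.choice requires a sequence, so copy each view into a list first.
random_key = random.choice(list(dy.keys()))
pair_key, pair_value = random.choice(list(dy.items()))
```

Copying the view costs memory proportional to the size of the dictionary, which may matter for a very large one.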
You can use random.randint() and random.randrange() to generate random numbers, but they can return the same number more than once. To create a list of unique random numbers, use the random.sample() function.
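The difference can be sketched as follows, assuming a hypothetical population of n = 5 positions; the names with_repeats and unique are only illustrative.

```python
import random

n = 5  # hypothetical population size

# Independent draws: the same position may come up more than once.
with_repeats = [random.randrange(n) for _ in range(3)]

# random.sample draws without replacement, so the positions are distinct.
unique = random.sample(range(n), 3)
```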
A Python dictionary is iterable, but it is not a sequence, so its entries cannot be accessed by position or shuffled in place. Instead, copy its keys into a list and randomize that list with the shuffle() function in the random module.
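A minimal sketch of the shuffle approach, assuming the toy dictionary dy from the question and a sample size of 2. Since Python 3.7 a dictionary keeps its keys in insertion order, so slicing the key list without shuffling it first would always return the same keys.

```python
import random

# Toy dictionary from the question.
dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

keys = list(dy)          # iterating a dict yields its keys, in insertion order
random.shuffle(keys)     # shuffles the list in place and returns None
sample_keys = keys[:2]   # the first two keys of the shuffled list
s = sum(dy[k] for k in sample_keys)
```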
To print dictionary items (key:value pairs, keys, or values), you can iterate over dict.items(), dict.keys(), or dict.values() respectively and call the print() function.
import random

def sample_from_dict(d, sample=10):
    # Pick `sample` distinct keys, then build a smaller dict from them
    keys = random.sample(list(d), sample)
    values = [d[k] for k in keys]
    return dict(zip(keys, values))
Given your example of:
dy = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
Then the sum of all the values is more simply put as:
s = sum(dy.values())
Then if it's not memory prohibitive, you can sample using:
import random
values = list(dy.values())
s = sum(random.sample(values, 2))
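Or, sampling the keys instead (random.sample needs a sequence, and since Python 3.11 it no longer accepts set-like objects such as dy.keys(), so convert to a list first):
from operator import itemgetter
import random
s = sum(itemgetter(*random.sample(list(dy), 2))(dy))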
Or just use:
s = sum(dy[k] for k in random.sample(list(dy), 2))
An alternative is to use heapq, e.g.:
import heapq
import random
s = sum(heapq.nlargest(2, dy.values(), key=lambda L: random.random()))
Replace the range(10) in the following with a random sample of indices, for example from NumPy:
{v:rows[v] for v in [list(rows.keys())[k] for k in range(10)]}
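A sketch of that replacement, using the standard library's random.sample in place of NumPy (a swapped-in choice, not what the answer names). The rows dictionary below is a hypothetical stand-in, and the key list is built once rather than once per index.

```python
import random

# Hypothetical stand-in for `rows`.
rows = {f"row{i}": i * i for i in range(100)}

keys = list(rows.keys())                        # build the key list once
indices = random.sample(range(len(keys)), 10)   # 10 distinct positions
subset = {keys[k]: rows[keys[k]] for k in indices}
```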
This should be quicker than creating a new dict and checking if the keys are part of the sample:
import random
sample_n = 1000
output_dict = dict(random.sample(list(input_dict.items()), sample_n))