
Get a random sample of a dict

I'm working with a big dictionary and for some reason I also need to work on small random samples from that dictionary. How can I get this small sample (for example of length 2)?

Here is a toy model:

dy={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

I need to perform some task on dy which involves all the entries. Let us say, to simplify, I need to sum together all the values:

s=0
for key in dy.keys():
    s=s+dy[key]

Now, I also need to perform the same task on a random sample of dy; for that I need a random sample of the keys of dy. The simple solution I can imagine is

sam=list(dy.keys())[:2]

This gives me a list of two keys of the dictionary which are somehow random. So, going back to my task, the only change I need in the code is:

s=0
for key in sam:
    s=s+dy[key]

The point is that I do not fully understand how dy.keys() is constructed, so I can't foresee any future issues.
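
On the concern about how dy.keys() is built: since Python 3.7 a dict keeps its keys in insertion order, so slicing the key list returns the same leading keys every time rather than a random selection. A minimal sketch of that behaviour, using the toy dictionary from above:

```python
dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# keys() is a view that iterates in insertion order (guaranteed since Python 3.7).
first_two = list(dy.keys())[:2]
print(first_two)  # ['a', 'b']

# Repeating the slice gives the same keys: the selection is deterministic.
again = list(dy.keys())[:2]
print(first_two == again)  # True
```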

user2988577 asked Oct 12 '16 14:10



4 Answers

import random

def sample_from_dict(d, sample=10):
    # Pick `sample` distinct keys at random, then build a smaller dict from them.
    keys = random.sample(list(d), sample)
    return {k: d[k] for k in keys}
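
One thing to watch: random.sample raises ValueError when the requested size exceeds the number of keys, which the default of 10 would do on the five-entry toy dictionary from the question. A hedged sketch of a variant that clamps the size; the name sample_from_dict_safe and the clamping behaviour are assumptions of this sketch, not part of the function above:

```python
import random

def sample_from_dict_safe(d, sample=10):
    # Hypothetical variant: never ask for more keys than the dict holds.
    k = min(sample, len(d))
    keys = random.sample(list(d), k)
    return {key: d[key] for key in keys}

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
small = sample_from_dict_safe(dy, 2)   # two random key/value pairs
capped = sample_from_dict_safe(dy)     # default of 10 is clamped to 5
print(small, len(capped))
```

Whether clamping or raising is the better behaviour depends on the caller; silently returning fewer items can hide a mistake.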
J-Mourad answered Oct 12 '22 16:10


Given your example of:

dy = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

Then the sum of all the values is more simply put as:

s = sum(dy.values())

Then if it's not memory prohibitive, you can sample using:

import random

values = list(dy.values())
s = sum(random.sample(values, 2))

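Or sample the keys instead. random.sample needs a sequence (passing a set or a dict view is deprecated since Python 3.9 and raises TypeError from 3.11), so convert the keys to a list first:

from operator import itemgetter
import random

s = sum(itemgetter(*random.sample(list(dy), 2))(dy))

Or just use:

s = sum(dy[k] for k in random.sample(list(dy), 2))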

An alternative that avoids building a list is to use heapq with a random sort key, e.g.:

import heapq
import random

s = sum(heapq.nlargest(2, dy.values(), key=lambda L: random.random()))
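
Another single-pass option in the same spirit as the heapq approach is reservoir sampling, which draws k items uniformly from any iterable while holding only k items at a time. The sketch below is a plain Algorithm R implementation; the function name reservoir_sample is hypothetical and not part of the standard library:

```python
import random

def reservoir_sample(iterable, k):
    # Keep the first k items, then replace entries with decreasing probability.
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
keys = reservoir_sample(dy, 2)        # iterating a dict yields its keys
s = sum(dy[key] for key in keys)
print(keys, s)
```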
Jon Clements answered Oct 12 '22 17:10


Replace the range(10) with a random sample of indices, for example from NumPy:

{v:rows[v] for v in [list(rows.keys())[k] for k in range(10)]}
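
A sketch of that idea with the random indices filled in. It uses the standard library's random.sample on the index range rather than NumPy, and builds the key list once instead of once per index; rows here is a made-up stand-in for the dictionary in the snippet above:

```python
import random

# Hypothetical stand-in data for `rows`.
rows = {f'row{i}': i * i for i in range(100)}

keys = list(rows)                               # build the key list once
indices = random.sample(range(len(keys)), 10)   # 10 distinct random positions
subset = {keys[i]: rows[keys[i]] for i in indices}
print(subset)
```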

MajorDaxx answered Oct 12 '22 15:10


This should be quicker than creating a new dict and checking if the keys are part of the sample:

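import random

sample_n = 1000
# random.sample needs a sequence, so materialize the items view first
# (passing a dict view raises TypeError on Python 3.11+).
output_dict = dict(random.sample(list(input_dict.items()), sample_n))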
muwnd answered Oct 12 '22 17:10