I have two lists of strings like the following:
test1 = ["abc", "abcdef", "abcedfhi"]
test2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst", "explores", "the", "fast", "iron", "impossible"]
The second list is longer, so I want to downsample it to the length of the first list by random sampling:
import random

def downsample(data):
    min_len = min(len(x) for x in data)
    return [random.sample(x, min_len) for x in data]

downsample([test1, test2])
However, I want to add a restriction: the words chosen from the second list must match the length distribution of the first list. That is, the first randomly chosen word must have the same length as the first word of the shorter list, the second must match the second word, and so on. Sampling must also be done without replacement.
How can I randomly select n elements (n being the length of the shorter list) from test2 so that they match the character-length distribution of test1?
Thanks,
Jack
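To pin down the requirement, here is a small check using a hypothetical helper `matches_lengths` (not part of the question) that tests whether a candidate sample has the same per-position word lengths as test1. It is only a sketch: it checks lengths, not that each entry of test2 was used at most once, which depends on how the sample is drawn.

```python
def matches_lengths(reference, candidate):
    """Hypothetical helper: True if candidate has one word per reference word
    and the i-th candidate word is as long as the i-th reference word."""
    return len(candidate) == len(reference) and all(
        len(a) == len(b) for a, b in zip(reference, candidate)
    )

test1 = ["abc", "abcdef", "abcedfhi"]

print(matches_lengths(test1, ["The", "silver", "proposes"]))  # True
print(matches_lengths(test1, ["The", "burst", "proposes"]))   # False: "burst" has 5 characters, not 6
```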
Using random.randrange() to select a random value from a list: random.randrange() generates a random integer in a given range. Specify the range as 0 to the length of the list (the upper bound is excluded), use the result as an index, and read the corresponding value.
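A minimal sketch of the index-based approach described above; the list `words` is just sample data taken from test2.

```python
import random

words = ["The", "silver", "proposes", "the", "blushing"]

# randrange(n) returns an int in [0, n), which is always a valid index
index = random.randrange(len(words))
value = words[index]
print(index, value)
```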
Use random.sample() when you want to choose multiple random items from a list without repetition. Note the difference between choice() and choices(): choices(), added in Python 3.6, chooses k elements from the list at random, but it can repeat items.
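The contrast between sample() and choices() can be seen directly; `names` is illustrative data. Because choices() draws 10 items from only 5 names, at least one name must repeat, while sample() never reuses a position.

```python
import random

names = ["Ann", "Bob", "Cid", "Dee", "Eve"]

# sample(): k distinct positions, so no element is picked twice
picked = random.sample(names, 3)

# choices(): k independent draws with replacement, so repeats are possible
drawn = random.choices(names, k=10)

print(picked)
print(drawn)
```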
For example, if you have a list of 100 names and want to choose ten of them at random without repeating any name, use random.sample(). Use random.choice() if you want to choose only a single item from the list.
Selecting a random element with choice(): random.choice() returns a random element from a sequence such as a list or a tuple. It returns a single value, and duplicate values in the sequence are each counted, so a value that appears twice is twice as likely to be picked.
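A short sketch of choice() on a tuple containing a duplicate, plus the empty-sequence case (choice() raises IndexError when there is nothing to pick); `colors` is illustrative data.

```python
import random

colors = ("red", "green", "green", "blue")  # any non-empty sequence works, e.g. a tuple

# One element per call; "green" is stored twice, so it is picked twice as often
pick = random.choice(colors)
print(pick)

# An empty sequence has nothing to return, so choice() raises IndexError
try:
    random.choice([])
except IndexError:
    empty_failed = True
else:
    empty_failed = False
print(empty_failed)  # True
```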
Setup
from collections import defaultdict
import random
dct = defaultdict(list)
l1 = ["abc", "abcdef", "abcedfhi"]
l2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst", "explores", "the", "fast", "iron", "impossible"]
First, use collections.defaultdict to create a dictionary where the key is word length:
for word in l2:
    dct[len(word)].append(word)
# Result
defaultdict(<class 'list'>, {3: ['The', 'the', 'the'], 6: ['silver', 'number'], 8: ['proposes', 'blushing', 'explores'], 5: ['burst'], 4: ['fast', 'iron'], 10: ['impossible']})
Then you can use a simple list comprehension along with random.choice to select a random word that matches the length of each element in your first list. If a word length is not found in your dictionary, fill with -1:
final = [random.choice(dct.get(len(w), [-1])) for w in l1]
# Example output (varies between runs)
['The', 'silver', 'blushing']
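Because random.choice draws independently each time, this first version can return the same entry of l2 more than once. The self-contained sketch below makes that visible with a hypothetical target list `targets` containing two 10-character words while l2 has only one; the result is deterministic because each bucket has at most one candidate.

```python
import random
from collections import defaultdict

l2 = ["The", "silver", "proposes", "the", "blushing", "number",
      "burst", "explores", "the", "fast", "iron", "impossible"]

dct = defaultdict(list)
for word in l2:
    dct[len(word)].append(word)

# Hypothetical targets: two 10-character words, one 2-character word
targets = ["abcdefghij", "klmnopqrst", "xy"]

# dct.get() does not trigger the defaultdict factory, so a missing length yields [-1]
final = [random.choice(dct.get(len(w), [-1])) for w in targets]
print(final)  # ['impossible', 'impossible', -1]
```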
Edit based on clarified requirements
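Here is an approach that meets the requirement of sampling without replacement: a word is only repeated if it appears more than once in the second list.
dct = defaultdict(list)  # start from an empty mapping so words are not added twice
for word in l2:
    dct[len(word)].append(word)

# Shuffle each length bucket, then pop one word per target length,
# so every entry of l2 is used at most once
for k in dct:
    random.shuffle(dct[k])

final = [dct[len(w)].pop() for w in l1]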
# Example output (varies between runs): ['The', 'silver', 'proposes']
This approach raises an IndexError if the second list does not contain enough words to fulfill the distribution, including when a required length is missing from it entirely.
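As an alternative to the shuffle-and-pop approach, here is a sketch that first counts how many words of each length are needed, fails early with a descriptive error, and then calls random.sample once per length bucket. The helper name `sample_matching_lengths` and the choice of ValueError are assumptions for illustration, not part of the answer above.

```python
import random
from collections import Counter, defaultdict

def sample_matching_lengths(reference, pool):
    """Hypothetical helper: pick len(reference) items from pool without
    replacement so the i-th pick has the same length as reference[i]."""
    buckets = defaultdict(list)
    for word in pool:
        buckets[len(word)].append(word)

    # How many words of each length the reference list requires
    needed = Counter(len(w) for w in reference)
    for length, count in needed.items():
        if len(buckets[length]) < count:
            raise ValueError(
                f"need {count} word(s) of length {length}, "
                f"pool has {len(buckets[length])}"
            )

    # random.sample draws distinct positions, so each pool entry is used at most once
    chosen = {length: random.sample(buckets[length], count)
              for length, count in needed.items()}
    # Hand out the sampled words in the order of the reference list
    return [chosen[len(w)].pop() for w in reference]

l1 = ["abc", "abcdef", "abcedfhi"]
l2 = ["The", "silver", "proposes", "the", "blushing", "number",
      "burst", "explores", "the", "fast", "iron", "impossible"]

result = sample_matching_lengths(l1, l2)
print(result)  # word lengths 3, 6, 8 in that order
```

Since the validation runs before anything is drawn, no bucket is partially consumed when the pool is too small.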