I would like to randomly select a value in consideration of weightings using Pandas.
df:
    0  1   2   3   4   5
0  40  5  20  10  35  25
1  24  3  12   6  21  15
2  72  9  36  18  63  45
3   8  1   4   2   7   5
4  16  2   8   4  14  10
5  48  6  24  12  42  30
I am aware of np.random.choice, e.g.:
x = np.random.choice(
    ['0-0', '0-1', ...],  # one label per cell, etc.
    1,
    p=[0.4, 0.24, ...]    # one probability per label, etc.
)
And so, I would like to get an output from df in a similar style to np.random.choice (or via an alternative method), but using Pandas. I would like to do so more efficiently than by manually inserting the values as I have done above.
With np.random.choice I am aware that all probabilities must add up to 1. I'm not sure how to go about solving this, nor how to randomly select a value based on weightings using Pandas.
When referring to an output: if the randomly selected weight was, for example, 40, then the output would be 0-0, since it is located in column 0, row 0, and so on.
Stack the DataFrame:
stacked = df.stack()
Normalize the weights (so that they add up to 1):
weights = stacked / stacked.sum()
# As GeoMatt22 pointed out, this step is not necessary: sample normalizes weights that do not sum to 1. See the other comment.
And then use sample:
stacked.sample(1, weights=weights)
Out:
1  2    12
dtype: int64
# Or without normalization, stacked.sample(1, weights=stacked)
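Putting the steps above together, here is a minimal sketch that also turns the sampled cell into the "row-column" label asked for in the question. The names picked and label, and the fixed random_state=0 (used only to make the draw reproducible), are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# The DataFrame from the question; each cell is a weight.
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

# Flatten to a Series indexed by (row, column) pairs.
stacked = df.stack()

# Draw one cell, using the cell values themselves as weights.
# random_state is an assumption here, added only for reproducibility.
picked = stacked.sample(1, weights=stacked, random_state=0)

# The index entry of the sampled element is a (row, column) tuple.
row, col = picked.index[0]
label = f"{row}-{col}"
print(label, picked.iloc[0])
```

Dropping random_state gives a fresh draw on every call.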
The DataFrame.sample method lets you sample either rows or columns. Consider this:
df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
Out:
    0  1   2  3   4   5
1  24  3  12  6  21  15
It selects one row (the first row with a 40% chance, the second with a 30% chance, etc.).
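One way to convince yourself of these probabilities is to draw many rows with replacement and look at how often each one comes up. This is only a rough empirical check; the sample size of 10000, the random_state, and the names draws and freqs are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

row_weights = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

# Draw 10000 rows with replacement; each draw follows row_weights.
draws = df.sample(n=10000, replace=True, weights=row_weights, random_state=0)

# Relative frequency of each row label among the draws.
freqs = draws.index.value_counts(normalize=True).sort_index()
print(freqs)
```

The printed frequencies should sit close to the weights passed in, with row 0 near 0.4 and row 1 near 0.3.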
This is also possible:
df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05], axis=1)
Out:
   1
0  5
1  3
2  9
3  1
4  2
5  6
This is the same process, but the 40% chance is now associated with the first column, because we are selecting from columns. However, your question seems to imply that you don't want to select rows or columns; you want to select the cells inside. Therefore, I changed the dimension from 2D to 1D:
df.stack()
Out:
0  0    40
   1     5
   2    20
   3    10
   4    35
   5    25
1  0    24
   1     3
   2    12
   3     6
   4    21
   5    15
2  0    72
   1     9
   2    36
   3    18
   4    63
   5    45
3  0     8
   1     1
   2     4
   3     2
   4     7
   5     5
4  0    16
   1     2
   2     8
   3     4
   4    14
   5    10
5  0    48
   1     6
   2    24
   3    12
   4    42
   5    30
dtype: int64
So if I now sample from this, I will both sample a row and a column. For example:
df.stack().sample()
Out:
1  0    24
dtype: int64
This selects row 1 and column 0.
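Since the question started from np.random.choice, here is a rough NumPy sketch of the same idea: flatten the weights, normalize them, draw a flat position, and map it back to a row and column. It uses NumPy's Generator API rather than the legacy np.random.choice call from the question, and the seed and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

# Flatten the weights row by row and normalize so they sum to 1,
# which the p argument of choice requires.
weights = df.to_numpy().ravel()
probs = weights / weights.sum()

# The seed is an assumption, used only to make the draw reproducible.
rng = np.random.default_rng(0)
flat_pos = rng.choice(weights.size, p=probs)

# Convert the flat position back into (row, column) positions.
row_pos, col_pos = np.unravel_index(flat_pos, df.shape)
label = f"{df.index[row_pos]}-{df.columns[col_pos]}"
print(label, df.iloc[row_pos, col_pos])
```

Unlike stacked.sample, this route needs the explicit normalization step, because choice rejects probabilities that do not sum to 1.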