How can I randomly generate letters according to their frequency of use in common speech?
Any pseudo-code appreciated, but an implementation in Java would be fantastic. Otherwise just a poke in the right direction would be helpful.
Note: I don't need to generate the frequencies of usage - I'm sure I can look that up easily enough.
I am assuming that you store the frequencies as floating point numbers between 0 and 1 that total to make 1.
First you should prepare a table of cumulative frequencies, i.e. the sum of the frequency of that letter and all letters before it.
To simplify, if you start with this frequency distribution:
A 0.1
B 0.3
C 0.4
D 0.2
Your cumulative frequency table would be:
A 0.1
B 0.4 (= 0.1 + 0.3)
C 0.8 (= 0.1 + 0.3 + 0.4)
D 1.0 (= 0.1 + 0.3 + 0.4 + 0.2)
Now generate a random number between 0 and 1 and see where in this list that number lies. Choose the letter that has the smallest cumulative frequency larger than your random number. Some examples:
Say you randomly pick 0.612. This lies between 0.4 and 0.8, i.e. between B and C, so you'd choose C.
If your random number was 0.039, that comes before 0.1, i.e. before A, so choose A.
I hope that makes sense, otherwise feel free to ask for clarifications!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With