I was going through Andrew Ng's machine learning course. I am still at the beginning stages. In the class he teaches supervised learning using a housing price prediction example.
Is it possible to predict the next RSA token that will be generated, after providing a dataset of "right" values to the machine learning program? Can we use supervised learning to make the program learn the algorithm?
Supervised learning depends on exploiting regularities in the data. For example, if the data is plotted against the desired output, there may be clusters or highly populated surfaces in the space. The various learning algorithms you will learn in class are all ways of exploiting one type of structure or another. If the dataset is random and unconnected to the desired output, then no learning can be done.
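As a rough illustration of that point, here is a minimal sketch in plain Python that fits a straight line to two synthetic datasets: one where the output really depends on the input (a made-up "price versus size" relation) and one where the output is unrelated noise. All numbers and names here are assumptions chosen for the example, not data from the course.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(50, 250) for _ in range(500)]               # hypothetical house sizes
structured = [3.0 * x + 20 + rng.gauss(0, 10) for x in xs]    # output depends on input
unrelated = [rng.uniform(0, 800) for _ in xs]                 # output ignores input

r2_structured = r_squared(xs, structured, *fit_line(xs, structured))
r2_unrelated = r_squared(xs, unrelated, *fit_line(xs, unrelated))
print(f"structured R^2 = {r2_structured:.3f}, unrelated R^2 = {r2_unrelated:.3f}")
```

The first fit should explain almost all of the variance, while the second should explain almost none, since there is no regularity for the line to pick up.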
RSA is useful cryptographically precisely because it is a non-random process that is exceptionally difficult to distinguish from a random process with no structure. There are no obvious regularities in the data to exploit.
I am reluctant to discourage you from taking a look at this; you never know what it might spark or what you might learn. But in your place I would not want any part of my grade to depend on success. I will say that to succeed in any meaningful sense you will almost have to base the learning on features that no-one has thought of till now. If you are determined to try this I would recommend starting with very small primes and only if you get any traction graduate to larger primes.
Part of the reason for being dubious comes from complexity arguments. If one could recover the RSA private key for an arbitrary composite modulus, then one could factor that modulus in a reasonable amount of time. However, factoring an arbitrary composite number is believed (but not known) to be intractable: no polynomial-time classical algorithm is known, although the problem is not believed to be NP-complete.
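To make the link between factoring and breaking RSA concrete at the "very small primes" scale, here is a toy sketch. The modulus 3233 and exponent 17 are a textbook-sized example chosen for illustration; trial division only works because the modulus is tiny, and real moduli are thousands of bits long.

```python
def smallest_factor(n: int) -> int:
    """Return the smallest prime factor of n by trial division."""
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f
        f += 2
    return n  # n itself is prime

# Toy public key (illustrative values only).
n, e = 3233, 17

# Factoring the modulus immediately yields the private exponent.
p = smallest_factor(n)
q = n // p
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)  # modular inverse; needs Python 3.8+

# Round trip: encrypt with the public key, decrypt with the recovered key.
message = 65
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)
print(p, q, d, recovered)
```

At this size the private key falls out instantly. The difficulty described above is that nothing comparable is known to work once the primes are large.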
It won't work.
An RSA token creates a pseudo-random sequence of numbers from a seed.
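The sketch below shows what "a pseudo-random sequence from a seed" can look like, and how a naive learner fares against it. It is not the real token algorithm: the HMAC-SHA256 construction, the seed value, and the counter input are all stand-in assumptions chosen so the example runs with the standard library. The "learner" is a simple frequency table that tries to guess the last digit of the next code from the last digit of the current one.

```python
import hashlib
import hmac
from collections import Counter, defaultdict

def toy_token(seed: bytes, counter: int) -> int:
    """Hypothetical 6-digit code derived from a secret seed and a counter."""
    digest = hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % 1_000_000

seed = b"hypothetical-secret-seed"  # made-up value for the example
tokens = [toy_token(seed, i) for i in range(20_000)]
train, test = tokens[:10_000], tokens[10_000:]

# "Training": for each last digit, remember the most common next last digit.
successors = defaultdict(Counter)
for prev, nxt in zip(train, train[1:]):
    successors[prev % 10][nxt % 10] += 1
best_guess = {digit: counts.most_common(1)[0][0] for digit, counts in successors.items()}

# Evaluation on codes the table has never seen.
pairs = list(zip(test, test[1:]))
hits = sum(best_guess.get(prev % 10, 0) == nxt % 10 for prev, nxt in pairs)
accuracy = hits / len(pairs)
print(f"last-digit accuracy: {accuracy:.3f} (pure guessing would give about 0.100)")
```

The same seed always reproduces the same sequence, yet the held-out accuracy should sit at roughly the level of blind guessing, because the observed codes carry no exploitable pattern for a learner of this kind.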
In theory, if you had infinite resources then you could train an algorithm long enough that it "learnt" the entire sequence of pseudo-random numbers. And then you could predict the sequence (and potentially even infer the seed) from a set of previous values.
In practice this approach is guaranteed to fail because both the amount of data required would be too large and the training time required would be too long.
By "too large" and "too long" you should understand "longer/larger than anyone in the universe will ever be able to achieve".
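For a sense of scale, here is a back-of-the-envelope calculation. The 128-bit seed size and the attacker speed of one trillion guesses per second are assumptions picked for illustration, not figures taken from the discussion above.

```python
SEED_BITS = 128                   # assumed seed size
GUESSES_PER_SECOND = 1e12         # assumed, deliberately generous attacker speed
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10   # approximate

seed_space = 2 ** SEED_BITS
years_to_enumerate = seed_space / GUESSES_PER_SECOND / SECONDS_PER_YEAR
universe_lifetimes = years_to_enumerate / AGE_OF_UNIVERSE_YEARS

print(f"{years_to_enumerate:.2e} years, about {universe_lifetimes:.1e} universe lifetimes")
```

Under these assumptions, merely enumerating every seed takes on the order of 10^19 years, which is hundreds of millions of times the current age of the universe.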