takeSample() function in Spark

I'm trying to use the takeSample() function in Spark. Its parameters are the data, the number of samples to take, and the seed. I don't want to fix the seed, because I want a different result every time, and I can't figure out how to do that. I tried passing System.nanoTime as the seed value, but it gave an error, I think because the data type didn't match. Is there a function similar to takeSample() that can be used without a seed? Or is there another way to call takeSample() so that I get a different output every time?

Prateek Kulkarni asked Feb 04 '13 13:02



2 Answers

System.nanoTime returns a Long, but the seed expected by takeSample is an Int. Hence, takeSample(..., System.nanoTime.toInt) should work.
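To illustrate the conversion, here is a minimal sketch in plain Scala (no Spark needed). The names NanoTimeSeed, toSeed and seedFromClock are made up for this example, and the commented-out call assumes an RDD called rdd and the Int-seed form of takeSample discussed in this thread. Note that .toInt keeps only the low 32 bits of the Long, so the seed can come out negative, which is fine for a seed.

```scala
object NanoTimeSeed {
  // Narrow a Long to an Int. Long.toInt keeps only the low 32 bits,
  // so the result may be negative or wrap around.
  def toSeed(n: Long): Int = n.toInt

  // A seed that changes between calls, derived from the clock.
  def seedFromClock(): Int = toSeed(System.nanoTime)

  def main(args: Array[String]): Unit = {
    val nanos: Long = System.nanoTime
    val seed: Int = toSeed(nanos)
    println(s"nanoTime = $nanos, seed = $seed")

    // Hypothetical call site (assumes an RDD named `rdd` and an Int seed):
    // val sample = rdd.takeSample(false, 5, seedFromClock())
  }
}
```

The error in the question most likely came from passing the Long directly; Scala does not narrow Long to Int implicitly, so the explicit .toInt is what makes it compile.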

Malte Schwerhoff answered Sep 30 '22 06:09



System.nanoTime returns a Long, whereas takeSample expects an Int.
You can pass scala.util.Random.nextInt as the seed value to takeSample instead.
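A rough sketch of why a fresh Random.nextInt seed gives a different sample on each run. The localSample helper below is only a local stand-in written for this example (it shuffles an in-memory collection), not Spark's takeSample, and the commented-out call assumes an RDD named rdd.

```scala
import scala.util.Random

object RandomSeedSketch {
  // Local stand-in for sampling without replacement: shuffle with a
  // seeded generator and keep the first `num` elements.
  // This is NOT Spark's takeSample, just an illustration of seeding.
  def localSample[A](data: Seq[A], num: Int, seed: Int): Seq[A] =
    new Random(seed).shuffle(data).take(num)

  def main(args: Array[String]): Unit = {
    val data = 1 to 100

    // A fixed seed reproduces the same sample every time.
    println(localSample(data, 5, 42) == localSample(data, 5, 42))

    // A fresh seed per call makes the sample vary from run to run.
    println(localSample(data, 5, Random.nextInt()))

    // Hypothetical Spark call site (assumes an RDD named `rdd`):
    // val sample = rdd.takeSample(false, 5, Random.nextInt())
  }
}
```

Unlike System.nanoTime.toInt, Random.nextInt already returns an Int, so no conversion is needed.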

om-nom-nom answered Sep 30 '22 06:09
