Reading the Spark documentation: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.sample

There is a boolean parameter withReplacement that is not explained in much detail:

sample(withReplacement, fraction, seed=None)

What is it, and how do we use it?
Note that the fraction parameter does not guarantee an exact count. For example, sampling a DataFrame of 100 records with fraction=0.11 may return 13 records rather than exactly 11: sample() decides row membership independently per row, so the result only approximates the specified fraction.
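A pure-Python sketch of that per-row Bernoulli behaviour (this is an illustrative analogue using the standard library, not Spark's actual implementation; the function name bernoulli_sample is made up for this example):

```python
import random

def bernoulli_sample(rows, fraction, seed=None):
    # Keep each row independently with probability `fraction`,
    # so the returned count only approximates fraction * len(rows).
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

rows = list(range(100))               # stand-in for a 100-record DataFrame
sample = bernoulli_sample(rows, 0.11, seed=42)
print(len(sample))                    # close to 11, but not guaranteed to be exactly 11
```

This is why asking for 11% of 100 rows can come back with 13 rows: each row flips its own biased coin.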
Relatedly, sampleBy() returns a stratified sample without replacement, based on the fraction given for each stratum. Parameters: col — the column that defines the strata; fractions — the sampling fraction for each stratum. If a stratum is not specified, its fraction is treated as zero.
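The stratified behaviour can be sketched in pure Python (again an analogue, not Spark's implementation; sample_by and the 'group' column are hypothetical names for this example):

```python
import random

def sample_by(rows, key, fractions, seed=None):
    # Keep each row with the probability assigned to its stratum;
    # strata missing from `fractions` default to 0.0, so those rows are dropped.
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fractions.get(key(row), 0.0)]

# Rows tagged with a 'group' value acting as the stratum column.
rows = [{"group": g, "id": i} for i, g in enumerate("aabbbbcccc")]
picked = sample_by(rows, key=lambda r: r["group"],
                   fractions={"a": 1.0, "b": 0.5})  # "c" defaults to 0.0
print(picked)
```

With these fractions, every "a" row is kept, roughly half the "b" rows are kept, and no "c" row can appear.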
The withReplacement parameter controls the uniqueness of the sampling result. If we treat a Dataset as a bucket of balls, withReplacement=true means taking a random ball out of the bucket and placing it back before the next draw, so the same ball can be picked again.

Assuming all elements in the Dataset are unique:

- With withReplacement=true, the same element can appear more than once in the sample.
- With withReplacement=false, each element of the dataset is sampled at most once.
import spark.implicits._

val df = Seq(1, 2, 3, 5, 6, 7, 8, 9, 10).toDF("ids")
df.show()
df.sample(true, 0.5, 5).show()
df.sample(false, 0.5, 5).show()
Result:
+---+
|ids|
+---+
| 1|
| 2|
| 3|
| 5|
| 6|
| 7|
| 8|
| 9|
| 10|
+---+
+---+
|ids|
+---+
| 6|
| 7|
| 7|
| 9|
| 10|
+---+
+---+
|ids|
+---+
| 1|
| 3|
| 7|
| 8|
| 9|
+---+