I'm a newbie in Spark/Scala. This is what I'm doing to calculate the first quartile of a CSV file:
val column = sc.textFile("test.txt").map(_.split(",")(2).toDouble)
val total = column.count.toDouble
val upper = (total + 1) / 4
val upper2 = scala.math.ceil(upper).toInt
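To sanity-check the position math above with a concrete (illustrative, not from test.txt) count: with 19 values, (19 + 1) / 4 = 5.0, so Q1 sits exactly at the 5th sorted value:

```scala
object PositionMath {
  def main(args: Array[String]): Unit = {
    val total = 19.0                           // illustrative count, not from test.txt
    val upper = (total + 1) / 4                // 5.0: 1-based position of Q1
    val upper2 = scala.math.ceil(upper).toInt  // 5: how many sorted values to take
    println(s"upper=$upper upper2=$upper2")
  }
}
```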
I'm not really sure how to sort the column other than by adding a key-value pair. All I need is to take the last two values for the quartiles after they are sorted, but I'm forced to create a key-value pair:
val quartiles = column.map((_, 1)).sortByKey(true).take(upper2)
var first_quartile = 0.0
if (upper % upper.toInt > 0) {
  first_quartile = quartiles(upper.toInt - 1)._1
} else {
  first_quartile = (quartiles(upper2 - 1)._1 + quartiles(upper2 - 2)._1) / 2
}
This works, but it leaves me with an annoying key-value pair. How do I get back to just one column instead of two (i.e., the key-value pair)?
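One way to avoid the pair entirely is to sort the values directly with `sortBy(identity)` instead of `sortByKey`, e.g. `column.sortBy(identity).take(upper2)`; alternatively, keep `sortByKey` and strip the counts afterwards with `.map(_._1)`. The quartile arithmetic itself can be checked locally without Spark; here is a plain-Scala sketch of the same (n + 1) / 4 position rule, where the linear-interpolation convention for fractional positions is an assumption:

```scala
object QuartileSketch {
  // First quartile via the (n + 1) / 4 position rule, interpolating
  // linearly between neighbors when the position is fractional.
  // Assumes at least 3 values so the indices stay in range.
  def firstQuartile(xs: Seq[Double]): Double = {
    val sorted = xs.sorted
    val pos = (sorted.length + 1) / 4.0  // 1-based Q1 position
    val lo = math.floor(pos).toInt
    val frac = pos - lo
    if (frac == 0.0) sorted(lo - 1)
    else sorted(lo - 1) + frac * (sorted(lo) - sorted(lo - 1))
  }

  def main(args: Array[String]): Unit = {
    // 7 values: Q1 position is (7 + 1) / 4 = 2, i.e. the 2nd sorted value.
    println(firstQuartile(Seq(3.0, 1.0, 5.0, 1.0, 9.0, 2.0, 2.0))) // 1.0
  }
}
```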
Example of quartiles: first, mark down the median, Q2, which in this case is the 10th value: 75. Q1 is the central point between the smallest score and the median; in this case, Q1 falls between the first and fifth scores: 68. (Note that the median can also be included when calculating Q1 or Q3 for an odd set of values.)
Was just doing this myself. I started out writing a function to compute the median, but found it was faster and easier to get quantiles by casting my RDD as a DataFrame and querying it with SQL. Here's a self-contained example:
// construct example RDD
val rows = Seq(3, 1, 5, 1, 9, 2, 2)
val rdd = sc.parallelize(rows)
// construct DataFrame
case class MedianDF(value: Long)
val df = rdd.map(row => MedianDF(row.toLong)).toDF
// register the table and then query for your desired percentile
df.registerTempTable("table")
sqlContext.sql("SELECT PERCENTILE(value, 0.5) FROM table").show()
Which returns 2, the median. Similarly, if you want the first quartile just pass 0.25 to PERCENTILE:
sqlContext.sql("SELECT PERCENTILE(value, 0.25) FROM table").show()
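As a local sanity check of what PERCENTILE computes: my understanding is that Hive's exact percentile interpolates linearly at the fractional rank p * (n - 1) over the sorted values (treat that convention as an assumption). A plain-Scala version of it reproduces the median above:

```scala
object PercentileCheck {
  // Linear-interpolation percentile at fractional rank p * (n - 1),
  // the convention Hive's PERCENTILE is believed to use (assumed).
  def percentile(xs: Seq[Double], p: Double): Double = {
    val sorted = xs.sorted
    val rank = p * (sorted.length - 1)  // 0-based fractional rank
    val lo = math.floor(rank).toInt
    val hi = math.ceil(rank).toInt
    sorted(lo) + (rank - lo) * (sorted(hi) - sorted(lo))
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(3, 1, 5, 1, 9, 2, 2).map(_.toDouble)
    println(percentile(data, 0.5)) // 2.0, matching the SQL median above
  }
}
```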