I want to display my results as a histogram in Zeppelin. I came across plotly. My code is in Scala, and I would like to know the steps to incorporate plotly into Zeppelin using Scala. Or is there a better way (other libraries) to draw a histogram in Zeppelin with Scala?
If you have a DataFrame called plotTemp with columns "id" and "degree", you can do the following (on Spark 2.0 and later, createOrReplaceTempView replaces the deprecated registerTempTable):
plotTemp.registerTempTable("plotTemp")
Then switch to the SQL interpreter in a new paragraph:
%sql
select degree, count(1) nInBin
from plotTemp
group by degree
order by degree
You can then click on the bar chart icon, and you should see what you are looking for.
[Image: example of a distribution plot done in Zeppelin]
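Note that the query above counts rows per distinct value of degree. If the values are continuous or widely spread, you may want to group them into fixed-width bins first. The following is only a minimal sketch in plain Scala under that assumption; `HistogramSketch`, `binCounts` and `binWidth` are hypothetical names that appear in neither Zeppelin nor Spark.

```scala
// Hypothetical helper (not part of Zeppelin or Spark): groups raw values
// into fixed-width bins and counts the members of each bin.
object HistogramSketch {
  /** Returns (lower edge of bin, count) pairs, sorted by lower edge. */
  def binCounts(values: Seq[Double], binWidth: Double): Seq[(Double, Int)] = {
    require(binWidth > 0, "binWidth must be positive")
    values
      // Map each value to the lower edge of the bin it falls into.
      .groupBy(v => math.floor(v / binWidth) * binWidth)
      .map { case (lowerEdge, members) => (lowerEdge, members.size) }
      .toSeq
      .sortBy(_._1)
  }
}
```

For example, `HistogramSketch.binCounts(Seq(1.0, 2.0, 3.0, 11.0, 12.0, 25.0), 10.0)` puts three values in the bin starting at 0.0, two in the bin starting at 10.0 and one in the bin starting at 20.0. Empty bins are not emitted, so gaps in the data will not show up as zero-height bars.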
After trying basically every available solution, I eventually settled on vegas-viz. On the project's GitHub page, they claim to be "The Missing MatPlotLib for Scala + Spark". Although that sounds a little exaggerated to me at the moment, the library does its job and does it well.
This is the procedure I suggest for drawing a bar chart (which is basically what you need for a histogram) in Zeppelin's Spark interpreter:
Import the dependencies (please check the Vegas Maven repository for the latest versions):
%dep
z.load("org.vegas-viz:vegas_2.11:0.3.11")
z.load("org.vegas-viz:vegas-spark_2.11:0.3.11")
Note that vegas-spark is needed only if you want to draw directly from a DataFrame (see below).
Import the packages:
import vegas._
import vegas.render.WindowRenderer._
Draw the chart:
val plot = Vegas("Sample Column Chart")
  .withData(
    Seq(
      Map("country" -> "USA", "population" -> 314),
      Map("country" -> "UK", "population" -> 64),
      Map("country" -> "DK", "population" -> 80)
    )
  )
  .encodeX("country", Nom)
  .encodeY("population", Quant)
  .mark(Bar)
plot.show
The result should be similar to the image below:
You can even draw a chart directly from a DataFrame if you have added vegas-spark to the dependencies (see the dependency step above), but you also need an extra import for this to work:
import vegas.sparkExt._
val df = Seq(
  ("USA", 314),
  ("UK", 64),
  ("DK", 80)
).toDF("country", "population")
val plot = Vegas("Sample Column Chart", width=600, height=320)
  .withDataFrame(df)
  .encodeX("country", Nom)
  .encodeY("population", Quant)
  .mark(Bar)
plot.show
The result should be the same as above.
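To get from raw values to a histogram with this approach, the counts have to be computed before they are handed to the chart. Below is a rough sketch in plain Scala that produces rows in the same Seq-of-Map shape as the `withData` example above. `VegasRowsSketch` and `toRows` are hypothetical names, and the "degree" and "nInBin" keys are borrowed from the SQL answer purely for illustration.

```scala
// Hypothetical helper (not part of Vegas): counts how often each value
// occurs and returns one row per distinct value, sorted by value.
object VegasRowsSketch {
  def toRows(degrees: Seq[Int]): Seq[Map[String, Any]] =
    degrees
      .groupBy(identity)
      .toSeq
      .sortBy(_._1)
      .map { case (degree, members) =>
        // Same row shape as the Seq of Maps passed to withData above.
        Map[String, Any]("degree" -> degree, "nInBin" -> members.size)
      }
}
```

The resulting rows could then presumably be passed to `withData` and encoded with "degree" on the x axis and "nInBin" on the y axis, mirroring the country/population example; I have not verified that exact combination.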
I just released spark-highcharts. With the following code, you can create a histogram:
import com.knockdata.spark.highcharts._
import com.knockdata.spark.highcharts.model._
highcharts(bank
    .series("x" -> "age", "y" -> count("*"))
    .orderBy(col("age"))
  )
  .chart(Chart.column)
  .plotOptions(new plotOptions.Column().groupPadding(0).pointPadding(0).borderWidth(0))
  .plot()