
Using plotly with Zeppelin in Scala

I want to display my results as a histogram in Zeppelin, and I came across plotly. My code is in Scala, and I would like to know the steps to use plotly in Zeppelin from Scala. Alternatively, is there a better way (or library) to draw a histogram in Zeppelin with Scala?

anony29 asked Jul 12 '16 08:07


3 Answers

If you have a DataFrame called plotTemp with columns "id" and "degree", you can do the following:

  1. In a Scala paragraph, register the DataFrame as a temporary table:

    plotTemp.registerTempTable("plotTemp")  // on Spark 2.x and later: plotTemp.createOrReplaceTempView("plotTemp")

  2. Then, in a new paragraph, switch to the SQL interpreter:

    %sql
    select degree, count(1) nInBin
    from plotTemp
    group by degree
    order by degree
    

You can then click the bar chart icon in the result toolbar to see the distribution.

[Image: example of a distribution plot in Zeppelin]
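To make the SQL step concrete, here is a minimal plain-Scala sketch of the same aggregation, assuming the data is small enough to fit in an ordinary collection. The object name, helper names and sample rows are hypothetical. It also formats the result in Zeppelin's `%table` display format (a header row, then tab-separated columns), so printing that string from a Scala paragraph should offer the same built-in chart views as the `%sql` result.

```scala
// Plain-Scala sketch of what the %sql query above computes:
//   select degree, count(1) nInBin from plotTemp group by degree order by degree
// The (id, degree) pairs below are hypothetical sample data standing in for plotTemp.
object DegreeHistogram {

  // Count how many rows fall on each distinct degree, sorted by degree.
  def countPerDegree(rows: Seq[(Long, Int)]): Seq[(Int, Int)] =
    rows
      .groupBy { case (_, degree) => degree }
      .map { case (degree, group) => degree -> group.size }
      .toSeq
      .sortBy { case (degree, _) => degree }

  // Render the counts in Zeppelin's %table display format:
  // a header row, then one row per degree, columns separated by tabs.
  def toZeppelinTable(counts: Seq[(Int, Int)]): String =
    counts
      .map { case (degree, n) => s"$degree\t$n" }
      .mkString("%table degree\tnInBin\n", "\n", "")

  def main(args: Array[String]): Unit = {
    val rows = Seq((1L, 2), (2L, 3), (3L, 2), (4L, 5), (5L, 3), (6L, 2))
    // In a Zeppelin Scala paragraph, printing this string renders a table
    // with the usual chart buttons (bar chart, pie chart, and so on).
    println(toZeppelinTable(countPerDegree(rows)))
  }
}
```

This only pays off for small data; for a large DataFrame, the Spark SQL query above does the aggregation on the cluster instead.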

Charles Copley answered Nov 14 '22 01:11


After trying basically every available solution, I eventually settled on vegas-viz. Its GitHub project page describes it as "The Missing MatPlotLib for Scala + Spark". Although that sounds a little exaggerated to me at the moment, the library does its job, and does it well.

This is the procedure I suggest for drawing a bar chart (which is basically what you need for a histogram) in Zeppelin's Spark interpreter:

  1. Import the dependencies (check the Vegas Maven repository for the latest versions):

    %dep  
    z.load("org.vegas-viz:vegas_2.11:0.3.11")
    z.load("org.vegas-viz:vegas-spark_2.11:0.3.11")
    

Note that vegas-spark is needed only if you want to plot directly from a DataFrame, as in the last step below.

  2. Import the packages:

    import vegas._  
    import vegas.render.WindowRenderer._
    
  3. Draw the chart:

    val plot = Vegas("Sample Column Chart")
      .withData(
        Seq(
          Map("country" -> "USA", "population" -> 314),
          Map("country" -> "UK", "population" -> 64),
          Map("country" -> "DK", "population" -> 80)
        )
      )
      .encodeX("country", Nom)
      .encodeY("population", Quant)
      .mark(Bar)
    plot.show
    

The result should be similar to the image below:

[Image: sample column chart of population by country]

  4. You can also plot directly from a DataFrame if you added vegas-spark to the dependencies (see step 1), but this needs an extra import:

    import vegas.sparkExt._
    
    val df = Seq(
      ("USA", 314),
      ("UK", 64),
      ("DK", 80)
    ).toDF("country", "population")
    
    val plot = Vegas("Sample Column Chart", width=600, height=320)
      .withDataFrame(df)
      .encodeX("country", Nom)
      .encodeY("population", Quant)
      .mark(Bar)
    plot.show
    

The result should be the same as above.
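Since a histogram is a bar chart over bin counts, raw numeric values have to be grouped into bins before they can be drawn this way. Below is a minimal plain-Scala sketch, assuming fixed-width bins and data small enough for an ordinary collection; `Binning`, `histogramRows`, and the `binStart`/`count` keys are hypothetical names, not part of Vegas.

```scala
// Hypothetical helper: turn raw numeric values into fixed-width histogram bins,
// shaped like the Seq of Maps that the Vegas example above passes to withData.
object Binning {

  def histogramRows(values: Seq[Double], binWidth: Double): Seq[Map[String, Any]] = {
    require(binWidth > 0, "binWidth must be positive")
    values
      .groupBy(v => math.floor(v / binWidth).toLong) // index of the bin each value falls into
      .toSeq
      .sortBy(_._1)                                  // order bins from lowest to highest
      .map { case (bin, members) =>
        Map[String, Any](
          "binStart" -> bin * binWidth, // lower edge of the bin (inclusive)
          "count"    -> members.size    // number of values in [binStart, binStart + binWidth)
        )
      }
  }

  def main(args: Array[String]): Unit = {
    val ages = Seq(23.0, 25.0, 31.0, 38.0, 39.0, 42.0, 57.0)
    histogramRows(ages, 10.0).foreach(println)
  }
}
```

The rows have the same shape as the maps passed to `withData` above, so, as an untested assumption, they could be plotted with `encodeX("binStart", Nom)`, `encodeY("count", Quant)` and `mark(Bar)`. Empty bins are skipped, so gaps in the data do not appear as zero-height bars.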

Sal Borrelli answered Nov 14 '22 01:11


I just released spark-highcharts. With the following code, you can create a histogram:

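    import com.knockdata.spark.highcharts._
    import com.knockdata.spark.highcharts.model._
    import org.apache.spark.sql.functions.{col, count}

    // bank: a DataFrame with an "age" column (e.g. the bank data from the Zeppelin tutorial).
    // Column chart with no padding or borders, so adjacent bars touch like a histogram.
    highcharts(bank
        .series("x" -> "age", "y" -> count("*"))
        .orderBy(col("age"))
      )
      .chart(Chart.column)
      .plotOptions(new plotOptions.Column().groupPadding(0).pointPadding(0).borderWidth(0))
      .plot()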


Rockie Yang answered Nov 14 '22 01:11