I'm having problems creating visualizations with Zeppelin. I have a dataset of about 600 million records stored in an HDFS cluster, and I'm able to load it as a Spark dataframe:
%spark.pyspark
input_hdfs_path = u'hdfs://cluster-master:9000/data/CDR_*.parquet'
df = spark.read.format('parquet').load(input_hdfs_path)
df.createOrReplaceTempView("df")  # registerTempTable is deprecated since Spark 2.0
I'm interested in creating histograms of the CDR length (field CDR_LENGTH):
%sql
select ROUND(CDR_LENGTH, -2) as duration, count(*) as count
from df
group by 1
order by 1
I do get the appropriate results in the Table tab (with two columns, duration and count), but when I go to the bar chart tab (or any other chart tab), it simply says "No data available". Can you figure out what I'm doing wrong? Thanks
You can find settings to the right of the chart buttons. There you can define Keys, Groups, and Values as you like.
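To make the shape of the data concrete, here is a minimal plain-Python sketch of what the query in the question computes: one (duration, count) row per 100-unit bucket. The sample values and the helper names bucket and histogram are made up for illustration; the half-up rounding is meant to approximate Spark SQL's ROUND(x, -2), not to reproduce Spark itself.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def bucket(length):
    """Round to the nearest 100 (half-up), approximating ROUND(CDR_LENGTH, -2)."""
    return int(Decimal(str(length)).quantize(Decimal("1E2"), rounding=ROUND_HALF_UP))


def histogram(lengths):
    """Return (duration, count) rows sorted by duration, like the %sql query."""
    counts = Counter(bucket(x) for x in lengths)
    return sorted(counts.items())


# Hypothetical CDR_LENGTH values, invented for illustration only.
sample_lengths = [12, 49, 50, 130, 149, 151, 260]

for duration, count in histogram(sample_lengths):
    print(duration, count)
```

With rows of this shape, duration would be the natural choice for Keys and count for Values in the chart settings.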