I'm looking for a framework, a combination of frameworks, best-practices, or a tutorial about visualizing large data sets with Hadoop.
I am not looking for a framework to visualize the mechanics of running Hadoop jobs or managing disk space on Hadoop. I am looking for an approach or a guideline for visualizing the data contained within HDFS using graphs and charts, etc.
For example, let's say I have a set of data points stored in multiple files in HDFS, and I would like to show a histogram of the data. Is my only option to write a custom map/reduce job that would try and figure out which points fall into which bucket, write the totals to a file, and then use a plotting library to visualize that?
Do I need to roll out a custom solution, or is there anyone else doing this sort of thing out there? I've trying looking online, but I haven't been able to find something that directly relates to this.
Thank you for your help
We do something like this at Datameer. The files would take a few more processing steps to get to our visualizations, but we run natively on Hadoop so the files would not be far away.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With