I'm looking for a good reference on large scale data mining with Clojure.
I know of many good Clojure programming books (Programming Clojure, The Joy of Clojure, ...) and many good data mining textbooks (Mining of Massive Datasets, Managing Gigabytes, ...). However, I'm not aware of any reference that specifically addresses large scale data mining with Clojure.
The "with Clojure" part is rather important to me for the following reasons:
* most theoretical analysis uses big-O running time, which ignores constants
* constants matter if they make the difference between 1 second and 1 hour (for things that need to be real time)
* or between 1 hour and 1 week (for batch jobs)
In particular, I think there's a lot of interplay between the JVM, Clojure's data structures, and whether data is held in memory or read lazily from disk. As a result, "slightly" different implementations of the "same" algorithm can have drastically different running times.
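To make the in-memory vs lazy point concrete, here is a minimal sketch of the kind of difference I mean. The function names and the "count lines of at least a given length" task are made up for illustration; both versions compute the same result, but the first pulls the whole file into memory while the second streams it line by line.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical example: count the lines that are at least min-len characters long.

;; Eager version: slurp reads the entire file into one String and
;; split-lines builds a vector of every line, so memory use grows
;; with the size of the file.
(defn count-long-lines-eager [path min-len]
  (->> (str/split-lines (slurp path))
       (filter #(>= (count % ) min-len))
       count))

;; Lazy version: line-seq reads lines on demand from the reader.
;; count forces the sequence inside with-open, so the reader is still
;; open while the lines are consumed and is closed afterwards.
(defn count-long-lines-lazy [path min-len]
  (with-open [rdr (io/reader path)]
    (->> (line-seq rdr)
         (filter #(>= (count %) min-len))
         count)))
```

On a small file the two should behave about the same. On a file larger than the heap I would expect the eager version to fail with an out-of-memory error while the lazy one keeps going, although I have not benchmarked this.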
Thus, my question (all of the above was to avoid being closed by "Check Google"):
what is a good resource on massive data mining with Clojure?
Thanks!
I don't think anyone's yet written a good comprehensive reference. But there is certainly lots of work going on in this space (my own company included!)
Some interesting links to follow up:
There is a wonderful book that is coming out in May 2013: Clojure Data Analysis Cookbook. I will probably buy it.
http://www.amazon.co.uk/Clojure-Data-Analysis-Cookbook-ebook/dp/B00BECVV9C/ref=sr_1_1?s=books&ie=UTF8&qid=1360697819&sr=1-1
In Detail
Data is everywhere and it's increasingly important to be able to gain insights that we can act on. Using Clojure for data analysis and collection, this book will show you how to gain fresh insights and perspectives from your data with an essential collection of practical, structured recipes.
"The Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.
You'll learn how to acquire data, clean it up, and transform it into useful graphs which can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.
Approach
Full of practical tips, the "Clojure Data Analysis Cookbook" will help you fully utilize your data through a series of step-by-step, real world recipes covering every aspect of data analysis.
Who this book is for
Prior experience with Clojure and data analysis techniques and workflows will be beneficial, but not essential.