
Large-scale data mining with Clojure

I'm looking for a good reference on large-scale data mining with Clojure.

I know of many good Clojure programming books (Programming Clojure, The Joy of Clojure, ...) and many good data mining textbooks (Mining of Massive Datasets, Managing Gigabytes, ...). However, I'm not aware of any reference that specifically addresses large-scale data mining with Clojure.

The "with clojure" part is rather important to me for the following reasons:

* most theoretical analysis uses big-O running time, which ignores constant factors
* constants matter if the difference is 1 second vs 1 hour (for things that need to be real time)
* or 1 hour vs 1 week (for batch jobs)

In particular, I think there's a lot of interplay between the JVM, Clojure's data structures, and whether data is held in memory or read lazily from disk, so that "slightly" different implementations of the "same" algorithm can have drastically different running times.
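As a rough illustration of the point above, here is a minimal sketch of one O(n) task (summing numbers) written three ways. It is not a benchmark, and the namespace and function names are made up for this example. The variants differ only in how the data is held: a persistent vector of boxed values, a primitive `long` array, and a file read lazily line by line (assumed to contain one integer per line).

```clojure
(ns example.sum-variants
  (:require [clojure.java.io :as io]))

;; Variant 1: data in a persistent vector. Elements are boxed Longs,
;; and `reduce` uses generic arithmetic on them.
(defn sum-boxed [xs]
  (reduce + 0 xs))

;; Variant 2: data in a primitive long array. The type hints let the
;; loop run on unboxed longs without reflection.
(defn sum-primitive ^long [^longs arr]
  (let [n (alength arr)]
    (loop [i 0, acc 0]
      (if (< i n)
        (recur (inc i) (+ acc (aget arr i)))
        acc))))

;; Variant 3: data streamed from disk. `line-seq` is lazy, so only a
;; small part of the file is in memory at a time; the `reduce` runs to
;; completion before `with-open` closes the reader.
(defn sum-from-file [path]
  (with-open [rdr (io/reader path)]
    (reduce + 0 (map #(Long/parseLong %) (line-seq rdr)))))

;; Hypothetical usage: wrap each call in `time` to compare on your own
;; machine and data.
(comment
  (def n 1000000)
  (time (sum-boxed (vec (range n))))
  (time (sum-primitive (long-array (range n)))))
```

All three return the same sum for the same input; how much their running times differ depends on the JVM, the data size, and the disk, which is exactly the kind of detail a big-O analysis leaves out.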

Thus, my question (all of the above was to avoid being closed with "Check Google"):

What is a good resource on massive data mining with Clojure?

Thanks!

asked Jun 19 '12 14:06 by user1383359


2 Answers

I don't think anyone has written a good comprehensive reference yet. But there is certainly lots of work going on in this space (my own company included!).

Some interesting links to follow up:

  • Storm - distributed realtime computation using Clojure. Could be used for large scale data mining.
  • http://www.infoq.com/presentations/Why-Prismatic-Goes-Faster-With-Clojure - interesting video regarding Clojure performance and optimisation for machine learning applications
  • Incanter - probably the leading Clojure library for statistics and data visualisation
  • Weka - very comprehensive data mining / machine learning library for Java (and hence very easy to use directly from Clojure)
answered Sep 23 '22 01:09 by mikera


There is a promising book coming out in May 2013: Clojure Data Analysis Cookbook. I will probably buy it.

http://www.amazon.co.uk/Clojure-Data-Analysis-Cookbook-ebook/dp/B00BECVV9C/ref=sr_1_1?s=books&ie=UTF8&qid=1360697819&sr=1-1

In Detail

Data is everywhere and it's increasingly important to be able to gain insights that we can act on. Using Clojure for data analysis and collection, this book will show you how to gain fresh insights and perspectives from your data with an essential collection of practical, structured recipes.

"The Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.

You'll learn how to acquire data, clean it up, and transform it into useful graphs which can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.

Approach

Full of practical tips, the "Clojure Data Analysis Cookbook" will help you fully utilize your data through a series of step-by-step, real world recipes covering every aspect of data analysis.

Who this book is for

Prior experience with Clojure and data analysis techniques and workflows will be beneficial, but not essential.

answered Sep 25 '22 01:09 by RNO