
Database Functional Programming in Clojure

"It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." - Abraham Maslow

I need to write a tool to dump a large hierarchical (SQL) database to XML. The hierarchy consists of a Person table with subsidiary Address, Phone, etc. tables.

  • I have to dump thousands of rows, so I would like to do so incrementally and not keep the whole XML file in memory.

  • I would like to isolate impure code to a small portion of the application.

  • I am thinking that this might be a good opportunity to explore FP and concurrency in Clojure. I can also show the benefits of immutable data and multi-core utilization to my skeptical co-workers.

I'm not sure what the overall architecture of the application should be. I am thinking that I can use an impure function to retrieve the database rows and return a lazy sequence that can then be processed by a pure function that returns an XML fragment.
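A minimal sketch of that pure/impure split, assuming each row arrives as a Clojure map of column keyword to value. `xml-escape`, `row->xml`, `fetch-people` and `people-fragments` are hypothetical names, and `fetch-people` is a stub returning literal data where the real tool would run the SQL query.

```clojure
(require '[clojure.string :as str])

;; Escape the five XML special characters in a text value.
(defn xml-escape [s]
  (str/escape (str s) {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&apos;"}))

;; Pure: one row (a map of column keyword -> value) in, one XML fragment out.
;; No I/O, so it is easy to test and safe to call from many threads.
(defn row->xml [tag row]
  (str "<" (name tag) ">"
       (apply str (for [[col v] row]
                    (str "<" (name col) ">" (xml-escape v) "</" (name col) ">")))
       "</" (name tag) ">"))

;; Impure boundary (hypothetical stub): the real version would query the
;; database and return a lazy seq of row maps; here it returns literal data.
(defn fetch-people []
  (lazy-seq [{:id 1 :name "Ann"} {:id 2 :name "Bob & Co"}]))

;; Wiring: the impure producer feeds the pure transformer.
(defn people-fragments []
  (map #(row->xml :person %) (fetch-people)))
```

Only `fetch-people` would touch the database, so everything downstream of it can be unit-tested with plain maps.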

For each Person row, I can create a Future and have several processed in parallel (the output order does not matter).

As each Person is processed, the task will retrieve the appropriate rows from the Address, Phone, etc. tables and generate the nested XML.
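One way the per-Person task could assemble its nested XML, sketched under the assumption that child rows can be looked up by person id. `child-rows` and `fetch-children` are hypothetical in-memory stand-ins for the Address and Phone queries, and the table list `[:address :phone]` is illustrative only.

```clojure
(require '[clojure.string :as str])

(defn xml-escape [s]
  (str/escape (str s) {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&apos;"}))

(defn element [tag & children]
  (str "<" (name tag) ">" (apply str children) "</" (name tag) ">"))

;; One child element per column of a row map.
(defn row->elements [row]
  (apply str (for [[col v] row] (element col (xml-escape v)))))

;; Hypothetical stand-in for the Address/Phone queries: child rows keyed
;; by person id. In the real tool this lookup is the impure part.
(def child-rows
  {:address {1 [{:street "1 Main St"}]}
   :phone   {1 [{:number "555-0100"} {:number "555-0101"}]}})

(defn fetch-children [table person-id]
  (get-in child-rows [table person-id] []))

;; The person's own columns followed by one nested element per related row.
(defn person->xml [person]
  (element :person
           (row->elements person)
           (apply str (for [table [:address :phone]
                            child (fetch-children table (:id person))]
                        (element table (row->elements child))))))

;; One future per person; mapv is eager, so every future starts right away,
;; and deref blocks until each fragment is ready.
(defn people->xml [people]
  (mapv deref (mapv #(future (person->xml %)) people)))
```

Because output order does not matter here, the results could also be written as each future completes; collecting them in a vector just keeps the sketch short.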

I can use a generic function to process most of the tables, relying on database metadata to get the column information, with special functions for the few tables that need custom processing. These functions could be listed in a map (table name -> function).
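A minimal sketch of such a dispatch map with a generic fallback. It assumes rows are maps whose keys stand in for the column names you would otherwise read from JDBC metadata; `phone->xml` is a hypothetical custom handler that emits the number as an attribute purely for illustration.

```clojure
(require '[clojure.string :as str])

(defn xml-escape [s]
  (str/escape (str s) {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&apos;"}))

;; Generic handler: one element per column. The row map's keys play the
;; role of the column names that database metadata would supply.
(defn generic-table->xml [table row]
  (str "<" (name table) ">"
       (apply str (for [[col v] row]
                    (str "<" (name col) ">" (xml-escape v) "</" (name col) ">")))
       "</" (name table) ">"))

;; Hypothetical custom handler for a table that needs special treatment.
(defn phone->xml [_table row]
  (str "<phone number=\"" (xml-escape (:number row)) "\"/>"))

;; table name -> function; tables not listed fall back to the generic handler.
(def table-handlers {:phone phone->xml})

(defn table-row->xml [table row]
  ((get table-handlers table generic-table->xml) table row))
```

Adding a special case then means adding one entry to `table-handlers`; the rest of the pipeline stays unchanged.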

Am I going about this in the right way? I can easily fall back to doing it in OO using Java, but that would be no fun.

BTW, are there any good books on FP patterns or architecture? I have several good books on Clojure, Scala, and F#, but although each covers its language well, none looks at the "big picture" of functional programming design.

Ralph asked Jan 05 '11 12:01

1 Answer

Ok, cool, you're using this as an opportunity to showcase Clojure. So, you want to demonstrate FP and concurrency. Roger that.

To wow your interlocutors I would make a point to demonstrate:

  • Performance of your program using a single thread.
  • How your program's performance increases as you increase the number of threads.
  • How easy it is to take your program from single to multi-threaded.
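The last point is easy to demonstrate in Clojure: for a pure per-row function, going from one thread to many can be a one-word change from `map` to `pmap`. This is a sketch only; `convert` is a hypothetical CPU-heavy stand-in for the real per-person conversion, and `pmap` only pays off when each call is expensive relative to its coordination overhead.

```clojure
;; Hypothetical stand-in for converting one person: pure and deliberately
;; CPU-heavy so that extra cores have something to do.
(defn convert [n]
  (reduce + (map #(* % %) (range (* (inc n) 1000)))))

;; Single-threaded baseline.
(defn run-single [xs]
  (doall (map convert xs)))

;; Multi-threaded: the only change is map -> pmap.
(defn run-parallel [xs]
  (doall (pmap convert xs)))

;; Compare wall-clock times at the REPL, e.g.:
;; (time (count (run-single (range 200))))
;; (time (count (run-parallel (range 200))))
```

Since `convert` has no side effects, both versions must return identical results, which is itself a nice point to show skeptical co-workers.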

You might create a function to dump a single table to an XML file.

(defn table-to-xml [name] ...)

With that you can work out all of your code for the core task of converting your relational data to XML.

Now that you've solved the core problem, see whether throwing more threads at it increases your speed.

You might modify table-to-xml to accept an additional parameter:

(defn table-to-xml [name thread-count] ...)

This implies that you have n threads working on one table. In this case each thread might process every nth row. One problem with putting multiple threads on one table is that each thread is going to want to write to the same XML file. That bottleneck may make the strategy useless, but it's worth a shot.
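A sketch of the "every nth row" split, assuming the table's rows are already available as a seq. `rows-for-worker` and `table->fragments` are hypothetical names; with a real database each worker would more likely run its own query for its share of rows rather than walk a shared seq. Results are collected instead of written, because writing them to one file would still have to be serialized.

```clojure
;; Rows that worker i of n handles: i, i+n, i+2n, ...
(defn rows-for-worker [rows n i]
  (take-nth n (drop i rows)))

;; n futures share one table. Each worker converts only its own rows and
;; the per-worker results are concatenated once all futures finish.
(defn table->fragments [rows n convert]
  (let [workers (mapv (fn [i] (future (mapv convert (rows-for-worker rows n i))))
                      (range n))]
    (vec (mapcat deref workers))))
```

Note that the output is grouped by worker, not in the original row order, which is fine here since order does not matter.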

If creating one XML file per table is acceptable, then spawning one thread per table would likely be an easy win.

;; doall forces the lazy map; without it no future would ever be started
(doall (map #(future (table-to-xml %)) (table-names)))

If you keep a one-to-one relationship between tables, files, and threads, then as a guideline I would expect your code to contain no refs or dosyncs, and the solution should be pretty straightforward.

Once you start spawning multiple threads per table you are adding complexity and may not see much of a performance increase.

In any case you would likely have one or two queries per table for getting values and meta-data. Regarding your comment about not wanting to load all the data in memory: Each thread would only be processing one row at a time.
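A minimal sketch of processing one row at a time, assuming `rows` is a lazy seq of row maps from your query layer and `row->xml` is any pure row-to-string function (both hypothetical here). `doseq` consumes the seq incrementally and each fragment is written as soon as it is produced, so the whole document is never built in memory.

```clojure
(require '[clojure.java.io :as io])

;; Streams one table to an XML file: root tag, one fragment per row, closing tag.
(defn dump-table [path root-tag rows row->xml]
  (with-open [w (io/writer path)]
    (.write w (str "<" root-tag ">\n"))
    (doseq [row rows]
      (.write w (row->xml row))
      (.write w "\n"))
    (.write w (str "</" root-tag ">\n"))))
```

For example, `(dump-table "people.xml" "people" rows #(str "<person id=\"" (:id %) "\"/>"))` would write one `<person>` element per line.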

Hope that helps!

Given your comment, here's some pseudo-ish code that might help:

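(require '[clojure.java.io :as io])

;; query, query-db, convert-to-map and print-person-as-xml are helpers you
;; would write yourself; they are not library functions.

(def ^:dynamic *path* "people.xml") ; output file shared by all threads

;; A plain lock serializes appends from concurrent futures. dosync only
;; coordinates refs and may retry its body, so it should not wrap I/O.
(def write-lock (Object.))

(defn write-to-xml [person]
  (locking write-lock
    (with-open [w (io/writer *path* :append true)]
      (binding [*out* w]
        (print-person-as-xml person)))))

(defn resolve-relation [person table-name one-or-many]
  (let [result (query table-name (:id person))]
    (assoc person table-name (if (= :many one-or-many)
                               result
                               (first result)))))

(defn person-to-xml [person]
  (write-to-xml
   (-> person
       (resolve-relation "phones" :many)
       (resolve-relation "addresses" :many))))

(defn get-people []
  (map convert-to-map (query-db ...)))

(defn people-to-xml []
  ;; doall realizes the lazy map so every future is started;
  ;; deref then waits for each one to finish.
  (let [tasks (doall (map (fn [person] (future (person-to-xml person)))
                          (get-people)))]
    (run! deref tasks)))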

You might consider using the executors in Java's java.util.concurrent package to create a fixed-size thread pool instead of starting one future per person.
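A minimal sketch of that idea using `Executors/newFixedThreadPool` and `ExecutorService.invokeAll`. It relies on the fact that Clojure functions implement `java.util.concurrent.Callable`; `pooled-map` is a hypothetical helper name, and `f` would be something like `person-to-xml`.

```clojure
(import '(java.util.concurrent Executors Future))

;; Runs (f x) for every x on a pool of n-threads threads, which bounds the
;; number of concurrent threads (and database connections).
;; invokeAll blocks until every task has finished.
(defn pooled-map [n-threads f xs]
  (let [pool (Executors/newFixedThreadPool n-threads)]
    (try
      (let [tasks (map (fn [x] (fn [] (f x))) xs)]
        (mapv #(.get ^Future %) (.invokeAll pool tasks)))
      (finally
        (.shutdown pool)))))
```

Usage might look like `(pooled-map 4 person-to-xml (get-people))`. Unlike `future`, whose threads come from an unbounded pool, this caps concurrency at the pool size.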

Psyllo answered Oct 17 '22 16:10