I'd appreciate suggestions/insights on how I can leverage Clojure to efficiently parse and compare two files. There are two (log) files that contain employee attendance; from these files I need to determine all the days that two employees worked the same times, in the same department. Below are examples of the log files.
Note: each file has differing number of entries.
First File:
Employee Id Name Time In Time Out Dept.
mce0518 Jon 2011-01-01 06:00 2011-01-01 14:00 ER
mce0518 Jon 2011-01-02 06:00 2011-01-01 14:00 ER
mce0518 Jon 2011-01-04 06:00 2011-01-01 13:00 ICU
mce0518 Jon 2011-01-05 06:00 2011-01-01 13:00 ICU
mce0518 Jon 2011-01-05 17:00 2011-01-01 23:00 ER
Second File:
Employee Id Name Time In Time Out Dept.
pdm1705 Jane 2011-01-01 06:00 2011-01-01 14:00 ER
pdm1705 Jane 2011-01-02 06:00 2011-01-01 14:00 ER
pdm1705 Jane 2011-01-05 06:00 2011-01-01 13:00 ER
pdm1705 Jane 2011-01-05 17:00 2011-01-01 23:00 ER
if you are not going to do it periodically,
(defn data-seq [f]
(with-open [rdr (java.io.BufferedReader.
(java.io.FileReader. f))]
(let [s (rest (line-seq rdr))]
(doall (map seq (map #(.split % "\\s+") s))))))
(defn same-time? [a b]
(let [a (drop 2 a)
b (drop 2 b)]
(= a b)))
(let [f1 (data-seq "f1.txt")
f2 (data-seq "f2.txt")]
(reduce (fn[h v]
(let [f2 (filter #(same-time? v %) f2)]
(if (empty? f2)
h
(conj h [(first v) (map first f2)])))) [] f1)
)
will get you,
[["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With