In idiomatic functional Clojure[1], how would one write a function that splits a string on whitespace but keeps quoted phrases intact? A quick solution is, of course, to use regular expressions, but this should be possible without them. At first glance it seems pretty hard! I've written something similar in imperative languages, but I'd like to see how a functional, recursive approach works.
A quick rundown of what the function should do:
"Hello there!" -> ["Hello", "there!"]
"'A quoted phrase'" -> ["A quoted phrase"]
"'a' 'b' c d" -> ["a", "b", "c", "d"]
"'a b' 'c d'" -> ["a b", "c d"]
"Mid'dle 'quotes do not concern me'" -> ["Mid'dle", "quotes do not concern me"]
I don't mind if the spacing inside the quotes changes (so that one can simply split on whitespace first).
"'lots of spacing' there" -> ["lots of spacing", "there"] ;is ok to me
[1] This question could be answered at a general level, but I guess that a functional approach in Clojure can be translated to Haskell, ML, etc. with ease.
Here's a version returning a lazy seq of words / quoted strings:
    (defn splitter [s]
      (lazy-seq
       (when-let [c (first s)]
         (cond
          (Character/isSpace c)
          (splitter (rest s))
          (= \' c)
          (let [[w* r*] (split-with #(not= \' %) (rest s))]
            (if (= \' (first r*))
              (cons (apply str w*) (splitter (rest r*)))
              (cons (apply str w*) nil)))
          :else
          (let [[w r] (split-with #(not (Character/isSpace %)) s)]
            (cons (apply str w) (splitter r)))))))
A test run:
    user> (doseq [x ["Hello there!"
                     "'A quoted phrase'"
                     "'a' 'b' c d"
                     "'a b' 'c d'"
                     "Mid'dle 'quotes do not concern me'"
                     "'lots of spacing' there"]]
            (prn (splitter x)))
    ("Hello" "there!")
    ("A quoted phrase")
    ("a" "b" "c" "d")
    ("a b" "c d")
    ("Mid'dle" "quotes do not concern me")
    ("lots of spacing" "there")
    nil
If single quotes in the input don't match up properly, everything from the final opening single quote is taken to constitute one "word":
    user> (splitter "'asdf")
    ("asdf")
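This behaviour follows from how `split-with` divides the characters after the opening quote. A small sketch (the sample strings and the names `matched` and `unmatched` are made up for illustration) shows both cases: with a closing quote, the remainder starts with that quote; without one, the remainder is empty, so the `if` takes the branch that ends the sequence.

```clojure
;; Characters after an opening quote, with a closing quote present:
;; the second part starts at the closing quote.
(def matched (split-with #(not= \' %) "a b' c"))
;; matched => [(\a \space \b) (\' \space \c)]

;; No closing quote: everything lands in the first part
;; and the remainder is empty.
(def unmatched (split-with #(not= \' %) "asdf"))
;; unmatched => [(\a \s \d \f) ()]

;; With an empty remainder, (first r*) is nil, so the test
;; (= \' (first r*)) in splitter is false.
(first (second unmatched)) ; => nil
```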
Update: Another version in answer to edbond's comment, with better handling of quote characters inside words:
    (defn splitter [s]
      ((fn step [xys]
         (lazy-seq
          (when-let [c (ffirst xys)]
            (cond
             (Character/isSpace c)
             (step (rest xys))
             (= \' c)
             (let [[w* r*]
                   (split-with (fn [[x y]]
                                 (or (not= \' x)
                                     (not (or (nil? y)
                                              (Character/isSpace y)))))
                               (rest xys))]
               (if (= \' (ffirst r*))
                 (cons (apply str (map first w*)) (step (rest r*)))
                 (cons (apply str (map first w*)) nil)))
             :else
             (let [[w r] (split-with (fn [[x y]] (not (Character/isSpace x))) xys)]
               (cons (apply str (map first w)) (step r)))))))
       (partition 2 1 (lazy-cat s [nil]))))
A test run:
    user> (doseq [x ["Hello there!"
                     "'A quoted phrase'"
                     "'a' 'b' c d"
                     "'a b' 'c d'"
                     "Mid'dle 'quotes do not concern me'"
                     "'lots of spacing' there"
                     "Mid'dle 'quotes do no't concern me'"
                     "'asdf"]]
            (prn (splitter x)))
    ("Hello" "there!")
    ("A quoted phrase")
    ("a" "b" "c" "d")
    ("a b" "c d")
    ("Mid'dle" "quotes do not concern me")
    ("lots of spacing" "there")
    ("Mid'dle" "quotes do no't concern me")
    ("asdf")
    nil
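The key difference in this second version is the input handed to `step`: each character is paired with its successor, so the predicate can tell a closing quote (one followed by whitespace or the end of input) from a quote inside a word. Here is a minimal sketch of that pairing. The helper names `with-lookahead` and `closing-quote?` are hypothetical and not part of the code above, and the sketch uses `Character/isWhitespace` where the answer uses `Character/isSpace`.

```clojure
;; Hypothetical helper: pair every character with the one after it.
;; The trailing nil marks "end of input" for the last character.
(defn with-lookahead [s]
  (partition 2 1 (lazy-cat s [nil])))

(with-lookahead "o't")
;; => ((\o \') (\' \t) (\t nil))

;; Hypothetical predicate: a quote closes a phrase only when its
;; successor is nil (end of input) or whitespace.
(defn closing-quote? [[x y]]
  (and (= \' x)
       (or (nil? y)
           (Character/isWhitespace (char y)))))

(map closing-quote? (with-lookahead "o't'"))
;; => (false false false true)
```

The quote in the middle of `o't` is followed by `t`, so it is treated as part of the word. Only the final quote, which is followed by `nil`, counts as a closing quote.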