In Python, I can do this:
>>> import string
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
Is there any way to do something similar in Clojure (apart from copying and pasting the above characters somewhere)? I looked through both the Clojure standard library and the Java standard library and couldn't find it.
If you just want ASCII characters,
(map char (concat (range 65 91) (range 97 123)))
will yield,
(\A \B \C \D \E \F \G \H \I \J \K \L \M \N \O \P \Q \R \S \T \U \V \W \X \Y \Z
\a \b \c \d \e \f \g \h \i \j \k \l \m \n \o \p \q \r \s \t \u \v \w \x \y \z)
A properly non-ASCII-centric implementation:
private static String allLetters(String charsetName)
{
    CharsetEncoder ce = Charset.forName(charsetName).newEncoder();
    StringBuilder result = new StringBuilder();
    for(char c=0; c<Character.MAX_VALUE; c++)
    {
        if(ce.canEncode(c) && Character.isLetter(c))
        {
            result.append(c);
        }
    }
    return result.toString();
}
Call this with "US-ASCII" and you'll get the desired result (except that uppercase letters come first). You could call it with Charset.defaultCharset().name(), but I suspect that you'd get far more than the ASCII letters on most systems, even in the USA.
Caveat: this only considers the Basic Multilingual Plane. It wouldn't be too hard to extend to the supplementary planes, but it would take a lot longer to run, and the utility is questionable.
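For what it's worth, here is a rough sketch of what that extension to the supplementary planes might look like. The class name AllLetters and the method name allLettersAllPlanes are made up for illustration; the idea is simply to iterate over int code points instead of char values and to test encodability through the CharSequence overload of canEncode, since a supplementary code point needs a surrogate pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical class and method names, used only for this sketch.
final class AllLetters {

    // Same idea as allLetters, but walks every Unicode code point
    // (U+0000 through U+10FFFF) rather than only the 16-bit char range.
    static String allLettersAllPlanes(String charsetName) {
        CharsetEncoder ce = Charset.forName(charsetName).newEncoder();
        StringBuilder result = new StringBuilder();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // Surrogate code points are not letters, so they are skipped here too.
            if (!Character.isLetter(cp)) {
                continue;
            }
            // A supplementary code point becomes a two-char surrogate pair,
            // so it has to be checked as a CharSequence, not as a single char.
            String s = new String(Character.toChars(cp));
            if (ce.canEncode(s)) {
                result.appendCodePoint(cp);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(allLettersAllPlanes("US-ASCII"));
    }
}
```

With "US-ASCII" this should give the same 52 letters as before; with something like "UTF-8" the result also includes letters outside the BMP, at the cost of scanning about 1.1 million code points instead of 65,535.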
Based on Michael's imperative Java solution, this is an idiomatic (lazy sequences) Clojure solution:
(ns stackoverflow
  (:import (java.nio.charset Charset CharsetEncoder)))

(defn all-letters [charset]
  (let [encoder (. (Charset/forName charset) newEncoder)]
    (letfn [(valid-char? [c]
              (and (.canEncode encoder (char c)) (Character/isLetter c)))
            (all-letters-lazy [c]
              (when (<= c (int Character/MAX_VALUE))
                (if (valid-char? c)
                  (lazy-seq
                    (cons (char c) (all-letters-lazy (inc c))))
                  (recur (inc c)))))]
      (all-letters-lazy 0))))
Update: Thanks to cgrand for this preferable, higher-level solution:
(defn letters [charset-name]
  (let [ce (-> charset-name java.nio.charset.Charset/forName .newEncoder)]
    (->> (range 0 (int Character/MAX_VALUE)) (map char)
         (filter #(and (.canEncode ce %) (Character/isLetter %))))))
But the performance comparison between my first approach
user> (time (doall (stackoverflow/all-letters "ascii")))
"Elapsed time: 33.333336 msecs"
(\A \B \C \D \E \F \G \H \I \J \K \L \M \N \O \P \Q \R \S \T \U \V \W \X \Y \Z
\a \b \c \d \e \f \g \h \i \j \k \l \m \n \o \p \q \r \s \t \u \v \w \x \y \z)
and your solution
user> (time (doall (stackoverflow/letters "ascii")))
"Elapsed time: 666.666654 msecs"
(\A \B \C \D \E \F \G \H \I \J \K \L \M \N \O \P \Q \R \S \T \U \V \W \X \Y \Z
\a \b \c \d \e \f \g \h \i \j \k \l \m \n \o \p \q \r \s \t \u \v \w \x \y \z)
is quite interesting.