How do I get the set of all letters in Java/Clojure?

Question

In Python, I can do this:

>>> import string
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

Is there any way to do something similar in Clojure (apart from copying and pasting the above characters somewhere)? I looked through both the Clojure standard library and the java standard library and couldn't find it.

Hamza Yerlikaya · Accepted Answer

If you just want Ascii chars,

(map char (concat (range 65 91) (range 97 123)))

will yield,

(\A \B \C \D \E \F \G \H \I \J \K \L \M \N \O \P \Q \R \S \T \U \V \W \X \Y \Z 
 \a \b \c \d \e \f \g \h \i \j \k \l \m 
 \o \p \q 
 \s 	 \u \v \w \x \y \z)

Michael Borgwardt · Answer

A properly non-ASCII-centric implementation:

private static String allLetters(String charsetName)
{
    CharsetEncoder ce = Charset.forName(charsetName).newEncoder();
    StringBuilder result = new StringBuilder();
    for(char c=0; c<Character.MAX_VALUE; c++)
    {
        if(ce.canEncode(c) && Character.isLetter(c))
        {
            result.append(c);
        }
    }
    return result.toString();
}

Call this with "US-ASCII" and you'll get the desired result (except that uppercase letters come first). You could call it with Charset.defaultCharset(), but I suspect that you'd get far more than the ASCII letters on most systems, even in the USA.

Caveat: only considers the basic multilingual plane. Wouldn't be too hard to extend to the supplementary planes, but it would take a lot longer, and the utility is questionable.

Jürgen Hötzel · Answer

Based on Michaels imperative Java solution, this is a idiomatic (lazy sequences) Clojure solution:

(ns stackoverflow
  (:import (java.nio.charset Charset CharsetEncoder)))

(defn all-letters [charset]
  (let [encoder (. (Charset/forName charset) newEncoder)]
    (letfn [(valid-char? [c]
             (and (.canEncode encoder (char c)) (Character/isLetter c)))
        (all-letters-lazy [c]
                  (when (<= c (int Character/MAX_VALUE))
                (if (valid-char? c)
                  (lazy-seq
                   (cons (char c) (all-letters-lazy (inc c))))
                  (recur (inc c)))))]
      (all-letters-lazy 0))))

Update: Thanks cgrand for this preferable high-level solution:

(defn letters [charset-name]
  (let [ce (-> charset-name java.nio.charset.Charset/forName .newEncoder)]
    (->> (range 0 (int Character/MAX_VALUE)) (map char)
         (filter #(and (.canEncode ce %) (Character/isLetter %))))))

But the performace comparison between my first approach

user> (time (doall (stackoverflow/all-letters "ascii"))) 
"Elapsed time: 33.333336 msecs"                                                  
(\A \B \C \D \E \F \G \H \I \J \K \L \M \N \O \P \Q \R \S \T \U \V \W \X \Y \Z \
a \b \c \d \e \f \g \h \i \j \k \l \m 
 \o \p \q 
 \s 	 \u \v \w \x \y \z)

and your solution

user> (time (doall (stackoverflow/letters "ascii"))) 
"Elapsed time: 666.666654 msecs"                                                 
(\A \B \C \D \E \F \G \H \I \J \K \L \M \N \O \P \Q \R \S \T \U \V \W \X \Y \Z \
a \b \c \d \e \f \g \h \i \j \k \l \m 
 \o \p \q 
 \s 	 \u \v \w \x \y \z)

is quite interesting.

How do I get the set of all letters in Java/Clojure?

Tags:

java

character

clojure

letters

Jason Baker

3 Answers

Hamza Yerlikaya

Michael Borgwardt

Jürgen Hötzel

Recent Activity

Donate For Us

How do I get the set of all letters in Java/Clojure?

Tags:

java

character

clojure

letters

Jason Baker

3 Answers

Hamza Yerlikaya

Michael Borgwardt

Jürgen Hötzel

Related questions

Recent Activity

Donate For Us