
What is an elegant way to check if a character is an ASCII letter (a-z, A-Z) in Scala?

Tags: ascii, scala

I am currently working with Scanners and Parsers and need a Parser that accepts characters that are ASCII letters - so I can't use char.isLetter.

I came up with two solutions myself, but I don't like either of them.

Regex

def letter = elem("ascii letter", _.toString.matches("""[a-zA-Z]"""))

Using a regex to check something this simple seems like overkill.

Range check

def letter = elem("ascii letter", c => ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))

In my opinion, this would be the way to go in Java. But it's not really readable.

Is there a cleaner, more Scala-like solution to this problem? I do not really worry about performance, as it doesn't matter in this case.

r0estir0bbe asked Mar 15 '13 18:03


2 Answers

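You say you can't use Char.isLetter because you only want ASCII letters. Why not just restrict it to the 7-bit ASCII character range? The only characters at or below 'z' that Char.isLetter accepts are A-Z and a-z, so an upper bound of 'z' is enough: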

def isAsciiLetter(c: Char) = c.isLetter && c <= 'z'

If you want to check for any ASCII character, including non-letters, then:

def isAscii(c: Char) = c.toInt <= 127
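
To see how the two checks above differ from plain Char.isLetter, here is a minimal sketch. The wrapper object AsciiCheckDemo and the sample characters are assumptions chosen for illustration; the two functions are the ones from this answer.

```scala
object AsciiCheckDemo {
  // Same definitions as in the answer above.
  def isAsciiLetter(c: Char): Boolean = c.isLetter && c <= 'z'
  def isAscii(c: Char): Boolean = c.toInt <= 127

  def main(args: Array[String]): Unit = {
    val eAcute = '\u00e9' // 'e' with acute accent: a letter, but not ASCII

    println(eAcute.isLetter)       // true: isLetter alone accepts it
    println(isAsciiLetter(eAcute)) // false: rejected by the upper bound
    println(isAsciiLetter('q'))    // true

    // '[' lies between 'Z' and 'a' in ASCII: an ASCII character, not a letter.
    println(isAsciiLetter('['))    // false
    println(isAscii('['))          // true
  }
}
```

The '[' case shows why the isLetter test is still needed: a bare range check from 'A' to 'z' would let the punctuation between 'Z' and 'a' through.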
DaoWen answered Oct 19 '22 18:10


Regardless of what you choose in the end, I suggest abstracting out the definition of "is an ASCII letter". That makes the calling code more readable and lets you swap in a faster implementation later. E.g.:

object Program extends App {
  implicit class CharProperties(val ch: Char) extends AnyVal {
    def isASCIILetter: Boolean =
      (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
  }
  println('x'.isASCIILetter)
  println('0'.isASCIILetter)
}

Or if you want to describe ASCII letters as a set:

object Program extends App {
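  // Named differently from the implicit class below: the Scala documentation
  // on implicit classes disallows an object of the same name in scope.
  object CharSets {
    val ASCIILetters = ('a' to 'z').toSet ++ ('A' to 'Z').toSet
  }
  implicit class CharProperties(val ch: Char) extends AnyVal {
    def isASCIILetter: Boolean =
      CharSets.ASCIILetters.contains(ch)
  }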
  println('x'.isASCIILetter)
  println('0'.isASCIILetter)
}

Once you're using an explicit function with an understandable name, your intent should be clear either way and you can choose the implementation with the better performance (though any performance differences between the two versions above should be rather minimal).
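
If you do swap implementations, a quick check that the two agree is cheap, because Char has only 65536 values. The following is a minimal sketch under stated assumptions: the names AsciiLetterEquivalence, byRange, bySet and implementationsAgree are hypothetical, and the two predicates simply restate the range and set versions above as plain functions.

```scala
object AsciiLetterEquivalence {
  // Range-comparison version, as in the first example.
  def byRange(ch: Char): Boolean =
    (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')

  // Set-membership version, as in the second example.
  private val asciiLetters: Set[Char] =
    ('a' to 'z').toSet ++ ('A' to 'Z').toSet

  def bySet(ch: Char): Boolean = asciiLetters.contains(ch)

  // True if both predicates return the same answer for every Char value.
  def implementationsAgree: Boolean =
    (0 to 0xFFFF).forall { i =>
      val c = i.toChar
      byRange(c) == bySet(c)
    }

  def main(args: Array[String]): Unit = {
    println(implementationsAgree)         // true
    println("hello".forall(byRange))      // true
    println("h\u00e9llo".forall(byRange)) // false: contains a non-ASCII letter
  }
}
```

An exhaustive check like this is only practical because the input domain is so small; it says nothing about which version is faster.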

Reimer Behrends answered Oct 19 '22 17:10