Unicode Clojure unit test output

When unit testing some code that translates ASCII sequences into Unicode characters, I found a problem with the output of Clojure tests.

I have checked that my terminal can display Unicode characters (by cat-ing the test files) and that works fine, so the problem seems to be related to Leiningen, Clojure or clojure.test.

Here's an example test (using the Greek block of Unicode; I will also be using Greek Extended, but I assume the same problems will apply):

(deftest bc-string-w-comma
  (is (= "αβγ, ΑΒΓ" (parse "abg,*a*b*g"))))

It is meant to fail due to the missing space in the input. The output from lein test is the following:

Testing parse_perseus.test.betacode
FAIL in (bc-string-w-comma) (betacode.clj:15)
expected: (= "???, ???" (parse "abg,*a*b*g"))
  actual: (not (= "???, ???" "???,???"))
Testing parse_perseus.test.core
Testing parse_perseus.test.pluralise
Ran 10 tests containing 59 assertions.
1 failures, 0 errors.

What am I doing wrong here? Is this a terminal emulation problem or something Clojure-related? I have the same problem running code in the REPL with SLIME/swank/Emacs. The REPL in Emacs only shows question marks for Unicode output (although Emacs itself is quite capable of handling Unicode).

I have tried running this in Terminal and iTerm (OS X) with the same results.

William Roe asked Jan 30 '11 16:01


2 Answers

It turns out that you can pass an option to java to force the output encoding of *out* so that Unicode works, like this:

java -Dfile.encoding=utf-8 -cp lib/clojure-1.2.0.jar:lib/clojure-contrib-1.2.0.jar clojure.main -i src/whatever.clj

As I'm using Leiningen, I added this property to my project.clj file:

(defproject project_name "1.0.0-SNAPSHOT"
  :description "A Clojure Project"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]]
  :dev-dependencies [[swank-clojure "1.2.0"]]
  :jvm-opts ["-Dfile.encoding=utf-8"])
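
A minimal sketch of why the question marks appear, assuming the JVM was defaulting to a charset with no Greek letters. US-ASCII stands in for that charset here; which default the JVM actually picked on OS X is not confirmed. The helper name round-trip is made up for illustration. Encoding a string with such a charset replaces each unmappable character with ?, which matches the lein test output in the question:

```clojure
;; Hypothetical helper: encode s to bytes with the named charset,
;; then decode those bytes again with the same charset.
(defn round-trip [^String s ^String charset-name]
  (String. (.getBytes s charset-name) charset-name))

;; US-ASCII cannot represent Greek letters, so each one becomes "?".
(def ascii-result (round-trip "αβγ, ΑΒΓ" "US-ASCII"))

;; UTF-8 can represent them, so the text survives unchanged.
(def utf8-result (round-trip "αβγ, ΑΒΓ" "UTF-8"))

;; Both comparisons should hold when this file is loaded as UTF-8 source.
(println (= "???, ???" ascii-result))
(println (= "αβγ, ΑΒΓ" utf8-result))

;; The default that -Dfile.encoding overrides at JVM startup.
(println (System/getProperty "file.encoding"))
```

If the last line prints something other than the value passed in :jvm-opts, the option probably did not reach the JVM that runs the tests.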

William Roe answered Nov 15 '22 09:11


Clojure itself seems to be in the clear (this is Ubuntu 10.10, gnome-terminal, OpenJDK):

john@woc-desktop$ java -cp /home/john/.m2/repository/org/clojure/clojure/1.2.0/clojure-1.2.0.jar:/home/john/.m2/repository/org/clojure/clojure-contrib/1.2.0/clojure-contrib-1.2.0.jar clojure.main
Clojure 1.2.0
user=> (use 'clojure.test)
nil
user=> (defn parse [s] "αβγ,ΑΒΓ")
#'user/parse
user=> (deftest greek (is (= "αβγ, ΑΒΓ" (parse ""))))
#'user/greek
user=> (run-tests)

Testing user

FAIL in (greek) (NO_SOURCE_FILE:3)
expected: (= "αβγ, ΑΒΓ" (parse ""))
  actual: (not (= "αβγ, ΑΒΓ" "αβγ,ΑΒΓ"))

Ran 1 tests containing 1 assertions.
1 failures, 0 errors.
{:type :summary, :test 1, :pass 0, :fail 1, :error 0}
user=> 

But it does break with Emacs/swank/clojure-maven-plugin/Maven.

At the REPL in Emacs:

> (is "αβγ""αβγ")

slime-net-send: Coding system iso-latin-1-unix not suitable for "000052(:emacs-rex (swank:listener-eval \"(is \\\"αβγ\\\"\\\"αβγ\\\")

\") \"user\" :repl-thread 33)
"

If I use Maven with the simple pom file below and run mvn clojure:repl, then it's OK:

[INFO] [clojure:repl {execution: default-cli}]
Clojure 1.2.0
user=> (use 'clojure.test) (is "αβγ""αβγ")
nil
"αβγ"
user=> (defn parse [s] "αβγ,ΑΒΓ")
#'user/parse
user=> (deftest greek (is (= "αβγ, ΑΒΓ" (parse ""))))
#'user/greek
user=> (run-tests)

Testing user

FAIL in (greek) (NO_SOURCE_FILE:3)
expected: (= "αβγ, ΑΒΓ" (parse ""))
  actual: (not (= "αβγ, ΑΒΓ" "αβγ,ΑΒΓ"))

Ran 1 tests containing 1 assertions.
1 failures, 0 errors.
{:type :summary, :test 1, :pass 0, :fail 1, :error 0}
user=> 

But if I add the JLine library using this snippet:

<dependency>
  <groupId>jline</groupId>
  <artifactId>jline</artifactId>
  <version>0.9.94</version>
</dependency>

then I get:

[INFO] [clojure:repl {execution: default-cli}]
[INFO] Enabling JLine support
Clojure 1.2.0
user=> (use 'clojure.test) (is "αβγ""αβγ")
nil
"���"
user=> (defn parse [s] "αβγ,ΑΒΓ")
#'user/parse
user=> (deftest greek (is (= "αβγ, ΑΒΓ" (parse ""))))
#'user/greek
user=> (run-tests)

Testing user

FAIL in (greek) (NO_SOURCE_FILE:3)
expected: (= "���, ���" (parse ""))
  actual: (not (= "���, ���" "���,���"))

Ran 1 tests containing 1 assertions.
1 failures, 0 errors.
{:type :summary, :test 1, :pass 0, :fail 1, :error 0}
user=> 

Which looks awfully like your error. So it may be that the problem is in JLine, or in some other piece that Leiningen and Maven have in common and that is associated with JLine.

Or of course, there may be two independent Unicode-related failures.
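
One rough way to tell the cases apart is to look at the character codes of a string rather than at what the terminal shows. This is only a sketch, and the function names are my own, not part of clojure.test or JLine. If the codes are already wrong, the text was damaged on the way in (while being read); if they are right but the display is wrong, the damage happens on the way out:

```clojure
;; Hypothetical helper: the numeric code of every character in s.
(defn code-points [s]
  (map int s))

;; Hypothetical helper: true if s contains U+FFFD, the character a
;; decoder substitutes for bytes it could not interpret.
(defn contains-replacement-char? [s]
  (boolean (some #(= % \uFFFD) s)))

;; For intact Greek input the codes are those of α, β and γ.
(println (code-points "αβγ"))

;; A string that was garbled while being read may report true here.
(println (contains-replacement-char? "αβγ"))
```

With intact input the first form gives (945 946 947). Other values, for example 65533 (U+FFFD) or 63 (?), would suggest the string was already corrupted before clojure.test ever printed it.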

Here is my Maven pom.xml file in case anyone is trying to debug this.

<project>

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.aspden</groupId>
  <artifactId>maven-clojure-simple</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>maven-clojure-simple</name>
  <description>maven, clojure: simple project</description>

  <repositories>

    <repository>
      <id>clojure</id>
      <url>http://build.clojure.org/releases</url>
    </repository>
    <repository>
      <id>central</id>
      <url>http://repo1.maven.org/maven2</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure</artifactId>
      <version>1.2.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>com.theoryinpractise</groupId>
        <artifactId>clojure-maven-plugin</artifactId>
        <version>1.3.5-SNAPSHOT</version>
      </plugin>
    </plugins>
  </build>

</project>

I appreciate this is not an answer, but I thought it might be helpful.

John Lawrence Aspden answered Nov 15 '22 07:11