
How to hack GHCi (or Hugs) so that it prints Unicode chars unescaped?

The problem: in the interactive Haskell environment, non-Latin Unicode characters that appear in results are normally printed escaped, even if the locale allows such characters. Direct output through putStrLn or putChar, by contrast, looks fine and readable. The examples below show GHCi and Hugs98:

    $ ghci
    GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
    Prelude> "hello: привет"
    "hello: \1087\1088\1080\1074\1077\1090"
    Prelude> 'Я'
    '\1071'
    Prelude> putStrLn "hello: привет"
    hello: привет
    Prelude> :q
    Leaving GHCi.
    $ hugs -98
    __   __ __  __  ____   ___      _________________________________________
    ||   || ||  || ||  || ||__      Hugs 98: Based on the Haskell 98 standard
    ||___|| ||__|| ||__||  __||     Copyright (c) 1994-2005
    ||---||         ___||           World Wide Web: http://haskell.org/hugs
    ||   ||                         Bugs: http://hackage.haskell.org/trac/hugs
    ||   || Version: September 2006 _________________________________________

    Hugs mode: Restart with command line option +98 for Haskell 98 mode

    Type :? for help
    Hugs> "hello: привет"
    "hello: \1087\1088\1080\1074\1077\1090"
    Hugs> 'Я'
    '\1071'
    Hugs> putStrLn "hello: привет"
    hello: привет

    Hugs> :q
    [Leaving Hugs]
    $ locale
    LANG=ru_RU.UTF-8
    LC_CTYPE="ru_RU.UTF-8"
    LC_NUMERIC="ru_RU.UTF-8"
    LC_TIME="ru_RU.UTF-8"
    LC_COLLATE="ru_RU.UTF-8"
    LC_MONETARY="ru_RU.UTF-8"
    LC_MESSAGES="ru_RU.UTF-8"
    LC_PAPER="ru_RU.UTF-8"
    LC_NAME="ru_RU.UTF-8"
    LC_ADDRESS="ru_RU.UTF-8"
    LC_TELEPHONE="ru_RU.UTF-8"
    LC_MEASUREMENT="ru_RU.UTF-8"
    LC_IDENTIFICATION="ru_RU.UTF-8"
    LC_ALL=
    $

Presumably this happens because print and show are used to format the result, and these functions try to format the data in a canonical, maximally portable way, so they escape any unusual characters (perhaps this is even spelled out in the Haskell standard):

    $ ghci
    GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
    Prelude> show 'Я'
    "'\\1071'"
    Prelude> :q
    Leaving GHCi.
    $ hugs -98
    Type :? for help
    Hugs> show 'Я'
    "'\\1071'"
    Hugs> :q
    [Leaving Hugs]
    $
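To make the escaping rule concrete, here is a rough model of what the sessions above suggest show does to a string, written in Python rather than Haskell. The function name show_like is made up for this illustration, and the sketch assumes that characters above code 127 become a backslash plus the decimal code point, with `\&` inserted when a digit follows such an escape. Control characters are deliberately left out.

```python
def show_like(s: str) -> str:
    """Rough model of how Haskell's show renders a String (sketch only)."""
    out = ['"']
    after_numeric = False  # was the previous character written as \NNNN?
    for ch in s:
        code = ord(ch)
        if code > 127:
            # Non-ASCII: backslash followed by the decimal code point.
            out.append("\\" + str(code))
            after_numeric = True
            continue
        if code < 32 or code == 127:
            # Named escapes such as \n or \DEL are not modeled here.
            raise ValueError("control characters are outside this sketch")
        if ch.isdigit() and after_numeric:
            # Keep a literal digit from merging into the preceding escape.
            out.append("\\&")
        if ch in '"\\':
            out.append("\\")
        out.append(ch)
        after_numeric = False
    out.append('"')
    return "".join(out)


print(show_like("hello: привет"))
# prints "hello: \1087\1088\1080\1074\1077\1090"
```

This is only meant to show what any unescaping trick has to undo; it is not how GHCi or Hugs implement show.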

Still, it would be nice to know how to hack GHCi or Hugs to print these characters in a human-readable way, i.e. directly, unescaped. This matters when using the interactive Haskell environment for educational purposes, for example in a tutorial or demonstration in front of a non-English-speaking audience to whom you want to show some Haskell working on data in their own language.

In fact, it is useful not only for teaching but for debugging as well. If a program is language-specific, so that its functions are defined only on words of another language containing non-ASCII characters, then it is important to be able to read this data when debugging in GHCi.

To sum up my question: what ways are there to hack the existing interactive Haskell environments so that they print Unicode in results in a friendlier way? ("Friendlier" here simply means "simpler": I'd like print in GHCi or Hugs to show non-Latin characters directly, the way putChar and putStrLn do, i.e. unescaped.)

(Perhaps, besides GHCi and Hugs98, I'll also have a look at existing Emacs modes for interacting with Haskell to see if they can present the results in the pretty, unescaped fashion.)

asked Apr 04 '11 07:04 by imz -- Ivan Zakharyaschev



1 Answer

One way to hack this is to wrap GHCi in a shell script that reads its stdout and unescapes the Unicode characters. This is not the Haskell way, of course, but it does the job :)

For example, here is a wrapper, ghci-esc, that uses sh and python3 (the 3 is important here):

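    #!/bin/sh

    # Pipe GHCi's output through a small Python 3 filter that replaces
    # four-digit decimal escapes such as \1071 with the character itself.
    ghci "$@" | python3 -c '
    import sys
    import re

    def tr(match):
        s = match.group(1)
        try:
            return chr(int(s))
        except ValueError:
            return s

    for line in sys.stdin:
        sys.stdout.write(re.sub(r"\\([0-9]{4})", tr, line))
    '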

Usage of ghci-esc:

    $ ./ghci-esc
    GHCi, version 7.0.2: http://www.haskell.org/ghc/  :? for help
    > "hello"
    "hello"
    > "привет"
    "привет"
    > 'Я'
    'Я'
    > show 'Я'
    "'\Я'"
    > :q
    Leaving GHCi.

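Note that not all of the unescaping above is correct: the pattern matches only four-digit escapes, and the output of show 'Я' keeps a stray backslash. Still, this is a quick way to show Unicode output to your audience.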
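If the rough edges matter, the substitution step can be made a bit more careful. The following is a minimal sketch, not a tested replacement for the wrapper: the name unescape_decimal is hypothetical, and it assumes the output only needs three extra cases handled, namely escaped backslashes (left alone), escapes of any length, and the `\&` separator that show places between a numeric escape and a following digit.

```python
import re

# Either an escaped backslash (kept as is), or a backslash followed by
# decimal digits, optionally followed by the "\&" separator.
_ESCAPE = re.compile(r"\\\\|\\([0-9]+)(?:\\&)?")


def unescape_decimal(line: str) -> str:
    """Replace decimal escapes for printable non-ASCII characters."""
    def repl(match):
        digits = match.group(1)
        if digits is None:
            return match.group(0)  # escaped backslash: leave untouched
        code = int(digits)
        if code < 128 or code > 0x10FFFF:
            return match.group(0)  # ASCII or out of range: keep the escape
        ch = chr(code)
        # Keep the escape for surrogates and other unprintable code points.
        return ch if ch.isprintable() else match.group(0)
    return _ESCAPE.sub(repl, line)


print(unescape_decimal(r'"hello: \1087\1088\1080\1074\1077\1090"'))
# prints "hello: привет"
```

In the wrapper, the call to re.sub inside the loop over sys.stdin could be swapped for unescape_decimal(line). With this version the output of show 'Я' is passed through unchanged as "'\\1071'", since the escaped backslash is matched first.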

answered Sep 22 '22 08:09 by Andrey Vlasovskikh
