
Unicode (utf-8) with git-bash

I'm having some trouble getting Unicode to work in git-bash (on Windows 7). I have tried many things without success, although I'm not sure which component is responsible, so I might be working in the wrong direction.

It really seems this should be possible, as the encoding for cmd.exe can be changed to Unicode with 'chcp 65001'.

Here are some things I've tried (besides the obvious of looking through the configuration options in the GUI).

  1. Setting environment variables in '.bashrc'. I guess it makes sense that this doesn't work, since I think it's a Linux thing. The 'locale' command does not exist.

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    export LANGUAGE=en_US.UTF-8
  2. Starting out in cmd.exe, changing the encoding to Unicode with 'chcp 65001' and then starting up git-bash. This gives me a "Permission denied" error when I try to cat my Unicode test file, whereas catting a file without Unicode works just fine. As demonstrated, after dropping back out to cmd.exe I can still "cat" the file there. Using my default encoding (437) I can cat the file in bash (no "Permission denied", but the output is garbled).

    S:\>chcp 65001
    Active code page: 65001

    S:\>"C:\Program Files (x86)\Git\bin\sh.exe" --login -i
    zarac@TOWELIE /z cat /s/unicode.txt
    cat: write error: Permission denied
    zarac@TOWELIE /z cat /s/nounicode.txt
    abc
    zarac@TOWELIE /z L /s/unicode.txt
    -rw-r--r--    1 zarac    Administ        7 May 18 10:30 /s/unicode.txt
    zarac@TOWELIE /z whoami
    towelie\zarac
    zarac@TOWELIE /z exit

    Z:\>type S:\unicode.txt
    abc£
  3. Using the /U flag when starting the shell. It makes sense that this doesn't work, because it's not quite what the flag is for, if I understand correctly, but it has to do with Unicode so I tried it.

    C:\Windows\SysWOW64\cmd.exe /U /C "C:\Program Files (x86)\Git\bin\sh.exe" --login -i 
  4. As I prefer to use Console2, I've tried adding a DWORD value named CodePage with the value 65001 (decimal) to the Windows registry under [HKEY_CURRENT_USER\Console] as well as [HKEY_CURRENT_USER\Console\Git Bash]. This seems to have the same effect as running 'chcp 65001', except that it's "automatic". (http://stackoverflow.com/questions/379240/is-there-a-windows-command-shell-that-will-display-unicode-characters)

  5. JPSoft's TCC/LE

  6. PowerCMD

  7. stackoverflow

  8. duckduckgo

  9. ixquick / google

So, method 2 seems viable if that permission issue can be fixed. However, I'm open to pretty much any solution, although I'd prefer one that lets me use Console2 (mostly due to its nifty tab feature). Perhaps one solution would be to set up an SSH server and then use Putty/Kitty to connect to it, but that's just wrong! ; )
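
For anyone trying to reproduce this: the file listing in method 2 shows the test file is 7 bytes, which is consistent with "abc£" stored as UTF-8 followed by a CRLF line ending. That layout is an assumption, not something confirmed above. A minimal sketch (the `write_sample` helper is hypothetical) that recreates such a file and inspects its raw bytes:

```shell
# Hypothetical helper: write "abc£" plus CRLF as UTF-8 bytes to the given path.
# \302\243 is the two-byte UTF-8 encoding of £ (U+00A3), written as octal escapes.
write_sample() {
    printf 'abc\302\243\r\n' > "$1"
}

sample=$(mktemp)
write_sample "$sample"
wc -c < "$sample"        # byte count; 7 for this assumed layout
od -An -tx1 "$sample"    # hex dump of the raw bytes
rm -f "$sample"
```

If the real file's hex dump differs (for example a single byte for £), the file is probably not UTF-8, and the code page would not be the only thing to look at.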

PS. Is there any official documentation for git-bash?

Hannes asked May 18 '12 11:05



2 Answers

I faced the same issue in MSYS Git 2.8.0 and, as it turned out, it only needed a configuration change.

    $ git --version
    git version 2.8.0.windows.1

The default configuration of the Git Bash console on my system did not show Greek filenames.

    $ cd ~
    $ ls
    AppData/ 'Application Data'@ Contacts/ Cookies@ Desktop/ Documents/ Downloads/ Favorites/ Links/ 'Local Settings'@ NTUSER.DAT
    . . .
    ''$'\316\244\316\261'' '$'\316\255\316\263\316\263\317\201\316\261\317\206\316\254'' '$'\316\274\316\277\317\205'@
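
The backslash sequences in that listing are octal byte values, which `ls` prints when it does not treat the name as printable text. A small sketch, assuming a shell whose `printf` accepts octal escapes in the format string (POSIX specifies this), that turns the escapes copied from the listing back into raw bytes. `decode_name` is a hypothetical helper name:

```shell
# Octal escapes copied from the ls output above, with the quoting removed.
# printf converts each \ddd escape back into a single raw byte.
decode_name() {
    printf '\316\244\316\261 \316\255\316\263\316\263\317\201\316\261\317\206\316\254 \316\274\316\277\317\205\n'
}

decode_name                  # rendered as text only if the terminal decodes UTF-8
decode_name | od -An -tx1    # the same bytes in hex, independent of the terminal
```

Each Greek letter takes two bytes here, which is why the escapes come in pairs starting with \316 or \317.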

The last line should display "Τα έγγραφά μου", the Greek translation of "My Documents". To fix it I followed the steps below:

  1. Check your existing locale configuration

    $ locale
    LANG=en
    LC_CTYPE="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_COLLATE="C"
    LC_MONETARY="C"
    LC_MESSAGES="C"
    LC_ALL=

    As shown above, in my case it was not UTF-8

  2. Change the locale to a UTF-8 encoding. Click the icon on the left side of the MINGW title bar, select "Options" and, in the "Text" category, choose the "UTF-8" character set. You should also choose a Unicode font, such as the default "Lucida Console". My configuration looks as follows: [image: MinGW locale configuration]

  3. Change the language for the current window (no need to do this on future windows, as they will be created with the settings of step 2)

     $ LANG='C.UTF-8' 
  4. The ls command should now display properly

    AppData/ 'Application Data'@ Contacts/ Cookies@ Desktop/ Documents/ Downloads/ Favorites/ Links/ 'Local Settings'@ NTUSER.DAT
    . . .
    'Τα έγγραφά μου'@
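
Steps 1 and 3 can also be checked from a script. The sketch below is only an approximation: it looks at the environment variables rather than calling `locale`, and it assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG). The function names `effective_ctype` and `is_utf8_locale` are made up for this example.

```shell
# Print the locale name that governs character handling,
# following POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-}"
    fi
}

# Succeed when that locale name mentions UTF-8 (case-insensitive, with or without the hyphen).
is_utf8_locale() {
    case "$(effective_ctype)" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "character locale looks like UTF-8: $(effective_ctype)"
else
    echo "character locale is not UTF-8: $(effective_ctype)"
fi
```

With the values from step 1 (LANG=en, nothing else set) this reports a non-UTF-8 locale; after `LANG='C.UTF-8'` from step 3 it reports UTF-8.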
nkatsar answered Sep 23 '22 00:09


Found this answer elsewhere (in the question "Git bash chcp windows7 encoding issue"):

    chcp.com 65001

That's what actually solved it for me.
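
If you want this to happen in every new shell, one option is to run it from '~/.bashrc'. That placement is my assumption and is not part of the answer I found. A guarded sketch, with `use_utf8_codepage` as a hypothetical helper name, so the same file stays harmless on systems without chcp.com:

```shell
# Hypothetical helper for ~/.bashrc: switch the console to code page 65001 (UTF-8)
# only when chcp.com is found on PATH; return non-zero otherwise.
use_utf8_codepage() {
    if command -v chcp.com >/dev/null 2>&1; then
        chcp.com 65001 >/dev/null   # discard the "Active code page" message
    else
        return 1
    fi
}

use_utf8_codepage || echo "chcp.com not found; code page left unchanged" >&2
```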

TravisChambers answered Sep 23 '22 00:09