I created a repo and added UTF-8 and Latin-2 encoded files with this content:
árvíztűrő tükörfúrógép
ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP
See it at https://github.com/bimlas/git-test/commit/872370caf91f1faaf931c1228c797f3d10d6435d
The output of git log -p 82904e60 is:
commit 82904e60d1940c036c8190e2a41de6b423727a7c
Author: BimbaLaszlo <[email protected]>
Date: Mon Jul 27 14:38:35 2015 +0200
initial commit
diff --git a/fileencoding/latin2.txt b/fileencoding/latin2.txt
new file mode 100644
index 0000000..7165bc9
--- /dev/null
+++ b/fileencoding/latin2.txt
@@ -0,0 +1,2 @@
+<E1>rv<ED>zt<FB>r<F5> t<FC>k<F6>rf<FA>r<F3>g<E9>p^M
+<C1>RV<CD>ZT<DB>R<D5> T<DC>K<D6>RF<DA>R<D3>G<C9>P^M
diff --git a/fileencoding/utf8.txt b/fileencoding/utf8.txt
new file mode 100644
index 0000000..80e1878
--- /dev/null
+++ b/fileencoding/utf8.txt
@@ -0,0 +1,2 @@
+árvíztűrő tükörfúrógép^M
+ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP^M
I've got the same output on Linux and Windows (where my locale is Latin2). I tried without the pager (git --no-pager log -p 82904e60) and got the same results, only without the escape codes:
commit 82904e6
Author: BimbaLaszlo <[email protected]>
Date: 2015-07-27 14:38:35 +0200
initial commit
diff --git a/fileencoding/latin2.txt b/fileencoding/latin2.txt
new file mode 100644
index 0000000..7165bc9
--- /dev/null
+++ b/fileencoding/latin2.txt
@@ -0,0 +1,2 @@
+�rv�zt�r� t�k�rf�r�g�p
+�RV�ZT�R� T�K�RF�R�G�P
diff --git a/fileencoding/utf8.txt b/fileencoding/utf8.txt
new file mode 100644
index 0000000..80e1878
--- /dev/null
+++ b/fileencoding/utf8.txt
@@ -0,0 +1,2 @@
+árvíztűrő tükörfúrógép
+ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP
The log for latin2.txt on its own looks the same, so the problem is not caused by a mix of differently encoded files in one output.
How can I set up Git to print the characters as they should appear, even without a pager?
EDIT
I don't think the problem is related to the terminal. For example, in Windows PowerShell latin2.txt is displayed fine, but utf8.txt looks wrong.
Git does not really care about character encodings at all. A file is just a bunch of bytes.
Display is handled by your terminal. If it is configured to decode as UTF-8, your Latin-2 file looks broken. If it is configured to decode as Latin-2, your UTF-8 file looks broken.
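To make that concrete, here is a small Python sketch (not anything Git itself does) that stores the sample line from the question in both encodings and then decodes each byte string the way a terminal set to one encoding or the other would. The helper name show is made up for this illustration, and substituting U+FFFD for undecodable bytes is an assumption about how a typical UTF-8 terminal behaves.

```python
# The sample line from the question, as text.
text = "árvíztűrő tükörfúrógép"

# The same text as stored in the two files of the test repository.
utf8_bytes = text.encode("utf-8")
latin2_bytes = text.encode("iso8859_2")


def show(raw: bytes, terminal_encoding: str) -> str:
    """Hypothetical helper: decode raw bytes as a terminal configured for
    terminal_encoding would, replacing undecodable bytes with U+FFFD."""
    return raw.decode(terminal_encoding, errors="replace")


# A matching terminal encoding gives back the original text.
print(show(utf8_bytes, "utf-8"))
print(show(latin2_bytes, "iso8859_2"))

# A mismatch garbles it: Latin-2 bytes such as 0xE1 are not valid UTF-8,
# so every accented letter becomes a replacement character.
print(show(latin2_bytes, "utf-8"))

# The other way round, each two-byte UTF-8 sequence is shown as two
# unrelated Latin-2 characters.
print(show(utf8_bytes, "iso8859_2"))
```

The third print should reproduce the question marks seen in the --no-pager output above, and the bytes behind them (0xE1, 0xED, 0xFB, ...) are the ones the pager shows as <E1>, <ED>, <FB>. Git passes the same bytes through in every case; only the decoding on the display side changes.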
Maybe the encoding attribute (see git help gitattributes) can give some tools a hint about how to decode a file correctly, but I have never used it.
For example, GitHub might be smart enough to look at this attribute and decode those files differently.