In UTF-8 it's actually 6 hexadecimal digits (or 3 bytes):
$ printf '\xE2\x98\xA0'
☠
To check how it's encoded by the console, use hexdump:
$ printf ☠ | hexdump
0000000 98e2 00a0
0000003
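The byte order in that output can look confusing: by default hexdump groups the bytes into 16-bit words and prints each word in the machine's byte order, so on a little-endian machine 98e2 00a0 stands for the bytes E2 98 A0 (hexdump -C or od -An -tx1 list them one at a time). As a rough sketch of where those three bytes come from, the helper below applies the standard 3-byte UTF-8 layout 1110xxxx 10xxxxxx 10xxxxxx, which covers code points U+0800 through U+FFFF. The name utf8_3byte_hex is made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): derive the three
# UTF-8 bytes of a code point in the range U+0800..U+FFFF.
utf8_3byte_hex() {
    local cp=$(( $1 ))   # accepts decimal or 0x-prefixed hex

    if (( cp < 0x800 || cp > 0xFFFF )); then
        echo "utf8_3byte_hex: code point outside U+0800..U+FFFF" >&2
        return 1
    fi

    # Lead byte: 1110 + top 4 bits; then two continuation bytes: 10 + 6 bits each.
    printf '%02X %02X %02X\n' \
        $(( 0xE0 | (cp >> 12) )) \
        $(( 0x80 | ((cp >> 6) & 0x3F) )) \
        $(( 0x80 | (cp & 0x3F) ))
}

utf8_3byte_hex 0x2620   # prints: E2 98 A0
```

The result matches the bytes passed to printf above, which is a quick way to sanity-check a hand-written escape sequence.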
% echo -e '\u2620' # \u takes four hexadecimal digits
☠
% echo -e '\U0001f602' # \U takes eight hexadecimal digits
😂
This works in Zsh (I've checked version 4.3) and in Bash 4.2 or newer.
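If a script may also run under an older Bash, one option is to check the version before relying on \u. The following is only a sketch: version_at_least is a hypothetical helper written for this example, and the fallback branch assumes the terminal expects UTF-8, in which case the raw bytes work on any version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if HAVE_MAJOR.HAVE_MINOR >= WANT_MAJOR.WANT_MINOR.
# Arguments: want_major want_minor have_major have_minor
version_at_least() {
    local want_major=$1 want_minor=$2 have_major=$3 have_minor=$4
    (( have_major > want_major || (have_major == want_major && have_minor >= want_minor) ))
}

# BASH_VERSINFO[0] and [1] hold the running shell's major and minor version.
if version_at_least 4 2 "${BASH_VERSINFO[0]}" "${BASH_VERSINFO[1]}"; then
    echo -e '\u2620'           # \u escape, Bash 4.2 or newer
else
    printf '\xE2\x98\xA0\n'    # raw UTF-8 bytes, assumes a UTF-8 terminal
fi
```

Note that BASH_VERSINFO is specific to Bash, so this check does not apply to Zsh.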
As long as your text editor can cope with Unicode (presumably encoded in UTF-8), you can enter the Unicode code point directly.
For instance, in the Vim text editor you would enter insert mode, press Ctrl+V, then u, and then type the code point as a 4-digit hexadecimal number (pad with zeros if necessary). So you would type Ctrl+V u 2 6 2 0. See: What is the easiest way to insert Unicode characters into a document?
At a terminal running Bash you would type CTRL+SHIFT+U and then the hexadecimal code point of the character you want. During input your cursor should show an underlined u. The first non-digit you type ends input and renders the character. So you should be able to print U+2620 in Bash using the following:
echo CTRL+SHIFT+U2620ENTERENTER
(The first enter ends Unicode input, and the second runs the echo command.)
Credit: Ask Ubuntu SE
Here's a fully internal Bash implementation, no forking, unlimited size of Unicode characters.
fast_chr() {
    local __octal
    local __char
    printf -v __octal '%03o' "$1"   # byte value as three octal digits
    printf -v __char \\$__octal     # expand the \ooo escape into the raw byte
    REPLY=$__char
}

function unichr {
    local c=$1    # Ordinal of char
    local l=0     # Byte ctr
    local o=63    # Ceiling
    local p=128   # Accum. bits
    local s=''    # Output string
    local t       # Scratch value, kept local so it does not leak

    (( c < 0x80 )) && { fast_chr "$c"; echo -n "$REPLY"; return; }

    while (( c > o )); do
        fast_chr $(( t = 0x80 | c & 0x3f ))
        s="$REPLY$s"
        (( c >>= 6, l++, p += o+1, o>>=1 ))
    done

    fast_chr $(( t = p | c ))
    echo -n "$REPLY$s"
}

## test harness
for (( i=0x2500; i<0x2600; i++ )); do
    unichr $i
done
Output was:
─━│┃┄┅┆┇┈┉┊┋┌┍┎┏
┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯
┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏
═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯
╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏
▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯
▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●
◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯
◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿
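To follow what the loop in unichr does, it can help to print the bytes as hex instead of emitting them raw. The variant below is a sketch written for this illustration (unichr_hex is not part of the original answer). It keeps the same shift-and-mask loop and, as an added assumption, rejects values above U+10FFFF, the end of the Unicode range.

```shell
#!/usr/bin/env bash
# Hypothetical tracing variant of unichr: prints the UTF-8 bytes as hex.
unichr_hex() {
    local c=$(( $1 ))   # code point still to be encoded
    local o=63          # largest value the lead byte can still hold
    local p=128         # lead-byte prefix bits (grows to 0xC0, 0xE0, 0xF0)
    local s=''          # continuation bytes, collected right to left
    local b

    if (( c < 0 || c > 0x10FFFF )); then
        echo "unichr_hex: code point out of range" >&2
        return 1
    fi
    if (( c < 0x80 )); then          # ASCII is a single byte
        printf '%02x\n' "$c"
        return
    fi

    while (( c > o )); do
        # The low six bits go into a continuation byte of the form 10xxxxxx.
        printf -v b '%02x' $(( 0x80 | (c & 0x3f) ))
        s=" $b$s"
        # Drop those bits, extend the prefix, halve the lead byte's capacity.
        (( c >>= 6, p += o + 1, o >>= 1 ))
    done

    printf '%02x%s\n' $(( p | c )) "$s"
}

unichr_hex 0x2620    # prints: e2 98 a0
unichr_hex 0x1F602   # prints: f0 9f 98 82
```

Each pass through the loop peels off six bits, so a code point below U+0800 takes one pass, one below U+10000 takes two, and anything above that takes three.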
Quick one-liner to convert a UTF-8 character into its escaped byte sequence (3 bytes for ☠):
var="$(echo -n '☠' | od -An -tx1)"; printf '\\x%s' ${var^^}; echo
or
echo -n '☠' | od -An -tx1 | sed 's/ /\\x/g'
The first prints \xE2\x98\xA0 and the second prints the same bytes in lowercase (\xe2\x98\xa0), so you can go the other way with:
echo $'\xe2\x98\xa0' # ☠
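The same idea can be wrapped into a pair of small functions that work for strings of any length. This is a sketch under the assumption that od and a Bash printf with %b are available; to_hex_escapes and from_hex_escapes are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every byte of a string as a \xHH escape.
to_hex_escapes() {
    local hex
    # -v stops od from collapsing repeated output lines into "*".
    hex=$(printf '%s' "$1" | od -An -tx1 -v)
    [ -n "$hex" ] || return 0     # empty input, nothing to print
    # Unquoted on purpose: word splitting yields one argument per byte,
    # and printf reuses its format string for each of them.
    printf '\\x%s' $hex
    echo
}

# Hypothetical helper: turn \xHH escapes back into raw bytes.
from_hex_escapes() {
    printf '%b' "$1"
}

to_hex_escapes '☠'                      # prints: \xe2\x98\xa0
from_hex_escapes '\xe2\x98\xa0'; echo   # prints the character itself
```

Because the conversion works byte by byte, it does not depend on the character being exactly three bytes long.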