How do you echo a 4-digit Unicode character in Bash?

In UTF-8 it's actually 6 hexadecimal digits (3 bytes).

$ printf '\xE2\x98\xA0'
☠
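
To see where those three bytes come from, here is a minimal sketch of the arithmetic for U+2620. The helper name `utf8_three_bytes` is made up for illustration, and it only handles code points in the three-byte range U+0800 to U+FFFF.

```shell
# Hypothetical helper: print the three UTF-8 bytes of a code point
# in the range U+0800..U+FFFF as uppercase hex.
utf8_three_bytes() {
    local cp b1 b2 b3
    cp=$(( $1 ))
    b1=$(( 0xE0 | (cp >> 12) ))           # 1110xxxx: top 4 bits
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))   # 10xxxxxx: middle 6 bits
    b3=$(( 0x80 | (cp & 0x3F) ))          # 10xxxxxx: low 6 bits
    printf '%02X %02X %02X\n' "$b1" "$b2" "$b3"
}

utf8_three_bytes 0x2620   # prints: E2 98 A0
```

The three values are exactly the `\xE2\x98\xA0` passed to printf above.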

To check how it's encoded by the console, use hexdump:

$ printf ☠ | hexdump
0000000 98e2 00a0                              
0000003
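
Without options, hexdump groups the input into 16-bit words in host byte order, which is why E2 98 shows up swapped as 98e2 on a little-endian machine. A small sketch that prints the bytes one at a time, in order, using od instead (the helper name `hex_bytes` is an assumption, not a standard tool):

```shell
# Hypothetical helper: print stdin as space-separated hex bytes, in order.
hex_bytes() {
    # The command substitution is left unquoted on purpose: word splitting
    # collapses od's padding and line breaks into single spaces.
    echo $(od -An -v -tx1)
}

# Octal escapes for the bytes E2 98 A0; these work in any POSIX printf.
printf '\342\230\240' | hex_bytes   # prints: e2 98 a0
```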

% echo -e '\u2620'     # \u takes four hexadecimal digits
☠
% echo -e '\U0001f602' # \U takes eight hexadecimal digits
😂

This works in Zsh (I've checked version 4.3) and in Bash 4.2 or newer.
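
If a script may run under an older Bash, one option is to check the version first and fall back to raw bytes. This is only a sketch: `bash_supports_u_escape` is a hypothetical helper, the 4.2 threshold is taken from the statement above, and what `\u` actually prints also depends on the character encoding of the current locale.

```shell
# Hypothetical helper: succeed when the given version string (default:
# $BASH_VERSION) is Bash 4.2 or newer.
bash_supports_u_escape() {
    local version major rest minor
    version=${1:-${BASH_VERSION:-}}
    [ -n "$version" ] || return 1      # not Bash at all
    major=${version%%.*}               # "4.2.46(2)-release" -> "4"
    rest=${version#*.}                 # -> "2.46(2)-release"
    minor=${rest%%.*}                  # -> "2"
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 2 ]; }
}

if bash_supports_u_escape; then
    printf '\u2620\n'
else
    printf '\342\230\240\n'            # the same character as raw UTF-8 bytes
fi
```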


So long as your text editor can cope with Unicode (presumably encoded in UTF-8), you can enter the Unicode code point directly.

For instance, in the Vim text editor you would enter insert mode, press Ctrl+V, then u, and then type the code point as a 4-digit hexadecimal number (pad with zeros if necessary). So you would type Ctrl+V u 2 6 2 0. See: What is the easiest way to insert Unicode characters into a document?

At a terminal running Bash you would press Ctrl+Shift+U and type in the hexadecimal code point of the character you want. During input your cursor should show an underlined u. The first non-digit you type ends input and renders the character. So you should be able to print U+2620 in Bash using the following:

echo CTRL+SHIFT+U2620ENTERENTER

(The first enter ends Unicode input, and the second runs the echo command.)

Credit: Ask Ubuntu SE


Here's a pure Bash implementation that needs no forking and places no limit on the size of the code point.

fast_chr() {
    local __octal
    local __char
    printf -v __octal '%03o' $1
    printf -v __char \\$__octal
    REPLY=$__char
}

function unichr {
    local c=$1    # Ordinal of char
    local l=0    # Byte ctr
    local o=63    # Ceiling
    local p=128    # Accum. bits
    local s=''    # Output string

    (( c < 0x80 )) && { fast_chr "$c"; echo -n "$REPLY"; return; }

    while (( c > o )); do
        fast_chr $(( t = 0x80 | c & 0x3f ))
        s="$REPLY$s"
        (( c >>= 6, l++, p += o+1, o>>=1 ))
    done

    fast_chr $(( t = p | c ))
    echo -n "$REPLY$s"
}

## test harness
for (( i=0x2500; i<0x2600; i++ )); do
    unichr $i
done

Output was:

─━│┃┄┅┆┇┈┉┊┋┌┍┎┏
┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯
┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏
═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯
╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏
▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯
▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●
◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯
◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿

Quick one-liner to convert a UTF-8 character into its byte-escape form (3 bytes for this character):

var="$(echo -n '☠' | od -An -tx1)"; printf '\\x%s' ${var^^}; echo

or

echo -n '☠' | od -An -tx1 | sed 's/ /\\x/g'

The first prints \xE2\x98\xA0 and the second prints the same bytes in lowercase (\xe2\x98\xa0), so you can also go the other way:

echo $'\xe2\x98\xa0'   # ☠
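
Going one step further back, from the three bytes to the code point itself, could look roughly like this. It is a sketch under assumptions: `utf8_three_bytes_to_cp` is a hypothetical helper, it expects exactly three hex bytes of a valid three-byte sequence, and it does no validation.

```shell
# Hypothetical helper: decode three UTF-8 bytes (hex, no prefix) back
# into the code point they encode.
utf8_three_bytes_to_cp() {
    local b1 b2 b3 cp
    b1=$(( 0x$1 ))
    b2=$(( 0x$2 ))
    b3=$(( 0x$3 ))
    # Strip the 1110 and 10 marker bits, then reassemble the 4+6+6 payload bits.
    cp=$(( ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F) ))
    printf 'U+%04X\n' "$cp"
}

utf8_three_bytes_to_cp E2 98 A0   # prints: U+2620
```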