How do you echo a 4-digit Unicode character in Bash?

Question

In UTF-8 it's actually 6 digits (or 3 bytes).

$ printf '\xE2\x98\xA0'
☠

To check how it's encoded by the console, use hexdump:

$ printf ☠ | hexdump
0000000 98e2 00a0                              
0000003

% echo -e '\u2620'     # \u takes four hexadecimal digits
☠
% echo -e '\U0001f602' # \U takes eight hexadecimal digits
😂

This works in Zsh (I've checked version 4.3) and in Bash 4.2 or newer.

So long as your text-editors can cope with Unicode (presumably encoded in UTF-8) you can enter the Unicode code-point directly.

For instance, in the Vim text-editor you would enter insert mode and press Ctrl + V + U and then the code-point number as a 4-digit hexadecimal number (pad with zeros if necessary). So you would type Ctrl + V + U 2 6 2 0. See: What is the easiest way to insert Unicode characters into a document?

At a terminal running Bash you would type CTRL+SHIFT+U and type in the hexadecimal code-point of the character you want. During input your cursor should show an underlined u. The first non-digit you type ends input, and renders the character. So you could be able to print U+2620 in Bash using the following:

echo CTRL+SHIFT+U2620ENTERENTER

(The first enter ends Unicode input, and the second runs the echo command.)

Credit: Ask Ubuntu SE

Here's a fully internal Bash implementation, no forking, unlimited size of Unicode characters.

fast_chr() {
    local __octal
    local __char
    printf -v __octal '%03o' $1
    printf -v __char \$__octal
    REPLY=$__char
}

function unichr {
    local c=$1    # Ordinal of char
    local l=0    # Byte ctr
    local o=63    # Ceiling
    local p=128    # Accum. bits
    local s=''    # Output string

    (( c < 0x80 )) && { fast_chr "$c"; echo -n "$REPLY"; return; }

    while (( c > o )); do
        fast_chr $(( t = 0x80 | c & 0x3f ))
        s="$REPLY$s"
        (( c >>= 6, l++, p += o+1, o>>=1 ))
    done

    fast_chr $(( t = p | c ))
    echo -n "$REPLY$s"
}

## test harness
for (( i=0x2500; i<0x2600; i++ )); do
    unichr $i
done

Output was:

─━│┃┄┅┆┇┈┉┊┋┌┍┎┏
┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯
┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏
═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯
╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏
▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯
▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●
◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯
◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿

Quick one-liner to convert UTF-8 characters into their 3-byte format:

var="$(echo -n '☠' | od -An -tx1)"; printf '\x%s' ${var^^}; echo

or

echo -n '☠' | od -An -tx1 | sed 's/ /\x/g'

The output of both is \xE2\x98\xA0, so you can write reversely:

echo $'\xe2\x98\xa0'   # ☠

How do you echo a 4-digit Unicode character in Bash?

Tags:

bash

shell

character-encoding

unicode

Recent Activity

Donate For Us

How do you echo a 4-digit Unicode character in Bash?

Tags:

bash

shell

character-encoding

unicode

Related questions

Recent Activity

Donate For Us