
Docker Alpine wkhtmltopdf Chinese / Thai Characters Display Incorrectly

We're converting a PHP Docker image from Ubuntu to Alpine to reduce the image size, remove unnecessary dependencies, and decrease build time. Due to the version of PHP we need to support, we can only use Alpine 3.10 for the moment.

One of the tools the application uses is wkhtmltopdf, which converts HTML files to PDFs. This works well for common English characters but seems to struggle with other scripts such as Chinese or Thai.

To reproduce, use the Dockerfile and test.html below:

------- Dockerfile -------

FROM alpine:3.10

RUN apk update && apk --no-cache add \
        git libcurl wget \
        curl tzdata procps vim \
        python3 py3-pip \
        zip unzip \
        libsasl \
        openssl \
        libpng \
        libjpeg \
        libjpeg-turbo \
        freetype \
        libxml2 \
        fontconfig \
        icu libzip \
        wkhtmltopdf \
        libgcc libstdc++ libx11 glib libxrender libxext libintl \
        font-noto-arabic terminus-font ttf-inconsolata ttf-dejavu font-noto font-noto-extra \
        ttf-dejavu ttf-droid ttf-freefont ttf-liberation ttf-ubuntu-font-family \
        libpng-dev libjpeg-turbo-dev freetype-dev libxml2-dev icu-dev autoconf gcc g++ make libzip-dev \
    && rm -rf /var/cache/apt/* && rm /var/cache/apk/*

COPY ./test.html ./

------- test.html -------
<html>

<body>
    <p>English</p>
    <p>電子郵件</p>
</body>

</html>

$ docker build -t character_test . 
$ docker run --name character_test character_test wkhtmltopdf ./test.html ./test.pdf
$ docker cp character_test:./test.pdf ./test.pdf
$ docker rm character_test
$ docker rmi character_test

Now if you open the PDF, the rendered output does not match the characters in the HTML file:

[Image: PDF output]

As you can see from the Dockerfile, I'm fairly sure we've installed just about every font available for Alpine in an attempt to resolve this, but we're not really sure what the problem is or how to resolve it.

What is causing these characters to display incorrectly and how can we resolve it in our image?

Alex Bailey asked Dec 21 '25 20:12


1 Answer

I did not optimize the instructions in the Dockerfile; this is just a quick proof of concept (POC) to verify that the approach works.

Project Directory

docker_wkhtml2pdf
├── Dockerfile   (1)  
├── simsun.ttc   (2) 
└── data
    └── test3.html (3)

Dockerfile (1)

There are two main problems with the PDF display. The first is encoding, so I added the locale-related packages and environment settings. The second is the font, so I added simsun.ttc.

FROM alpine:3.12

ENV LANG=en_US.UTF-8 \
    LANGUAGE=en_US:en \
    LC_ALL=en_US.UTF-8

RUN mkdir -p /usr/share/fonts/chinese/TrueType
COPY simsun.ttc /usr/share/fonts/chinese/TrueType/

RUN apk update

RUN apk add --no-cache \
    bash \
    libc6-compat \
    musl-locales \
    musl-locales-lang

RUN apk --no-cache add \
        git libcurl wget \
        curl tzdata procps vim \
        python3 py3-pip \
        zip unzip \
        libsasl \
        openssl \
        libpng \
        libjpeg \
        libjpeg-turbo \
        freetype \
        libxml2 \
        fontconfig \
        icu libzip \
        wkhtmltopdf \
        libgcc libstdc++ libx11 glib libxrender libxext libintl \
        font-noto-arabic terminus-font ttf-inconsolata ttf-dejavu font-noto font-noto-extra \
        ttf-dejavu ttf-droid ttf-freefont ttf-liberation ttf-ubuntu-font-family \
        libpng-dev libjpeg-turbo-dev freetype-dev libxml2-dev icu-dev autoconf gcc g++ make libzip-dev


RUN rm -rf /var/cache/apt/*
RUN rm /var/cache/apk/*

RUN echo "export LANG=en_US.UTF-8" >> /etc/profile

WORKDIR /documents
VOLUME /documents

# COPY ./test.html ./
# COPY ./test2.html ./

test3.html (3)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
    <p>English</p>
    <p>電子郵件</p>
    <p>สวัสดี</p>
</body>
</html>

simsun.ttc (2)

Copy simsun.ttc into the project directory.

Build image

docker build -t character_test . 
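
Once the image is built, the font half of the fix can be checked separately from wkhtmltopdf by asking fontconfig which installed fonts declare coverage for a language. The following is a minimal sketch, assuming it is run in a shell inside the container (fontconfig is installed by the Dockerfile above) and that wkhtmltopdf resolves fonts through fontconfig, as is typical on Linux. The helper name has_font_for_lang and the FC_LIST override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if fontconfig lists at least one font
# that declares support for the given language code (e.g. zh, th).
# FC_LIST is an illustrative override so the command can be swapped out.
FC_LIST="${FC_LIST:-fc-list}"

has_font_for_lang() {
    # fc-list prints one line per matching font; empty output means none.
    [ -n "$("$FC_LIST" ":lang=$1" family 2>/dev/null)" ]
}

# Report coverage for the languages used in test3.html.
for lang in zh th; do
    if has_font_for_lang "$lang"; then
        echo "$lang: at least one font found"
    else
        echo "$lang: no font found"
    fi
done
```

If zh reports no font even though simsun.ttc was copied, refreshing the font cache with fc-cache -f and checking again would be a reasonable next step.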

Run

docker run --rm  \
       --user 1000:1000 \
       -v `pwd`/data:/documents/ \
       character_test \
       wkhtmltopdf test3.html test3.pdf

Output

docker_wkhtml2pdf
├── Dockerfile
├── simsun.ttc
└── data
    ├── test3.html
    └── test3.pdf    (4) Output file

Check test3.pdf (4)

The Chinese text string is displayed correctly.


Viewing the PDF properties, you can see that a SimSun font is embedded, which is a font used for Chinese.
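
The same check can be done from the command line instead of a viewer's properties dialog. This is a minimal sketch, assuming the pdffonts tool from poppler-utils is installed wherever the PDF is inspected (it is not part of the Dockerfile above) and that the command is run from the project directory. The helper name pdf_mentions_font and the PDFFONTS override variable are hypothetical.

```shell
#!/bin/sh
# PDFFONTS is an illustrative override so the command can be swapped out.
PDFFONTS="${PDFFONTS:-pdffonts}"

# Hypothetical helper: succeeds if the font table printed for the PDF ($1)
# contains the given font name fragment ($2), compared case-insensitively.
pdf_mentions_font() {
    "$PDFFONTS" "$1" 2>/dev/null | grep -qi -- "$2"
}

if pdf_mentions_font data/test3.pdf simsun; then
    echo "SimSun is listed in test3.pdf"
else
    echo "SimSun not listed (or pdffonts / test3.pdf unavailable)"
fi
```

Running pdffonts directly on the file prints the full table, including a column that indicates whether each font is embedded.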

life888888 answered Dec 23 '25 23:12


