
Using Unicode in CHtmlEditCtrl::SetDocumentHTML

Tags:

c++

mfc

How do I use CHtmlEditCtrl::SetDocumentHTML to display Unicode correctly (with either UTF-16 or UTF-8 input)?

The program is compiled as Unicode.

For example, given the following input with a charset=utf-8 meta tag:

CString u16 = LR"(<!DOCTYPE><html>
    <head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>
    <body>ελληνικά 华语 😃</body></html>)";

m_htmledit.SetDocumentHTML(u16) doesn't show the characters correctly.

Instead, I have to call m_htmledit.SetDocumentHTML(CA2W(CW2A(u16, CP_UTF8), CP_ACP));

I can't figure out why it works like that, or whether it works on all systems.

Minimum example:

#include "afxhtml.h"
...
CHtmlEditCtrl m_htmledit;
...
BOOL CMyDialog::OnInitDialog()
{
    CDialogEx::OnInitDialog();
    m_htmledit.Create(0, 0, CRect(10, 10, 300, 300), this, 0, 0);

    //wait for the control, this is not directly related to the question
    CComPtr<IHTMLDocument2> document;
    if(m_htmledit.GetDHtmlDocument(&document))
    {
        CComBSTR ready;
        while(document->get_readyState(&ready) == S_OK)
            if(wcscmp(ready, L"complete") == 0 || !AfxPumpMessage())
                break;
    }

    //send html data:
    CString utf16 = LR"(<!DOCTYPE><html>
        <head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>
        <body>ελληνικά 华语 😃</body></html>)";

    //m_htmledit.SetDocumentHTML(utf16); <- outputs garbage characters
    m_htmledit.SetDocumentHTML(CA2W(CW2A(utf16, CP_UTF8), CP_ACP)); //<- correct output
    return TRUE;
}

There is a similar issue with UTF-8 input.

m_htmledit.SetDocumentHTML(CA2W(utf8, CP_UTF8)); doesn't show the characters correctly.

m_htmledit.SetDocumentHTML(CA2W(utf8, CP_ACP)); does work. But using CP_ACP here is odd.

Example:

CStringA utf8 = u8R"(<!DOCTYPE><html>
    <head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>
    <body>ελληνικά 华语 😃</body></html>)";
m_htmledit.SetDocumentHTML(CA2W(utf8, CP_ACP)); //<= correct output

Barmak Shemirani asked Sep 12 '25 10:09


2 Answers

CHtmlEditCtrl::SetDocumentHTML uses a class called CStreamOnCString.

CStreamOnCString at some point calls

m_strAnsi = m_strStream;

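Here m_strAnsi is a storage buffer and m_strStream is the CStringW source. I believe this is an error, because the assignment doesn't copy the source into the buffer unchanged. Rather, it converts it, in effect with CW2A(m_strStream, CP_ACP).

This error can be worked around with another CP_ACP conversion before sending the data: CA2W(..., CP_ACP) maps the UTF-8 bytes to wide characters through the ANSI code page, and the control's internal conversion maps them back, so the document receives the original UTF-8 bytes. Whether that round trip is lossless depends on the system's ANSI code page.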
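The cancellation described above can be illustrated with a small portable model. This is only a sketch: widenBytes and narrowToBytes are hypothetical stand-ins for CA2W(..., CP_ACP) and the control's internal CW2A(..., CP_ACP), and they assume a single-byte code page in which every byte maps to its own wide character (modeled here as the Latin-1 identity mapping). Real ANSI code pages, especially multi-byte ones, may behave differently.

```cpp
#include <string>

// Stand-in for CA2W(bytes, CP_ACP) on a single-byte code page.
// ASSUMPTION: each byte maps to the wide character with the same value.
std::wstring widenBytes(const std::string& bytes)
{
    std::wstring wide;
    for (unsigned char b : bytes)
        wide.push_back(static_cast<wchar_t>(b));
    return wide;
}

// Stand-in for the internal CW2A(wide, CP_ACP) conversion.
// Characters outside the modeled code page are replaced with '?'.
std::string narrowToBytes(const std::wstring& wide)
{
    std::string bytes;
    for (wchar_t c : wide)
    {
        const unsigned long value = static_cast<unsigned long>(c);
        if (value < 0x100)
            bytes.push_back(static_cast<char>(static_cast<unsigned char>(value)));
        else
            bytes.push_back('?');
    }
    return bytes;
}

// True when UTF-8 bytes that were widened first come back unchanged
// after the internal narrowing step.
bool workaroundPreservesUtf8(const std::string& utf8)
{
    return narrowToBytes(widenBytes(utf8)) == utf8;
}
```

In this model, workaroundPreservesUtf8 holds for any byte sequence, while passing genuine UTF-16 text (for example L"\u03B5") straight to narrowToBytes replaces every character above 0xFF, which mirrors the garbage output seen when calling SetDocumentHTML directly.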

Alternatively we can write our own function as follows:

class CMyHtmlEditCtrl : public CHtmlEditCtrl
{
    public:
    template <class Type>
    HRESULT SetDocumentHTML_unicode(CStringT<Type, StrTraitMFC<Type>> html)
    {
        HRESULT hr = E_NOINTERFACE;
        CComPtr<IHTMLDocument2> document;
        if(!GetDHtmlDocument(&document))
            return hr;
        IStream *istream = SHCreateMemStream(
          reinterpret_cast<const BYTE*>(html.GetBuffer()), sizeof(Type)*html.GetLength());
        if(istream)
        {
            //CComQIPtr<IPersistStreamInit> psi = document; 
            CComQIPtr<IPersistStreamInit> psi { document }; //c++20 compliant
            if(psi)
                hr = psi->Load(istream);
            istream->Release();
        }
        html.ReleaseBuffer();
        return hr;
    }
};

Now we can call SetDocumentHTML_unicode(utf8_string) or SetDocumentHTML_unicode(utf16_string).

Barmak Shemirani answered Sep 15 '25 01:09


The accepted answer does not compile in C++20. Here is a version that does, although it accepts only CStringW or CStringA, depending on whether UNICODE is defined:

HRESULT CMyHtmlEditCtrl::SetDocumentHTML_unicode(CString html)
{
    HRESULT hr = E_NOINTERFACE;
    ::ATL::CComPtr<IHTMLDocument2> spHTMLDocument;
    ::ATL::CComQIPtr<IPersistStreamInit> spPSI;

    if(!GetDHtmlDocument(&spHTMLDocument))
        return hr;
    IStream *istream = SHCreateMemStream(
        reinterpret_cast<const BYTE*>((LPCTSTR)html), sizeof(TCHAR)*html.GetLength());
    if(istream)
    {
        spPSI = spHTMLDocument;
        if(spPSI)
            hr = spPSI->Load(istream);
        istream->Release();
    }
    return hr;
}

Bojan Hrnkas answered Sep 15 '25 00:09