How do I use CHtmlEditCtrl::SetDocumentHTML to display Unicode correctly (either UTF-16 or UTF-8 input)?
The program is compiled as Unicode.
For example, given the following input with a charset=utf-8 meta tag:
CString u16 = LR"(<!DOCTYPE><html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>
<body>ελληνικά 华语 😃</body></html>)";
Calling m_htmledit.SetDocumentHTML(u16); doesn't show the characters correctly.
Instead, I have to call m_htmledit.SetDocumentHTML(CA2W(CW2A(u16, CP_UTF8), CP_ACP));
I can't figure out why it works like that, or whether it works on all systems.
Minimal example:
#include "afxhtml.h"
...
CHtmlEditCtrl m_htmledit;
...
BOOL CMyDialog::OnInitDialog()
{
    CDialogEx::OnInitDialog();
    m_htmledit.Create(0, 0, CRect(10, 10, 300, 300), this, 0, 0);

    //wait for the control, this is not directly related to the question
    CComPtr<IHTMLDocument2> document;
    if(m_htmledit.GetDHtmlDocument(&document))
    {
        CComBSTR ready;
        for(;;)
        {
            ready.Empty(); //free the previous BSTR so get_readyState doesn't leak it
            if(document->get_readyState(&ready) != S_OK || !ready)
                break;
            if(wcscmp(ready, L"complete") == 0 || !AfxPumpMessage())
                break;
        }
    }

    //send html data:
    CString utf16 = LR"(<!DOCTYPE><html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>
<body>ελληνικά 华语 😃</body></html>)";
    //m_htmledit.SetDocumentHTML(utf16); <- outputs garbage characters
    m_htmledit.SetDocumentHTML(CA2W(CW2A(utf16, CP_UTF8), CP_ACP)); //<- correct output
    return TRUE;
}
There is a similar issue with UTF-8 input:
m_htmledit.SetDocumentHTML(CA2W(utf8, CP_UTF8));
doesn't show the characters correctly, while
m_htmledit.SetDocumentHTML(CA2W(utf8, CP_ACP));
does work. Using CP_ACP here seems odd, though.
Example:
CStringA utf8 = u8R"(<!DOCTYPE><html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>
<body>ελληνικά 华语 😃</body></html>)";
m_htmledit.SetDocumentHTML(CA2W(utf8, CP_ACP)); //<= correct output
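CHtmlEditCtrl::SetDocumentHTML uses a class called CStreamOnCString. At some point, CStreamOnCString executes

m_strAnsi = m_strStream;

where m_strAnsi is a CStringA storage buffer and m_strStream is the CStringW source. I believe this is an error: instead of copying the source into the buffer unchanged, the assignment converts it to the ANSI code page, exactly as CW2A(m_strStream, CP_ACP) would. Any character the ANSI code page cannot represent is lost.

This error can be compensated for with an extra CP_ACP conversion before sending the data. CW2A(u16, CP_UTF8) produces the UTF-8 bytes, CA2W(..., CP_ACP) turns each of those bytes into a wide character, and MFC's internal conversion back to ANSI restores the original UTF-8 bytes, which the document then presumably interprets as UTF-8 because of the charset=utf-8 meta tag. This relies on every byte surviving the ANSI round trip, which is not guaranteed on systems whose ANSI code page is multi-byte (for example, Chinese or Japanese locales).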
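To see why the extra CP_ACP conversion cancels out, the conversions can be modelled with standard C++ strings. This is a simulation under stated assumptions, not MFC code: the hypothetical helpers utf16_to_utf8, ansi_to_wide and wide_to_ansi stand in for CW2A(..., CP_UTF8), CA2W(..., CP_ACP) and the internal m_strAnsi = m_strStream assignment; std::u16string stands in for CStringW (wchar_t is 16 bits on Windows); and Latin-1 stands in for the real ANSI code page, with unmappable characters replaced by '?' roughly the way WideCharToMultiByte substitutes a default character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for CW2A(text, CP_UTF8): encodes UTF-16 code units as UTF-8 bytes.
// Assumes well-formed input (every high surrogate is followed by a low one).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00); // surrogate pair
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Stand-in for CA2W(bytes, CP_ACP), using Latin-1 as a hypothetical single-byte
// ANSI code page: every byte becomes the code unit with the same value.
std::u16string ansi_to_wide(const std::string& bytes)
{
    std::u16string out;
    for (unsigned char b : bytes)
        out += static_cast<char16_t>(b);
    return out;
}

// Stand-in for MFC's internal `m_strAnsi = m_strStream;` (wide to ANSI).
// Characters outside Latin-1 cannot be represented and become '?'.
std::string wide_to_ansi(const std::u16string& wide)
{
    std::string out;
    for (char16_t c : wide)
        out += (c <= 0xFF) ? static_cast<char>(c) : '?';
    return out;
}

// The workaround: SetDocumentHTML(CA2W(CW2A(html, CP_UTF8), CP_ACP)).
// Returns true if the bytes MFC ends up with are exactly the UTF-8 encoding.
bool round_trip_preserves(const std::u16string& html)
{
    const std::string utf8 = utf16_to_utf8(html);     // CW2A(html, CP_UTF8)
    const std::u16string passed = ansi_to_wide(utf8); // CA2W(..., CP_ACP)
    return wide_to_ansi(passed) == utf8;              // m_strAnsi = m_strStream
}
```

With a single-byte stand-in every byte survives, so round_trip_preserves returns true for the Greek, Chinese and emoji text. Passing the UTF-16 text straight to wide_to_ansi, which per the analysis above is what a plain SetDocumentHTML(u16) amounts to, turns those characters into '?'.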
Alternatively, we can write our own function that passes the string's raw bytes to the document without any ANSI conversion:
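#include <shlwapi.h> //SHCreateMemStream
#pragma comment(lib, "shlwapi.lib")

class CMyHtmlEditCtrl : public CHtmlEditCtrl
{
public:
    //Loads the string's bytes unchanged (UTF-8 for CStringA, UTF-16 for CStringW).
    //Traits is deduced, so this accepts both StrTraitMFC and StrTraitMFC_DLL strings.
    template <class Type, class Traits>
    HRESULT SetDocumentHTML_unicode(const CStringT<Type, Traits>& html)
    {
        HRESULT hr = E_NOINTERFACE;
        CComPtr<IHTMLDocument2> document;
        if(!GetDHtmlDocument(&document))
            return hr;
        //SHCreateMemStream copies the buffer into a new memory stream
        IStream *istream = SHCreateMemStream(
            reinterpret_cast<const BYTE*>(html.GetString()),
            static_cast<UINT>(sizeof(Type) * html.GetLength()));
        if(istream)
        {
            //CComQIPtr<IPersistStreamInit> psi = document;
            CComQIPtr<IPersistStreamInit> psi { document }; //c++20 compliant
            if(psi)
                hr = psi->Load(istream);
            istream->Release();
        }
        return hr;
    }
};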
Now we can call SetDocumentHTML_unicode(utf8_string) or SetDocumentHTML_unicode(utf16_string).
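This version sidesteps the ANSI conversion because nothing is transcoded: the stream receives sizeof(Type) * GetLength() bytes, one byte per code unit for CStringA and two for CStringW. A portable sketch of that byte count follows; raw_bytes is a hypothetical helper, std::vector stands in for the memory stream, and std::u16string stands in for CStringW (wchar_t is 16 bits on Windows). Note that GetLength() counts code units, not characters, so the emoji counts as 2 in UTF-16.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Copies a string's code units into a byte buffer without transcoding,
// mirroring SHCreateMemStream(data, sizeof(Type) * GetLength()).
template <class CharT>
std::vector<unsigned char> raw_bytes(const std::basic_string<CharT>& s)
{
    std::vector<unsigned char> bytes(sizeof(CharT) * s.size());
    if (!bytes.empty())
        std::memcpy(bytes.data(), s.data(), bytes.size());
    return bytes;
}
```

For example, raw_bytes(std::string("\xF0\x9F\x98\x83")) and raw_bytes(std::u16string(u"\U0001F603")) both yield 4 bytes: four UTF-8 code units in the first case, and a two-unit surrogate pair at two bytes each in the second.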
The accepted answer does not compile in C++20. Here is a version that does, although it only accepts CString (that is, CStringW or CStringA, depending on whether UNICODE is defined):
HRESULT CMyHtmlEditCtrl::SetDocumentHTML_unicode(CString html)
{
    HRESULT hr = E_NOINTERFACE;
    ::ATL::CComPtr<IHTMLDocument2> spHTMLDocument;
    ::ATL::CComQIPtr<IPersistStreamInit> spPSI;
    if(!GetDHtmlDocument(&spHTMLDocument))
        return hr;
    //pass the raw TCHAR bytes; SHCreateMemStream copies them into a new stream
    IStream *istream = SHCreateMemStream(
        reinterpret_cast<const BYTE*>((LPCTSTR)html),
        static_cast<UINT>(sizeof(TCHAR) * html.GetLength()));
    if(istream)
    {
        spPSI = spHTMLDocument;
        if(spPSI)
            hr = spPSI->Load(istream);
        istream->Release();
    }
    return hr;
}