I would like to know if there is an easy way to detect if the text on the clipboard is in ISO 8859 or UTF-8 ?
Here is my current code:
COleDataObject obj;
if (obj.AttachClipboard())
{
if (obj.IsDataAvailable(CF_TEXT))
{
HGLOBAL hmem = obj.GetGlobalData(CF_TEXT);
CMemFile sf((BYTE*) ::GlobalLock(hmem),(UINT) ::GlobalSize(hmem));
CString buffer;
LPSTR str = buffer.GetBufferSetLength((int)::GlobalSize(hmem));
sf.Read(str,(UINT) ::GlobalSize(hmem));
::GlobalUnlock(hmem);
//this is my string class
s->SetEncoding(ENCODING_8BIT);
s->SetString(buffer);
}
}
}
UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.
Within an identifier, you would also want to allow characters >= 0x80, which is the range of UTF-8 continuation bytes. Most C string library routines still work with UTF-8, since they only scan for terminating NUL characters.
For your specific use-case, it's very easy to tell. Just scan the file, if you find any NULL ("\0"), it must be UTF-16. JavaScript got to have ASCII chars and they are represented by a leading 0 in UTF-16.
UTF-8 is the universal code page for internationalization and is able to encode the entire Unicode character set. It is used pervasively on the web, and is the default for *nix-based platforms.
Check out the definition of CF_LOCALE at this Microsoft page. It tells you the locale of the text in the clipboard. Better yet, if you use CF_UNICODETEXT instead, Windows will convert to UTF-16 for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With