I am wondering, in C++, how can we support UTF-8 encoding? I think C++ only supports char and wchar_t, so how can UTF-8 be supported?
UTF-8 is supported just fine; UTF-8 uses eight-bit code units to represent characters, with each character encoded as one or more bytes. The standard guarantees that char is at least eight bits wide, so every conforming C++ implementation can read, write, and process UTF-8 text. Since 7-bit ASCII is a strict subset of UTF-8, conversion between ASCII char strings and UTF-8 is also not a problem.
What is a problem is converting between other encodings (code pages such as Latin-1 or other Unicode encodings such as UTF-16, UCS-2, UTF-32 and UCS-4) and UTF-8. Here's a rough outline of the situation:
- C++98 introduced the wchar_t type and wide-string literals of the form L"XXX", but left most of the details implementation-defined. So VC++ treats wchar_t as 16 bits and encodes wide-string literals as UTF-16, while GCC treats wchar_t as 32 bits and encodes wide-string literals as UTF-32.
- C++11 added char16_t and char32_t, as well as 16- and 32-bit string literals of the form u"XXX" and U"XXX". These, however, are not yet supported by VC++ (GCC has them).
- For converting between encodings, the standard library provides the codecvt facet template. This was added in C++98, but support has been spotty, to say the least. Today, VC++ seems to have reasonable support, but GCC's support is lacking.