When I input UTF-8 characters other than plain ASCII, I find that clang-format measures a character's length as the number of bytes in its encoding, rather than as the width it occupies in the terminal.
For example:
测
is stored in 3 bytes, but it occupies the width of 2 ASCII characters in vim and other editors.
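To see the mismatch in isolation, here is a minimal C sketch (assuming a POSIX system with a UTF-8 locale; strlen counts bytes, while the POSIX wcwidth function reports display columns):

#define _XOPEN_SOURCE 700 /* expose wcwidth() from <wchar.h> */
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");  /* pick up the terminal's UTF-8 locale */

    const char *s = "测";   /* UTF-8 bytes: E6 B5 8B */
    printf("bytes:   %zu\n", strlen(s));      /* prints 3 */
    printf("columns: %d\n",  wcwidth(L'测')); /* prints 2 */
    return 0;
}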
The expected formatted code should be as follows, with all the trailing backslashes aligned at the same screen column:
#define test \
/* 测试 */   \
"aa"         \
"bb"         \
"bb"
But what I actually get is the following: the backslash after the comment sits 2 columns to the left of the others, because clang-format counts 测试 as 6 bytes rather than the 4 columns it occupies on screen:
#define test \
/* 测试 */ \
"aa"         \
"bb"         \
"bb"
How can I get the expected result through configuration?
UTF-8 (8-bit Unicode Transformation Format) can represent any Unicode character. It is based on 8-bit code units: each character is encoded as 1 to 4 bytes, and the first 128 Unicode code points (the ASCII range) are encoded as a single byte. In total, UTF-8 can encode all 1,112,064 valid Unicode code points.
clang-format supports two ways to provide custom style options: specify the style configuration directly in the -style= command-line option, or use -style=file and put the style configuration in a .clang-format or _clang-format file in the project directory.
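For reference, a minimal .clang-format file is sketched below, applied with clang-format -style=file test.c. AlignEscapedNewlines and ColumnLimit are standard clang-format options (AlignEscapedNewlines controls where the trailing backslashes of macro continuation lines are placed), but as far as I know no style option changes how clang-format measures character width, so configuration alone does not fix the mixed-width misalignment:

# .clang-format -- style sketch using standard options
BasedOnStyle: LLVM
ColumnLimit: 80
# Right pushes macro continuation backslashes to the column limit;
# Left packs them as far left as possible.
AlignEscapedNewlines: Right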
clang-format already detects UTF-8. It doesn't need to be told to do it. So you're dealing with a bug here that you should report to their bug tracker:
https://bugs.llvm.org/
Make sure you have tested with the latest clang-format version though. 11.0.0 as of right now. You don't want to report a bug from an old version that has been fixed already.
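You can check which version you have installed with:

clang-format --version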