Consider this program:
#include <stdio.h>
int main() {
    printf("%s\n", __FILE__);
    return 0;
}
Depending on the name of the file, this program works - or not. The issue I'm facing is that I'd like to print the name of the current file in an encoding-safe way. However, in case the file has funny characters which cannot be represented in the current code page, the compiler yields a warning (rightfully so):
?????????.c(3) : warning C4566: character represented by universal-character-name '\u043F' cannot be represented in the current code page (1252)
How do I tackle this? I'd like to store the string given by __FILE__ in e.g. UTF-16 so that I can properly print it on any other system at runtime (by converting the stored UTF-16 representation to whatever the runtime system uses). To do so, I need to know which encoding is used for the string given by __FILE__. It seems that, at least on Windows, the current system code page (in my case, Windows-1252) is used - but this is just guessing. Is this true?
My real life use case: I have a macro which traces the current program execution, writing the current source file/line number information to a file. It looks like this:
#include <string>
struct LogFile {
    // Write message to file. The file should contain the UTF-8 encoded data!
    void writeMessage( const std::string &msg );
};
// Global function which returns a pointer to the 'active' log file.
LogFile *activeLogFile();
#define TRACE_BEACON activeLogFile()->writeMessage( __FILE__ );
This breaks in case the current source file has a name which contains characters which cannot be represented by the current code page.
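You can use the token pasting operator to turn __FILE__ into a wide string literal, like this:
#include <stdio.h>
#include <wchar.h>
#define WIDEN2(x) L ## x
#define WIDEN(x) WIDEN2(x)
#define WFILE WIDEN(__FILE__)
int main() {
    /* wprintf needs a wide format string; %ls prints a wide string argument. */
    wprintf(L"%ls\n", WFILE);
    return 0;
}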
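Since the log file in the question expects UTF-8, the wide literal still has to be converted before it is written. Below is a minimal sketch of that step, assuming wchar_t holds UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix-like systems). wideToUtf8 is a hypothetical helper name made up for this example, not something from the question or from a library:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

#define WIDEN2(x) L ## x
#define WIDEN(x) WIDEN2(x)
#define WFILE WIDEN(__FILE__)

// Hypothetical helper: encodes a wide string as UTF-8.
// Assumption: wchar_t holds UTF-16 code units or UTF-32 code points.
// Unpaired surrogates are not rejected; they are encoded as-is.
std::string wideToUtf8(const std::wstring &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this in place, the trace macro from the question could pass wideToUtf8( WFILE ) to the log file instead of __FILE__. Whether the wide literal really carries the correct characters still depends on the compiler decoding the file name correctly, so treat this as a sketch rather than a guaranteed fix.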
__FILE__ will always expand to a character string literal, so in essence it is compatible with char const*. This means that a compiler implementation has little choice but to use the raw byte representation of the source file name as it presents itself at compile time.
Whether this is something sensible in the current locale doesn't matter: you could have a source file name that contains basically garbage, as long as your run time system and compiler accept it as a valid file name.
If you, as a user, have a different locale with a different encoding than the one used in your file system, you will see a lot of ???? or the like.
But if both your locales agree upon the encoding, a plain printf should suffice and your terminal (or whatever you use to look at the output) should be able to print the characters correctly.
So the short answer is, it will only work if your system is consistent w.r.t. encoding. Otherwise you're out of luck, since guessing encodings is quite a difficult task.
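Rather than guessing, you can look at the bytes your compiler actually embedded for __FILE__. The following is a small diagnostic sketch; hexBytes is a hypothetical helper name introduced only for this example. The static_assert merely confirms that __FILE__ is a narrow string literal, as described above:

```cpp
#include <cstdio>
#include <string>
#include <type_traits>

// __FILE__ is a narrow string literal, i.e. an array of const char.
static_assert(
    std::is_same<
        std::remove_extent<std::remove_reference<decltype(__FILE__)>::type>::type,
        const char>::value,
    "__FILE__ should be an array of const char");

// Hypothetical helper: formats every byte of a narrow string as two
// lowercase hex digits, separated by single spaces.
std::string hexBytes(const char *s) {
    std::string out;
    char buf[4];
    for (; *s != '\0'; ++s) {
        unsigned value = static_cast<unsigned char>(*s);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Printing hexBytes(__FILE__) for a file whose name contains the character from the warning (U+043F) shows d0 bf if the compiler embedded UTF-8, and 3f if the character was replaced by a question mark. Other values point to some other code page, which you would then have to identify yourself.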