Given a proto file:
```proto
syntax = "proto3";
package hello;
message TopGreeting {
    NestedGreeting greeting = 1;
}
message NestedGreeting {
    Greeting greeting = 1;
}
message Greeting {
    string message = 1;
}
```
and the code:
```java
public class Main {
    public static void main(String[] args) {
        System.out.printf("From top: %s%n", newGreeting("오늘은 무슨 요일입니까?"));
        System.out.printf("Directly: %s%n", "오늘은 무슨 요일입니까?");
        System.out.printf("ByteString: %s", newGreeting("오늘은 무슨 요일입니까?").toByteString().toStringUtf8());
    }
    private static Hello.TopGreeting newGreeting(String message) {
        Hello.Greeting greeting = Hello.Greeting.newBuilder()
                .setMessage(message)
                .build();
        Hello.NestedGreeting nestedGreeting = Hello.NestedGreeting.newBuilder()
                .setGreeting(greeting)
                .build();
        return Hello.TopGreeting.newBuilder()
                .setGreeting(nestedGreeting)
                .build();
    }
}
```
Output:
```
From top: greeting {
  greeting {
    message: "\354\230\244\353\212\230\354\235\200 \353\254\264\354\212\250 \354\232\224\354\235\274\354\236\205\353\213\210\352\271\214?"
  }
}
Directly: 오늘은 무슨 요일입니까?
ByteString: 
%
#
!오늘은 무슨 요일입니까?
```
How do I print the message in a human-readable way? As you can see, converting to a ByteString prints the UTF-8 characters correctly, but it also prints stray characters (%, # and !) and extra line breaks.
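Where the stray characters come from: `toByteString()` returns the binary wire encoding, not text. Every field here is field number 1 with wire type 2 (length-delimited), so each nesting level is prefixed by a tag byte `(1 << 3) | 2 = 0x0A` (a newline) and a varint length. The UTF-8 message is 33 bytes, so the lengths are 33, 35 and 37, which happen to be the printable ASCII characters `!`, `#` and `%`. The sketch below hand-encodes those bytes without the protobuf library; the class `WireFormatSketch` and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-written sketch of the protobuf wire encoding for hello.proto.
// It does NOT use the protobuf library: it only handles field number 1
// with wire type 2 (length-delimited), which is all three messages use.
final class WireFormatSketch {

    // Tag = (field_number << 3) | wire_type = (1 << 3) | 2 = 0x0A, the '\n' character.
    static final int FIELD_1_LENGTH_DELIMITED = (1 << 3) | 2;

    // Unsigned varint: 7 payload bits per byte, high bit set while more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes a payload as field 1: tag byte, varint length, then the payload itself.
    static byte[] field1(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(FIELD_1_LENGTH_DELIMITED);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // TopGreeting { greeting: NestedGreeting { greeting: Greeting { message } } }
    static byte[] encodeTopGreeting(String message) {
        byte[] greeting = field1(message.getBytes(StandardCharsets.UTF_8));
        byte[] nested = field1(greeting);
        return field1(nested);
    }

    public static void main(String[] args) {
        byte[] bytes = encodeTopGreeting("오늘은 무슨 요일입니까?");
        // The first six bytes are framing: tag, length, tag, length, tag, length.
        for (int i = 0; i < 6; i++) {
            System.out.printf("byte %d: 0x%02X (%d)%n", i, bytes[i], bytes[i]);
        }
    }
}
```

Printing these bytes with `toStringUtf8()` therefore shows a newline, `%`, a newline, `#`, a newline, `!` and then the text, which matches the ByteString output in the question.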
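Answering my own question: I solved this by digging through the Protobuf source code. `com.google.protobuf.TextFormat` has a printer that can be told not to escape non-ASCII characters (here `greeting` is the `TopGreeting` built above):
```java
System.out.println(TextFormat.printer().escapingNonAscii(false).printToString(greeting));
```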
Output:
```
greeting {
  greeting {
    message: "오늘은 무슨 요일입니까?"
  }
}
```
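`toString()` uses the same printer, but with `escapingNonAscii(true)`, which is the default when the option is not set. That is why the first line of the question's output shows the UTF-8 bytes as octal escapes.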
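To make the escaped form concrete, here is a simplified sketch of what escaping non-ASCII means for a string field: encode the string as UTF-8 and write every byte outside printable ASCII as a three-digit octal escape, so `오` (bytes `0xEC 0x98 0xA4`) becomes `\354\230\244`. This is not protobuf's `TextFormat` code; the real escaper also emits short escapes such as `\n` and `\t`, and the class name `OctalEscapeSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Simplified illustration of text-format string escaping with non-ASCII escaping on.
// NOT protobuf's implementation: control characters are written as octal here,
// whereas real TextFormat uses escapes such as \n and \t for some of them.
final class OctalEscapeSketch {

    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF; // unsigned byte value 0..255
            if (u == '"' || u == '\'' || u == '\\') {
                sb.append('\\').append((char) u);   // escape quotes and backslash
            } else if (u >= 0x20 && u < 0x7F) {
                sb.append((char) u);                // printable ASCII is kept as-is
            } else {
                // Three octal digits, e.g. 0xEC (236) -> \354
                sb.append('\\')
                  .append((char) ('0' + (u >> 6)))
                  .append((char) ('0' + ((u >> 3) & 7)))
                  .append((char) ('0' + (u & 7)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = "오늘은 무슨 요일입니까?";
        System.out.println("escaped:   \"" + escapeNonAscii(message) + "\"");
        System.out.println("unescaped: \"" + message + "\"");
    }
}
```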
Also see this answer for how to convert octal escape sequences back to UTF-8 characters if you only have the logs and cannot change the code that prints the message.
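If all you have is a log line like the first output above, a minimal decoder sketch is to turn each `\ooo` escape back into one raw byte, copy other characters through, and decode the resulting bytes as UTF-8. This assumes the logged string is plain ASCII apart from its escapes (which is what `escapingNonAscii(true)` produces) and only understands octal escapes plus `\"`, `\'` and `\\`; the class name `OctalUnescapeSketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: turns a logged text-format string such as "\354\230\244..." back into
// readable text. Octal escapes become raw bytes, \" \' and \\ become the escaped
// character, and everything else is copied through; the bytes are then decoded
// as UTF-8. Assumes the input is plain ASCII apart from its escapes.
final class OctalUnescapeSketch {

    static String unescapeOctal(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            char next = i + 1 < escaped.length() ? escaped.charAt(i + 1) : 0;
            if (c == '\\' && isOctal(next)) {
                // Read one to three octal digits into a single byte value.
                int value = 0;
                int j = i + 1;
                while (j < escaped.length() && j <= i + 3 && isOctal(escaped.charAt(j))) {
                    value = value * 8 + (escaped.charAt(j) - '0');
                    j++;
                }
                bytes.write(value);
                i = j;
            } else if (c == '\\' && (next == '"' || next == '\'' || next == '\\')) {
                bytes.write(next); // simple escape: keep the escaped character
                i += 2;
            } else {
                bytes.write(c);    // plain ASCII character
                i++;
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    public static void main(String[] args) {
        String logged = "\\354\\230\\244\\353\\212\\230\\354\\235\\200 \\353\\254\\264\\354\\212\\250 "
                + "\\354\\232\\224\\354\\235\\274\\354\\236\\205\\353\\213\\210\\352\\271\\214?";
        System.out.println(unescapeOctal(logged));
    }
}
```

Decoding has to happen at the byte level: each Korean character spans three escapes, so converting escapes to characters one at a time would produce mojibake.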