
thrift character encoding, perl to java

I have a complex situation that I'm trying to deal with involving character encoding.

I have a Perl program that communicates with a Java endpoint via Thrift; the Java side then uses the data to make a request to a legacy PHP service. It's ugly, but it's part of a migration plan, so it needs to work for a short while.

In Perl, a Thrift object is created in which some of the fields are JSON-encoded strings.

The problem is that when Perl makes the request to Java, one of the strings is as follows (this is from Data::Dumper; the string is subsequently JSON-encoded and added to the Thrift object):

'offer_message' => "<&lt;>&gt;
&&amp;
\x{c3}\x{82}\x{c2}\x{a9}&copy;
<script>alert(\"XSS\");</script>
https://url.com/imghp?hl=uk",

However, when this data is received on the Java side, the sequence \x{c3}\x{82}\x{c2}\x{a9} has been converted, so in Java we receive the following:

<&lt;>&gt;\\n&&amp;\\n���©&copy;\\n<script>alert(\"XSS\");</script>\\nhttps://www.google.com.ua/imghp?hl=uk

The problem is that if I pass the second string to the legacy PHP program, it fails; if I pass the string taken from the dump of the Perl hash, it succeeds. So my assumption is that I need to convert the received string to another encoding (correct me if I'm wrong; I'm not sure that this is the right solution).

I've tried taking the parameters received in Java and converting them to every encoding I can think of, but nothing works. For example:

byte[] utf8 = templateParams.getBytes("UTF8");
normallisedTemplateParams = new String(utf8, "UTF8");

I've been varying the encoding schemes in the hope I find something that works.
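For what it's worth, here is a minimal sketch of why a round trip like the one above cannot change anything: encoding a Java String to bytes and decoding those bytes with the same charset yields an equal String. Only a mismatched pair of charsets alters the text. The class and method names are hypothetical, and treating the four values from the Perl dump as four characters (U+00C3, U+0082, U+00C2, U+00A9) is an assumption; this illustrates the mechanics and is not necessarily what the PHP service needs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original program.
final class RoundTripDemo {
    // Encode and decode with the same charset: the result equals the input.
    static String sameCharsetRoundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Treat each char (assumed to be at most U+00FF) as one raw byte,
    // then decode those bytes as UTF-8. This is a mismatched pair,
    // so the resulting String differs from the input.
    static String latin1CharsAsUtf8(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: the sequence from the Perl dump, taken as four characters.
        String received = "\u00c3\u0082\u00c2\u00a9";

        System.out.println(sameCharsetRoundTrip(received).equals(received));
        System.out.println(latin1CharsAsUtf8(received).length());
    }
}
```

Dumping the code points of the received string (for example with `received.codePoints()`) on the Java side is probably a more reliable way to see what actually arrived than trying charsets at random.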

What is the correct way to solve this? For a short time this messy solution is my only option while other re-engineering is happening.

mark asked Jul 21 '16 13:07



1 Answer

The problem was, in the end, difficult to diagnose but simple to resolve. It turned out that the package I was using for the conversion in Java was defaulting to UTF-16. I had to modify the package and force it to use UTF-8. After that, everything worked.
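As a rough illustration of the kind of change involved (the package itself is not shown here, so the class and method names below are hypothetical), this sketch decodes the same UTF-8 bytes once with UTF-16 and once with an explicitly named UTF-8 charset. Only the explicit UTF-8 decode recovers the original text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; stands in for the conversion step in the package.
final class ExplicitCharsetDemo {
    // Decode wire bytes with a charset chosen by the caller,
    // instead of relying on whatever a library defaults to.
    static String decode(byte[] wire, Charset charset) {
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        String original = "\u00a9"; // the copyright sign
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String asUtf16 = decode(wire, StandardCharsets.UTF_16);
        String asUtf8 = decode(wire, StandardCharsets.UTF_8);

        System.out.println(asUtf16.equals(original));
        System.out.println(asUtf8.equals(original));
    }
}
```

Passing a `Charset` explicitly at every encode and decode call is one way to keep this kind of mismatch from depending on a library or platform default.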

mark answered Oct 16 '22 16:10
