I have been banging my head against the wall on this problem. I've read similar posts and articles; most suggest setting URIEncoding to UTF-8 in Tomcat's server.xml file, but that doesn't seem to make a difference here.
I have a RESTful web service deployed to a test environment where it is hosted on Tomcat 7. Tomcat is configured to use Java 6, although Java 7 is installed on the machine as well. When running basic authentication tests against the service hosted there, login fails and I receive an HTTP 401 response when the original credentials contain Unicode characters. Basic auth works fine when the credentials contain only ASCII. I can also log in without using basic auth at all: my service supports custom login headers and RFC 2047. Using that approach, it doesn't matter whether the credentials contain Unicode or not; logging in is not a problem.
Specifically the "problem" appears to be that the username is being UTF-8 encoded twice. There is a bug in my logger (separate issue) where the log files are ANSI-encoded. When you convert the log files to UTF-8, the characters will appear correctly. But in this case, the problematic username is much longer than it should be, and when the file is converted to UTF-8 it then appears like it should have in the first place (before being converted). For example:
The real kicker here is that I have my own instance of Tomcat 7 (Java 6) running locally, and I cannot reproduce the problem against it. I've compared the conf directories of the two Tomcats and they appear to be the same. I can't figure out why basic auth is working in one environment and not the other. I'm running the tests from my machine so it can't be due to a discrepancy in the way I'm testing it (JUnit/JSystem).
Here's what I know:
The following articles are very interesting to me, because they suggest the possibility of combining RFC 2047 and basic auth together. I didn't think that would be necessary because the basic auth string itself contains only ASCII (as it is base-64 encoded). Even if so, why would such a thing be required on one Tomcat server and not another? I feel like pursuing this combination approach isn't addressing the root problem, which is what's really driving me mad!
Thanks in advance for suggestions on things to try or double-check. The test environment is somewhat limited to me - I can only "play with it" during off-hours, so I apologize in advance if I don't respond promptly.
From the data you provided, it actually seems like the UTF-8 data is getting converted to an ASCII encoding instead of being doubly UTF-8 encoded.
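To tell the two failure modes apart, it can help to reproduce each one on a known string. The sketch below is illustrative only: the class and method names are hypothetical, and ISO-8859-1 stands in for the "ANSI" code page mentioned in the question. A UTF-8 string misread as a single-byte charset grows longer (one character per byte), while a lossy conversion to ASCII keeps the length and replaces each non-ASCII character with `?`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing the two failure modes on a sample string.
final class EncodingFailureModes {

    // UTF-8 bytes misread as a single-byte charset: each non-ASCII character
    // turns into two or more characters, so the string gets longer.
    // ISO-8859-1 is an assumption standing in for the "ANSI" code page.
    static String misreadUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Lossy conversion to ASCII: each non-ASCII character becomes '?',
    // so the length stays the same but the data is gone.
    static String forceAscii(String s) {
        return new String(s.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }
}
```

If the logged username is longer than the original, the first pattern is the more likely one; if it has the same length but contains `?` characters, the second is.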
As for the actual issue: unfortunately, basic authentication doesn't provide any way to transmit the charset of the undecoded username and password. Because of this, your main options are to assume and manually specify a charset, use the default charset from your environment, or define a custom way to provide the charset (e.g. another header). Which option fits depends on how much control you have over the environment and over the client and server ends of the communication, and on whether you want all calls to use the same charset.
Since one server behaves correctly and the other doesn't, I'm assuming that decoding currently uses the default charset from the environment. You're correct that the encoded string will only contain ASCII (so you're probably not seeing an issue transmitting the encoded value), which means the data is probably being lost during (or after) the decoding process. Depending on which library you're using, it probably produces either a byte array or a String, so be sure to check that you are providing the charset when creating a String from the byte array (e.g. new String(decodedData, someCharset)), or see if there's a way to provide it to the library (if it produces a String).
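As a rough sketch of the first option (assume a charset and pass it explicitly), the decoding step might look like the following. The class and method names are hypothetical, and it uses java.util.Base64, which only exists from Java 8 on; on the Java 6 setup described in the question you would need another Base64 decoder, but the point about passing the charset to the String constructor is the same.

```java
import java.nio.charset.Charset;
import java.util.Base64;

// Hypothetical helper: decodes a Basic Authorization header with an explicit charset.
final class BasicAuthDecoder {

    // Returns {username, password}. The charset is chosen by the caller,
    // so the result does not depend on the JVM's default charset.
    static String[] decode(String header, Charset charset) {
        String prefix = "Basic ";
        if (header == null || !header.regionMatches(true, 0, prefix, 0, prefix.length())) {
            throw new IllegalArgumentException("Not a Basic Authorization header");
        }
        // The Base64 payload is pure ASCII; the charset only matters after decoding.
        byte[] raw = Base64.getDecoder().decode(header.substring(prefix.length()).trim());
        // Explicit charset here, instead of new String(raw), which uses the platform default.
        String credentials = new String(raw, charset);
        int colon = credentials.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' separator in credentials");
        }
        // Split on the first colon only; the password may itself contain colons.
        return new String[] { credentials.substring(0, colon), credentials.substring(colon + 1) };
    }
}
```

Calling decode(header, StandardCharsets.UTF_8) on both servers should then give the same result regardless of each machine's default charset, provided the client also encodes the credentials as UTF-8 before Base64-encoding them.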