I have a use case in which I have to decode the query parameters of a URI and then process them (the processing is out of scope for this question).
Suppose I have a URI that I need to decode. Normally every %20 is decoded to a space, and when a URI is created a space should be encoded as %20. However, I may also receive URIs in which a bare % stands for a space. To stay backward compatible, I want to treat such a % as a space as well. The note below explains where this requirement comes from.
I tried using replaceAll() to replace % with %20, but then an existing %20 becomes %2020, and there are many other edge cases.
This is needed for reading UPI URIs. As per the official documents from NPCI:
Note: Considering that the current PSP apps are developed to read “%” as space (“ ”), the Bank PSP should support both “%” and “%20”, until such time the ecosystem is aligned to the revision. Hence, backward compatibility should be ensured.
EDIT 1 (based on Pshemo's comment):
I have tried
str.replaceAll("%(?![0-9a-fA-F])","%20")
A case that this regex does not handle is "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR".
Here %Ab is followed by hex digits, so it is not replaced, and the decoded output is pn -> Prakash<some other character>mar
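To make the failure concrete, here is a minimal sketch of a slightly stricter variant of that regex, which only leaves a % alone when it is followed by two hex digits. The class and method names are purely illustrative. It handles the unambiguous cases, but "%Ab" still looks like a valid escape, so it is left untouched:

```java
import java.util.regex.Pattern;

final class StrayPercentRegex {
    // A '%' that is NOT followed by two hex digits cannot be a valid percent-escape.
    private static final Pattern STRAY_PERCENT = Pattern.compile("%(?![0-9a-fA-F]{2})");

    // Rewrites every stray '%' to "%20" and leaves well-formed escapes alone.
    static String encodeStrayPercents(String uri) {
        return STRAY_PERCENT.matcher(uri).replaceAll("%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeStrayPercents("pn=Prakash%Zed"));   // pn=Prakash%20Zed
        System.out.println(encodeStrayPercents("pn=Prakash%20Zed")); // pn=Prakash%20Zed (unchanged)
        System.out.println(encodeStrayPercents("pn=Prakash%Abmar")); // pn=Prakash%Abmar (unchanged, still ambiguous)
    }
}
```

So a regex alone cannot tell whether "%Ab" was meant as an escape or as a space followed by "Ab".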
This is probably not the answer you want, but it may help:
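public class Test {
    public static void main(String... a) {
        String u = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
        // prints upi://pay?pa=praksh@kmbl&pn=Prakash Abmar&cu=INR
        System.out.println(decode(u));
    }

    private static String decode(String in) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '%') {
                int decoded = hexValue(in, i + 1);
                if (decoded >= 32 && decoded <= 126) { // printable ASCII: treat as a real escape
                    sb.append((char) decoded);
                    i += 2; // skip the two hex digits
                } else { // missing, malformed or non-printable escape... maybe a space
                    sb.append(' ');
                }
            } else if (c == '+') {
                sb.append(' ');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the value of the two hex digits starting at pos, or -1 if they are
    // missing or not hex. This avoids the NumberFormatException and
    // StringIndexOutOfBoundsException that parseInt/substring would throw.
    private static int hexValue(String in, int pos) {
        if (pos + 2 > in.length()) {
            return -1;
        }
        int hi = Character.digit(in.charAt(pos), 16);
        int lo = Character.digit(in.charAt(pos + 1), 16);
        return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
    }
}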
There are many possibilities, so you will probably need a custom solution. The code above covers some of the cases.
Interesting problem. As you have already seen, you can't reliably replace % with a space. You need additional information about what will be transported via the URI, and then narrow down what must be replaced and what must not, e.g.:
%ZTest -> a space for sure
%Abababtest -> is it a space? probably... but we need to be sure that no strange characters or sequences are allowed
%23th%Affleck%20Street -> space? hex? what is what?
You need some more information to solve that issue reliably, like:
- Which fields allow % as spaces? (so you may transform only them)
- If %20 is in the URI, can we assume that no % will be a space then? Or is it possible that both appear as space in the URI?
With that additional information it should be easier to solve the issue.
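If it turns out that only certain parameters can carry the legacy % spaces, the fix can be restricted to those. The following is a minimal sketch under that assumption. The class name, the method name and the choice of "pn" as the only lenient key are hypothetical, and it only rewrites a % that cannot be a valid escape:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class PerParameterFix {
    // Assumption: only these keys may contain a bare '%' that means "space".
    private static final Set<String> LENIENT_KEYS = Set.of("pn");
    // A '%' not followed by two hex digits cannot be a valid percent-escape.
    private static final Pattern STRAY_PERCENT = Pattern.compile("%(?![0-9a-fA-F]{2})");

    // Takes the raw (still encoded) query string and fixes only the lenient parameters.
    static String fixQuery(String rawQuery) {
        List<String> fixed = new ArrayList<>();
        for (String pair : rawQuery.split("&", -1)) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (eq >= 0 && LENIENT_KEYS.contains(key)) {
                String value = pair.substring(eq + 1);
                fixed.add(key + "=" + STRAY_PERCENT.matcher(value).replaceAll("%20"));
            } else {
                fixed.add(pair); // strict parameters are passed through untouched
            }
        }
        return String.join("&", fixed);
    }

    public static void main(String[] args) {
        // prints pa=praksh%40kmbl&pn=Prakash%20Kumar&cu=INR
        System.out.println(fixQuery("pa=praksh%40kmbl&pn=Prakash%Kumar&cu=INR"));
    }
}
```

Values such as "Prakash%Abmar" remain ambiguous even here, because "%Ab" is a syntactically valid escape.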
Here is a solution nonetheless that might get you in the right direction (but please consider the warnings at the bottom!):
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern HEX_PATTERN = Pattern.compile("(?i)%([A-F0-9]{2})?");
String CHARSET = "utf-8";
String ENCODED_SPACE = "%20";
String ALLOWED_SYMBOLS = "\\p{L}|\\s|@";

String semiDecode(String uri) throws UnsupportedEncodingException {
    Matcher m = HEX_PATTERN.matcher(uri);
    StringBuffer semiDecoded = new StringBuffer();
    while (m.find()) {
        String match = m.group();
        String hexString = m.group(1);
        String replacementString = match;
        if (hexString == null) {
            // a bare % without two hex digits: treat it as a space
            replacementString = ENCODED_SPACE;
        } else {
            // alternatively to the following just check whether the hex value is in an allowed range...
            // you may want to lookup https://en.wikipedia.org/wiki/List_of_Unicode_characters for this
            String decodedSymbol = URLDecoder.decode(match, CHARSET);
            if (!decodedSymbol.matches(ALLOWED_SYMBOLS)) {
                // looks like an escape but decodes to a disallowed symbol: treat the % as a space
                replacementString = ENCODED_SPACE + hexString;
            }
        }
        m.appendReplacement(semiDecoded, replacementString);
    }
    m.appendTail(semiDecoded);
    return semiDecoded.toString();
}
Sample usage:
String uri = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
String semiDecoded = semiDecode(uri);
System.out.println("Input: " + uri);
System.out.println("Semi-decoded: " + semiDecoded);
System.out.println("Completely decoded query: " + new URI(semiDecoded).getQuery());
which will print:
Input: upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR
Semi-decoded: upi://pay?pa=praksh%40kmbl&pn=Prakash%20Abmar&cu=INR
Completely decoded query: pa=praksh@kmbl&pn=Prakash Abmar&cu=INR
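The code comment above mentions checking the hex value against an allowed range instead of decoding it. A minimal sketch of that alternative follows, assuming printable ASCII (0x20 to 0x7E) is the allowed range. The class name and the chosen range are illustrative and not part of the solution above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangeCheckSemiDecoder {
    // '%' optionally followed by two hex digits (captured in group 1).
    private static final Pattern PERCENT = Pattern.compile("%([0-9A-Fa-f]{2})?");

    static String semiDecode(String uri) {
        Matcher m = PERCENT.matcher(uri);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(uri, last, m.start()); // copy the text before this '%'
            String hex = m.group(1);
            if (hex == null) {
                out.append("%20"); // bare '%': treat it as a space
            } else {
                int value = Integer.parseInt(hex, 16);
                if (value >= 0x20 && value <= 0x7E) {
                    out.append(m.group()); // plausible escape: keep it as is
                } else {
                    out.append("%20").append(hex); // implausible escape: '%' was a space
                }
            }
            last = m.end();
        }
        out.append(uri, last, uri.length()); // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        // prints upi://pay?pa=praksh%40kmbl&pn=Prakash%20Abmar&cu=INR
        System.out.println(semiDecode("upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR"));
    }
}
```

This avoids the charset handling, at the cost of rejecting every non-ASCII escape, so it only fits if the transported values are known to be plain ASCII.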
Warnings... some things to keep in mind:
- Multi-byte encoded characters (%##%## or %##%##%## for single characters) will not be decoded anymore.
- You may want to adapt ALLOWED_SYMBOLS; for now it accepts any letter, any whitespace and @.