
Decode % into space using URLDecoder in java?

I have a use case in which I need to decode the query parameters of a URI and then process them (the processing is out of scope for this question).

Suppose I have a URI that I need to decode. Currently every %20 is decoded to a space, and when a URI is created a space should be encoded as %20. However, I may also receive URIs in which a bare % stands for a space. I therefore want to convert such a % to a space to maintain backward compatibility. The note at the end of the question explains the background.

I tried using replaceAll() to replace % with %20, but then every existing %20 becomes %2020, and there are many other edge cases.

This is needed for reading UPI URIs. As per the official documents from NPCI:

Note: Considering that the current PSP apps are developed to read “%” as space (“ ”), the Bank PSP should support both “%” and “%20”, until such time the ecosystem is aligned to the revision. Hence, backward compatibility should be ensured.

EDIT 1, based on pshemo's comment:

I have tried

str.replaceAll("%(?![0-9a-fA-F])","%20")

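A case that the above regex does not handle is "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR"

The output is pn -> Prakash"some other character"mar, because the % in %Ab is followed by a hex digit, so the regex leaves it alone and the decoder then reads %Ab as an escape.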
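To make the failure reproducible, here is a minimal sketch of the attempt. The class name RegexAttempt and the UTF-8 charset are assumptions for illustration only. The negative lookahead only rejects a % that is followed by a non-hex character, so %Ab passes through untouched and is decoded as an escape afterwards.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class RegexAttempt {

    // The regex from the edit above: replace % only when the next character is not a hex digit.
    static String fix(String uri) {
        return uri.replaceAll("%(?![0-9a-fA-F])", "%20");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String uri = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
        String fixed = fix(uri);
        // The regex changes nothing here, because both % signs are followed by a hex digit.
        System.out.println("Unchanged by regex: " + fixed.equals(uri));
        // Decoding then consumes %Ab as an escape instead of treating % as a space.
        System.out.println(URLDecoder.decode(fixed, "UTF-8"));
    }
}
```

A % followed by a non-hex character, as in "a%Zb", is still rewritten to "a%20Zb", so the regex only covers that part of the problem.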

Aman Verma asked Nov 29 '17 19:11



2 Answers

This is probably not the answer you want, but it may help:

public class Test {

    public static void main(String... a) {
        try {
            // Sample UPI URI in which a bare '%' stands for a space
            String u = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
            System.out.println(decode(u));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static String decode(String in) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '%') {
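                // Read the two characters after '%' as hex digits. Character.digit
                // returns -1 for a non-hex character, and a '%' too close to the end
                // of the input cannot start an escape either.
                int hi = i + 2 < in.length() ? Character.digit(in.charAt(i + 1), 16) : -1;
                int lo = i + 2 < in.length() ? Character.digit(in.charAt(i + 2), 16) : -1;
                int decoded = (hi >= 0 && lo >= 0) ? hi * 16 + lo : -1;
                if (decoded >= 32 && decoded <= 126) { // printable ASCII: treat as a real escape
                    sb.append((char) decoded);
                    i += 2;
                } else { // not a printable escape, so assume the bare '%' meant a space
                    sb.append(" ");
                }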
            } else if (c == '+') {
                sb.append(" ");
            } else {
                sb.append(c);
            }
        }

        return sb.toString();
    }
}

There are many possibilities, so you will probably need a custom solution. The code above covers only some of the cases.

fhofmann answered Sep 30 '22 13:09



Interesting problem. You can't reliably replace % with a space, as you have already seen yourself. You need additional information about what will be transported via the URI, and can then narrow down what must be replaced and what must not, e.g.

%ZTest -> a space for sure
%Abababtest -> is it a space? probably... but we need to be sure that no strange characters or sequences are allowed
%23th%Affleck%20Street -> space? hex? what is what?
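The three examples above can be sorted mechanically into "certainly a space" and "ambiguous". Here is a minimal sketch, assuming a bare % counts as a certain space only when it is not followed by two hex digits. The class and method names are illustrative, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentClassifier {

    // A '%' optionally followed by two hex digits (captured in group 1).
    private static final Pattern PERCENT = Pattern.compile("%([0-9A-Fa-f]{2})?");

    // Returns one description per '%' found in the input.
    static List<String> classify(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = PERCENT.matcher(s);
        while (m.find()) {
            if (m.group(1) == null) {
                // No two hex digits follow, so this cannot be a percent escape.
                result.add("% at " + m.start() + ": space");
            } else {
                // Could be a real escape or a space followed by hex-looking text.
                result.add(m.group() + " at " + m.start() + ": ambiguous");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify("%ZTest"));
        System.out.println(classify("%Abababtest"));
        System.out.println(classify("%23th%Affleck%20Street"));
    }
}
```

For %23th%Affleck%20Street all three occurrences come back as ambiguous, which is exactly why more information is needed.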

You need some more information to solve that issue reliably, like:

  1. Which symbols are allowed? Or which hex ranges are allowed to be decoded?
  2. Which query parameters may contain % as a space? (Then you may transform only those.)
  3. Do you need to decode Cyrillic, Arabic or Chinese characters too?
  4. If a %20 is in the URI, can we assume that no bare % will be a space? Or is it possible that both appear as a space in the same URI?

With that additional information it should be easier to solve the issue.
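As an illustration of point 2, suppose the answer were "only pn may contain % as a space, and its value never contains any other escape". Both parts of that are assumptions made up for this sketch, as are the class and method names. Under them, every % in pn that does not start %20 can be rewritten, and all other parameters are left alone:

```java
import java.util.Collections;
import java.util.Set;

class SelectiveSpaceFix {

    // Hypothetical: the parameters assumed to hold plain names with spaces.
    static final Set<String> SPACE_PARAMS = Collections.singleton("pn");

    // Rewrites a bare % to %20, but only inside the values of SPACE_PARAMS.
    static String fixQuery(String uri) {
        int q = uri.indexOf('?');
        if (q < 0) {
            return uri; // no query part, nothing to do
        }
        StringBuilder out = new StringBuilder(uri.substring(0, q + 1));
        String[] pairs = uri.substring(q + 1).split("&", -1);
        for (int i = 0; i < pairs.length; i++) {
            String pair = pairs[i];
            int eq = pair.indexOf('=');
            if (eq > 0 && SPACE_PARAMS.contains(pair.substring(0, eq))) {
                // Assumption: in these values any % that does not start %20 is a space.
                String value = pair.substring(eq + 1).replaceAll("%(?!20)", "%20");
                pair = pair.substring(0, eq + 1) + value;
            }
            if (i > 0) {
                out.append('&');
            }
            out.append(pair);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixQuery("upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR"));
    }
}
```

The %40 in pa is untouched because pa is not in the set. The sketch would misread a pn value that legitimately contains another escape, which is why the assumption has to be confirmed first.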

Nonetheless, here is a solution that might point you in the right direction (but please consider the warnings at the bottom!):

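import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A '%' optionally followed by two hex digits (case-insensitive)
Pattern HEX_PATTERN = Pattern.compile("(?i)%([A-F0-9]{2})?");
String CHARSET = "utf-8";
String ENCODED_SPACE = "%20";
// Regex for a single decoded symbol that is accepted as a real escape
String ALLOWED_SYMBOLS = "\\p{L}|\\s|@";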

String semiDecode(String uri) throws UnsupportedEncodingException {
    Matcher m = HEX_PATTERN.matcher(uri);
    StringBuffer semiDecoded = new StringBuffer();
    while (m.find()) {
        String match = m.group();
        String hexString = m.group(1);
        String replacementString = match;
        if (hexString == null) {
            replacementString = ENCODED_SPACE;
        } else {
            // Alternatively, just check whether the hex value is in an allowed range;
            // you may want to look up https://en.wikipedia.org/wiki/List_of_Unicode_characters for this.
            String decodedSymbol = URLDecoder.decode(match, CHARSET);
            if (!decodedSymbol.matches(ALLOWED_SYMBOLS)) {
                replacementString = ENCODED_SPACE + hexString;
            }
        }
        m.appendReplacement(semiDecoded, replacementString);
    }
    m.appendTail(semiDecoded);
    return semiDecoded.toString();
}

Sample usage:

String uri = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
String semiDecoded = semiDecode(uri);
System.out.println("Input: " + uri);
System.out.println("Semi-decoded: " + semiDecoded);
System.out.println("Completely decoded query: " + new URI(semiDecoded).getQuery());

which will print:

Input: upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR
Semi-decoded: upi://pay?pa=praksh%40kmbl&pn=Prakash%20Abmar&cu=INR
Completely decoded query: pa=praksh@kmbl&pn=Prakash Abmar&cu=INR

Warnings... some things to keep in mind:

  • this specific implementation does not work with Cyrillic, Chinese or other letters that take up more than one %## escape (i.e. single characters encoded as %##%## or %##%##%## will no longer be decoded)
  • you need to adapt the allowed symbols to your needs (see the regex in ALLOWED_SYMBOLS; for now it accepts any letter, any whitespace and @)
  • UTF-8 is assumed as the charset
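
For the first warning, one possible direction is to look at a whole run of consecutive escapes and check strictly whether it forms well-formed UTF-8 before deciding anything. This is a rough sketch, assuming the input has already been matched as one or more %XX groups. The class and method are hypothetical helpers, not part of the code above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {

    // Expects a run such as "%D0%96"; returns true if its bytes are well-formed UTF-8.
    static boolean isValidUtf8Run(String escapes) {
        byte[] bytes = new byte[escapes.length() / 3];
        for (int i = 0; i < bytes.length; i++) {
            // Each group is '%' plus two hex digits, so the digits start at offset 3 * i + 1.
            bytes[i] = (byte) Integer.parseInt(escapes.substring(3 * i + 1, 3 * i + 3), 16);
        }
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8Run("%D0%96")); // two-byte Cyrillic letter
        System.out.println(isValidUtf8Run("%Ab"));    // lone continuation byte
    }
}
```

A run that fails the check, such as %Ab, is a candidate for "space followed by text". A run that passes is still ambiguous in principle, so this only narrows the problem.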
Roland answered Sep 30 '22 13:09
