
What standard produced hex-encoded characters with an extra "25" at the front?

I'm trying to integrate with ybp.com, a vendor of proprietary software for managing book ordering workflows in large libraries. It keeps feeding me URLs that contain characters encoded with an extra "25" in them. Like this book title:

VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS. 

The encoded characters in this sample are as follows:

%253a = %3A = a colon
%252c = %2C = a comma
%2527 = %27 = an apostrophe (non-curly)

I need to convert these encodings to a format my internal apps can recognize, and the extra 25 is throwing things off kilter. The final two digits of the hex-encoded characters appear to be identical to standard URL encodings, so a brute-force method would be to replace "%25" with "%". But I'm leery of doing that because it would be sure to haunt me later when an actual %25 shows up for some reason.

So, what standard is this? Is there an official algorithm for converting values like this to other encodings?

Will Martin asked Dec 01 '11 20:12


2 Answers

%25 is the URL encoding of a literal % character. My guess is that the external website is accidentally URL-encoding its output twice.

If that's the case, it is safe to replace %25 with %, which leaves you with a normally URL-encoded string (or just URL-decode twice to get the plain text).
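As a rough sketch of the "decode twice" idea, assuming the data really was encoded exactly twice, here is what it looks like in Python with the standard library's `urllib.parse.unquote`. The variable names are just for illustration; the sample string is the title from the question.

```python
from urllib.parse import unquote

# Sample title from the question, as delivered by the vendor.
raw = (
    "VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED "
    "STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS."
)

# First pass removes the outer layer: %25 becomes %, so %253a becomes %3a.
once = unquote(raw)

# Second pass removes the inner layer: %3a becomes a colon, and so on.
title = unquote(once)

print(once)
print(title)
```

If some values turn out to be encoded only once, the second pass would over-decode them, so this is only appropriate when the double encoding is consistent.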

Eric J. answered Oct 18 '22 20:10


The ASCII code 37 (25 in hexadecimal) is %, so the URL encoding of % is %25.

It looks like your data got URL encoded twice: , -> %2C -> %252C

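Replacing every %25 with % should not cause any problems, because a literal %25 in the original text would itself have been encoded twice, to %252525. The replacement strips exactly one layer of encoding, leaving %2525, which a normal URL decode turns back into %25.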
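To check that reasoning, here is a small Python sketch that double-encodes a literal "%25" and a comma, strips one layer with a plain string replacement, and then decodes normally. It uses `quote` and `unquote` from the standard library; `peel_one_layer` is a made-up helper name, and the sketch assumes every input was encoded exactly twice.

```python
from urllib.parse import quote, unquote


def peel_one_layer(s: str) -> str:
    """Hypothetical helper: undo one layer of URL encoding on a
    double-encoded string by turning each %25 back into %."""
    # str.replace scans left to right without re-examining its own output,
    # so each %25 is collapsed exactly once.
    return s.replace("%25", "%")


for text in ["%25", ","]:
    once = quote(text)      # "%25" -> "%2525",  "," -> "%2C"
    twice = quote(once)     # "%2525" -> "%252525", "%2C" -> "%252C"
    peeled = peel_one_layer(twice)

    # Peeling gives back the singly encoded form, and a normal decode
    # then recovers the original text.
    assert peeled == once
    assert unquote(peeled) == text
    print(repr(text), "->", once, "->", twice, "->", peeled)
```

The replacement is only safe on data that is known to be double-encoded; applied to a singly encoded string it would corrupt a legitimate %25.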

Dennis answered Oct 18 '22 19:10