 

How to decode quoted-printable characters (from quoted-printable to a char)?

Tags:

java

encoding

I have a text with quoted-printable characters. Here is an example of such a text (from a Wikipedia article):

If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.

I am looking for a Java class that decodes the encoded form to characters, e.g., =20 to a space.

UPDATE: Thanks to The Elite Gentleman, I know that I need to use QuotedPrintableCodec:

import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.net.QuotedPrintableCodec;
import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws DecoderException
    {
        QuotedPrintableCodec.decodeQuotedPrintable(TXT.getBytes());
    }
}

However I keep getting the following exception:

org.apache.commons.codec.DecoderException: Invalid URL encoding: not a valid digit (radix 16): 109
    at org.apache.commons.codec.net.Utils.digit16(Utils.java:44)
    at org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(QuotedPrintableCodec.java:186)

What am I doing wrong?

UPDATE 2: I found this question on SO and learned about MimeUtility:

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws MessagingException, IOException
    {
        InputStream is = new ByteArrayInputStream(TXT.getBytes());

        // MimeUtility.decode wraps the raw stream in a quoted-printable decoder
        BufferedReader br = new BufferedReader(new InputStreamReader(MimeUtility.decode(is, "quoted-printable")));
        StringWriter writer = new StringWriter();

        String line;
        while ((line = br.readLine()) != null)
        {
            writer.append(line);
        }
        System.out.println("INPUT:  " + TXT);
        System.out.println("OUTPUT: " + writer.toString());
    }
}

However, the output is still not perfect; it contains '=':

INPUT:  If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.
OUTPUT: If you believe that truth=beauty, then surely = mathematics is the most beautiful branch of philosophy.

Now what am I doing wrong?

asked Sep 05 '11 09:09 by Skarab



1 Answer

The Apache Commons Codec QuotedPrintableCodec class is an implementation of the Quoted-Printable section of RFC 1521.
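In quoted-printable, an "=" followed by two hexadecimal digits stands for one byte (rule #1 of RFC 1521), so =3D is '=' and =20 is a space. The small sketch below (the class and method names are hypothetical, chosen for illustration) decodes a single escape with the standard `Character.digit` method. It also shows where the exception in the question comes from: in `surely=20=mathematics` the "=" after "=20" is followed by `m`, which has character code 109 and is not a hex digit.

```java
public class HexEscapeDemo {

    // Hypothetical helper: decodes the two characters after '=' into one byte value,
    // or returns -1 if either character is not a hexadecimal digit.
    static int decodeEscape(char high, char low) {
        int h = Character.digit(high, 16);
        int l = Character.digit(low, 16);
        if (h < 0 || l < 0) {
            return -1;
        }
        return (h << 4) | l;
    }

    public static void main(String[] args) {
        System.out.println((char) decodeEscape('3', 'D'));             // =
        System.out.println("[" + (char) decodeEscape('2', '0') + "]"); // [ ]
        // "=ma" from "surely=20=mathematics": 'm' is not a hex digit
        System.out.println((int) 'm');                                 // 109
        System.out.println(decodeEscape('m', 'a'));                    // -1
    }
}
```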


Update: your quoted-printable string is wrong, because the example on Wikipedia uses soft line breaks.

Soft-line breaks:

Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
      that encoded lines be no more than 76 characters long. If longer
      lines are to be encoded with the Quoted-Printable encoding, 'soft'
      line breaks must be used. An equal sign as the last character on a
      encoded line indicates such a non-significant ('soft') line break
      in the encoded text. Thus if the "raw" form of the line is a
      single unencoded line that says:

          Now's the time for all folk to come to the aid of
          their country.

      This can be represented, in the Quoted-Printable encoding, as

          Now's the time =
          for all folk to come=
           to the aid of their country.

      This provides a mechanism with which long lines are encoded in
      such a way as to be restored by the user agent.  The 76 character
      limit does not count the trailing CRLF, but counts all other
      characters, including any equal signs.
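To make rule #5 concrete, here is a minimal, stdlib-only decoder sketch. It is not the Commons Codec or JavaMail implementation; the class name is hypothetical, and it assumes the decoded bytes are UTF-8. It handles "=XX" escapes and drops "=" followed by CRLF (or a bare LF) as a soft line break, but does not implement the trailing-whitespace rule (#3) or any line-length checks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SoftBreakDecoder {

    // Decodes "=XX" hex escapes and removes soft line breaks ("=\r\n" or "=\n").
    // Malformed escapes throw IllegalArgumentException.
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                out.write(b); // ordinary character: copy through unchanged
                i++;
                continue;
            }
            // Soft line break: '=' + CRLF is non-significant and is dropped.
            if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;
                continue;
            }
            // Tolerate a bare LF after '=' as well.
            if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;
                continue;
            }
            // Hex escape: "=XX" becomes a single byte.
            if (i + 2 >= in.length) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            int hi = Character.digit(in[i + 1], 16);
            int lo = Character.digit(in[i + 2], 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid escape at index " + i);
            }
            out.write((hi << 4) | lo);
            i += 3;
        }
        return out.toByteArray();
    }

    // Encoded quoted-printable text is plain ASCII; the result is assumed to be UTF-8.
    static String decode(String encoded) {
        byte[] raw = encoded.getBytes(StandardCharsets.US_ASCII);
        return new String(decode(raw), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = "Now's the time =\r\nfor all folk to come=\r\n to the aid of their country.";
        // prints: Now's the time for all folk to come to the aid of their country.
        System.out.println(decode(encoded));
    }
}
```

Applied to the question's text with a CRLF after "surely=20=", the same method yields "If you believe that truth=beauty, then surely mathematics is the most beautiful branch of philosophy."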

So your text should be constructed as follows:

private static final String CRLF = "\r\n";
private static final String S = "If you believe that truth=3Dbeauty, then surely=20=" + CRLF + "mathematics is the most beautiful branch of philosophy.";

The Javadoc clearly states:

Rules #3, #4, and #5 of the quoted-printable spec are not implemented yet because the complete quoted-printable spec does not lend itself well into the byte[] oriented codec framework. Complete the codec once the steamable codec framework is ready. The motivation behind providing the codec in a partial form is that it can already come in handy for those applications that do not require quoted-printable line formatting (rules #3, #4, #5), for instance Q codec.
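Because rule #5 is not implemented there, one possible workaround (an assumption on my part, not something the Javadoc prescribes) is to remove the soft line breaks yourself and only then hand the bytes to the codec. The sketch below shows just that stdlib pre-processing step; the class and method names are hypothetical, and the `QuotedPrintableCodec.decodeQuotedPrintable` call from the question is left as a comment so the example runs without Commons Codec on the classpath.

```java
import java.util.regex.Pattern;

public class SoftBreakStripper {

    // A soft line break is '=' immediately followed by CRLF (a bare LF is tolerated too).
    // In valid quoted-printable a literal '=' is always written as =3D,
    // so this pattern cannot swallow real content.
    private static final Pattern SOFT_BREAK = Pattern.compile("=\r?\n");

    // Joins soft-wrapped lines back into one logical line.
    static String stripSoftLineBreaks(String encoded) {
        return SOFT_BREAK.matcher(encoded).replaceAll("");
    }

    public static void main(String[] args) {
        String encoded = "If you believe that truth=3Dbeauty, then surely=20=\r\n"
                + "mathematics is the most beautiful branch of philosophy.";
        String joined = stripSoftLineBreaks(encoded);
        // Only "=XX" escapes remain, e.g. "...surely=20mathematics is..."
        System.out.println(joined);
        // Not run here: QuotedPrintableCodec.decodeQuotedPrintable(joined.getBytes())
    }
}
```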

There is also a bug logged against Apache's QuotedPrintableCodec because it doesn't support soft line breaks.

answered Oct 23 '22 04:10 by Buhake Sindi
