Decoding the encoded pound symbol in Java

Tags: java

We are using an external service to get data in CSV format, and we are trying to write that data to the response so that the client can download the CSV. Unfortunately, we are getting the data in the format below.

Amount inc. VAT      Balance
Â£112.83            Â£0.0
Â£97.55             Â£0.0
Â£15.28             Â£0.0

We are unable to decode the content. Is there a way to decode Â£ and display £ in Java?

Is there a StringUtils class available to decode such strings?

Anil Kumar C asked Oct 16 '12 07:10


2 Answers

The file seems to be encoded in UTF-8. You should read it as UTF-8.

If you are using java.io.FileReader and company, you should open a FileInputStream and use an InputStreamReader instead:

// Before: Reader in = new FileReader(file)
Reader in = new InputStreamReader(new FileInputStream(file), "UTF-8");

If you are using some other method for reading the file (an external or internal class library perhaps?), check in its documentation if it allows specifying the text encoding used to read the file.
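When the CSV arrives as a stream from the external service rather than as a file, the same idea applies: wrap the stream in an InputStreamReader with an explicit charset. The following is only a sketch; the class name Utf8StreamRead and the helper readAll are hypothetical, and a ByteArrayInputStream stands in for the real service response, which the question does not describe.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8StreamRead {
    // Hypothetical helper: reads the whole stream as text, decoding the bytes as UTF-8.
    public static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the service response: the pound sign (U+00A3) plus digits, as UTF-8 bytes.
        byte[] payload = "\u00A397.55".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(payload));
        System.out.println(text.equals("\u00A397.55")); // true: decoded without mojibake
    }
}
```

The pound sign is written as the escape \u00A3 so that the example does not depend on the encoding of the source file itself.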

Update: If you already have a String of mojibake like Â£97.55 and cannot fix the way it is read, one way to recode it is to convert the string back into bytes and re-interpret those bytes as UTF-8. This does not require any external "StringUtils" or codec library; the Java standard API is powerful enough:

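String input = ...obtain from somewhere...;
// getBytes() with no argument uses the platform default charset; if the text was
// wrongly decoded with a different one, name it, e.g. getBytes("ISO-8859-1").
String output = new String(input.getBytes(/*use platform default*/), "UTF-8");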
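To make the round trip concrete, here is a minimal sketch that first reproduces the mojibake and then repairs it. It assumes the text was UTF-8 that got decoded as ISO-8859-1 by mistake; the question does not say which charset was actually involved, and the class name MojibakeRepair and the method repair are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Assumption: the garbled text came from UTF-8 bytes decoded as ISO-8859-1.
    public static String repair(String garbled) {
        // Step 1: recover the original bytes by encoding with the wrongly used charset.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        // Step 2: decode those bytes with the charset they were really written in.
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "\u00A397.55"; // pound sign followed by 97.55
        // Simulate the bug: UTF-8 bytes interpreted as ISO-8859-1.
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00C2\u00A397.55")); // true: the A-circumflex + pound form
        System.out.println(repair(garbled).equals(correct));     // true: back to the plain pound sign
    }
}
```

ISO-8859-1 maps every byte value to a character, so this round trip loses nothing. If the wrong charset was something else, such as windows-1252, the same approach may not recover every byte.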
Joni answered Nov 01 '22 03:11


Problem: When we call getBytes() on a String without an argument, it encodes the characters using the platform default charset. If that charset does not match the one expected on the other side, decoding the resulting bytes will not give back the original characters.
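A small sketch of why the charset matters: the pound sign is one byte in ISO-8859-1 but two bytes in UTF-8, so the result of getBytes() depends on which charset is in effect. The class name PoundBytes and the helper hex are hypothetical and exist only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PoundBytes {
    // Formats bytes as space-separated uppercase hex, e.g. "C2 A3".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));      // C2 A3
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3
        // The no-argument form depends on the JVM and OS configuration.
        System.out.println("default (" + Charset.defaultCharset() + "): " + hex(pound.getBytes()));
    }
}
```

The byte 0xC2 read as ISO-8859-1 is the character Â, which is where the stray character in front of the pound sign comes from.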

Solution One: The StringUtils class from Apache Commons Codec helps with encoding these characters when writing back to the response. The class is in the org.apache.commons.codec.binary package.

// import org.apache.commons.codec.binary.StringUtils;
String CSVContent = "/* CSV data */";
/**
 *  Decode the bytes using UTF-8.
 */
String decodedStr = StringUtils.newStringUtf8(CSVContent.getBytes("UTF-8"));
/**
 *  Convert the decoded string to a byte array to write to the stream.
 */
byte[] content = StringUtils.getBytesIso8859_1(decodedStr);

Maven dependency:

<dependency>
     <groupId>commons-codec</groupId>
     <artifactId>commons-codec</artifactId>
     <version>1.6</version>
</dependency>

Solution Two:

As @Joni suggested, a better solution that uses only the standard API:

content = CSVContent.getBytes("ISO-8859-1");
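For writing the CSV out, an alternative to building the byte array by hand is to wrap the output stream in an OutputStreamWriter with an explicit charset. This is only a sketch: the class name CsvWrite and the method writeCsv are hypothetical, a ByteArrayOutputStream stands in for the response stream, and whichever charset is chosen has to match what the client is told to expect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CsvWrite {
    // Hypothetical helper: encodes the CSV text with the given charset and writes it to out.
    public static void writeCsv(String csv, OutputStream out, Charset charset) {
        try {
            Writer writer = new OutputStreamWriter(out, charset);
            writer.write(csv);
            writer.flush(); // flush only; closing the stream is left to the caller
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "Amount inc. VAT,Balance\n\u00A3112.83,\u00A30.0\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for the response stream
        writeCsv(csv, out, StandardCharsets.ISO_8859_1);
        // Every character here fits in one ISO-8859-1 byte, so the lengths match.
        System.out.println(out.size() == csv.length()); // true
    }
}
```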
Anil Kumar C answered Nov 01 '22 03:11