
Unable to parse as integer

Alright...I have this .txt file (UTF-8)

4661,SOMETHING,3858884120607,24,24.09
4659,SOMETHING1,3858884120621,24,15.95
4660,SOMETHING2,3858884120614,24,19.58

And this code

FileInputStream fis = new FileInputStream(new File("someTextFile.txt"));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);

int i = 0;
String line;
while ((line = in.readLine()) != null) {
    Pattern p = Pattern.compile(",");
    String[] article = p.split(line);

    // I don't know why but when a first line starts with
    // an integer - article[0] (which in .txt file is 4661)
    // becomes someWeirdCharacter4661 so I need to trim it
    // *weird character is like |=>|

    if (i == 0) {
        StringBuffer articleCode = new StringBuffer(article[0]);
        articleCode.deleteCharAt(0);
        article[0] = articleCode.toString();
    }

    SomeArticle**.addOrChange(mContext, Integer.parseInt(article[0]), article[1], article[2], Integer.parseInt(article[3]), Double.parseDouble(article[4]));

    i++;
}

On the emulator it's fine, but on a real device (HTC Desire) I get this (weird) error:

E/AndroidRuntime(16422): java.lang.NumberFormatException: unable to parse '4661' as integer

What's the problem?

** It's just a class of mine that takes those parameters as input (context, int, string, string, int, double).

svenkapudija asked Jan 04 '11 22:01


3 Answers

It could be that your file is not UTF-8, or something along those lines.

However, if you just want a quick fix and are not interested in the underlying problem :) then strip out anything that isn't a digit or a decimal point.

String[] article = p.split(line);
Integer i = Integer.parseInt(article[0].replaceAll("[^0-9.]",""));

The regular expression isn't perfect (it would let something like ...999.... through unchanged, for example) but it will do for you.
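To make the trade-off concrete, here is a minimal sketch of what that regex does to a few inputs. The class and method names (DigitFilter, keepDigitsAndDots) are made up for illustration and are not from the question.

```java
class DigitFilter {
    /** Keeps only ASCII digits and dots, using the same regex as above. */
    static String keepDigitsAndDots(String field) {
        return field.replaceAll("[^0-9.]", "");
    }

    public static void main(String[] args) {
        // A leading U+FEFF (byte order mark) is dropped, so parsing succeeds.
        System.out.println(Integer.parseInt(keepDigitsAndDots("\uFEFF4661")));
        // Caveat: a minus sign is dropped too, so the sign is silently lost.
        System.out.println(keepDigitsAndDots("-24"));
        // Caveat: dots survive, so Integer.parseInt would still reject this value.
        System.out.println(keepDigitsAndDots("...999...."));
    }
}
```

So the hack hides the stray character, but it also hides genuinely bad input, which is worth keeping in mind if the file can contain negative numbers.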

EDIT:

I did not read the question properly, it seems. If it only happens at the start of the file, then what you have is very likely a byte order mark (BOM), which signals that the file is Unicode and, in UTF-16/32, whether it is little-endian or big-endian. You don't tend to see it used very often.

http://unicode.org/faq/utf_bom.html#bom10
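If the stray character really is a BOM, it decodes to the single character U+FEFF, so one option is to check for exactly that character instead of blindly deleting the first character of the first line. The following is a rough sketch under that assumption. BomStripper, stripBom and parseArticleCode are illustrative names, and StandardCharsets needs Java 7 or later (on older platforms the charset name "UTF-8" can be passed instead).

```java
import java.nio.charset.StandardCharsets;

class BomStripper {
    // U+FEFF is what the UTF-8 byte order mark (bytes EF BB BF) decodes to.
    static final char BOM = '\uFEFF';

    /** Returns the text without a leading BOM character, if there is one. */
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    /** Parses the first comma-separated field of a line as an int. */
    static int parseArticleCode(String line) {
        return Integer.parseInt(stripBom(line).split(",")[0]);
    }

    public static void main(String[] args) {
        // The first bytes of a UTF-8 file that was saved with a BOM.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '4', '6', '6', '1', ',', 'S'};
        String firstLine = new String(raw, StandardCharsets.UTF_8);
        System.out.println(parseArticleCode(firstLine));
    }
}
```

Because the check is conditional, the same code keeps working whether or not the file has a BOM, and whether or not the platform's decoder has already removed it.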

sksamuel answered Nov 05 '22 02:11


I was going to add this as a comment but decided to include an image as well. It seems the problem is not that the file isn't UTF-8 but in fact the opposite is true - it seems it IS UTF-8 but it isn't being read correctly.

The image is from a hex editor looking at a UTF-8 file I created containing the first line. Note the 3 bytes preceding 4661...

[Image: hex editor screenshot]

If I save the file in ANSI format, those characters aren't there.
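The same check can be done without a hex editor. This is only a sketch: it assumes the file's bytes have already been read into an array, and BomDetector, startsWithUtf8Bom and hexPrefix are hypothetical helper names. The byte values EF BB BF are the UTF-8 encoding of U+FEFF.

```java
import java.util.Arrays;

class BomDetector {
    // The UTF-8 encoding of U+FEFF (the byte order mark) is the three bytes EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the data begins with the UTF-8 byte order mark. */
    static boolean startsWithUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
    }

    /** Formats up to count leading bytes as space-separated hex, like a hex editor row. */
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

Calling hexPrefix on the first few bytes of the file on the device would show whether those three bytes are actually there before the digits.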

Squonk answered Nov 05 '22 02:11


You can use Notepad++: open your text file, choose the menu Encoding --> "Encoding in UTF-8 without BOM", and save with this option. The BOM bytes (EF BB BF) will be removed, so your code can parse the string as an integer without any problem.

Hope this helps.

ThanhHH answered Nov 05 '22 03:11