Alright... I have this .txt file (UTF-8):
4661,SOMETHING,3858884120607,24,24.09
4659,SOMETHING1,3858884120621,24,15.95
4660,SOMETHING2,3858884120614,24,19.58
And this code:
FileInputStream fis = new FileInputStream(new File("someTextFile.txt"));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);
int i = 0;
String line;
while ((line = in.readLine()) != null) {
    Pattern p = Pattern.compile(",");
    String[] article = p.split(line);
    // I don't know why, but when the first line starts with
    // an integer, article[0] (which is 4661 in the .txt file)
    // becomes someWeirdCharacter4661, so I need to trim it.
    // * The weird character looks like |=>|
    if (i == 0) {
        StringBuffer articleCode = new StringBuffer(article[0]);
        articleCode.deleteCharAt(0);
        article[0] = articleCode.toString();
    }
    SomeArticle**.addOrChange(mContext, Integer.parseInt(article[0]), article[1], article[2], Integer.parseInt(article[3]), Double.parseDouble(article[4]));
    i++;
}
On the emulator it works fine, but on a real device (HTC Desire) I get this (weird) error:
E/AndroidRuntime(16422): java.lang.NumberFormatException: unable to parse '4661' as integer
What's the problem?
** It's just a class of mine that takes those parameters as input (context, int, string, string, int, double).
It could be that your file is not UTF-8, or something along those lines.
However, if you just want to hack a fix because you are interested in a solution rather than the problem :) then strip out anything that isn't a digit or a decimal point.
String[] article = p.split(line);
int articleCode = Integer.parseInt(article[0].replaceAll("[^0-9.]", ""));
The regular expression isn't perfect (it would let something like ...999.... through, for example), but it will do for you.
EDIT:
I did not read the question properly, it seems. If it is only at the start of the file, then it is very likely that what you have is a byte order mark (BOM), which signals that the file is Unicode and, for UTF-16/32, whether it is little-endian or big-endian. You don't tend to see it used very often.
http://unicode.org/faq/utf_bom.html#bom10
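If it is a BOM, one way to deal with it in code is to check whether the first line starts with U+FEFF (the character the UTF-8 bytes EF BB BF decode to) and drop it only when it is actually there, instead of always deleting the first character. This is only a sketch under assumptions: the class and method names (BomSkippingReader, stripBom, readArticleCodes) are made up for illustration, it reads from a byte array rather than the real file, and it uses Java 7/8 features (StandardCharsets, try-with-resources, UncheckedIOException) that may not exist on older Android versions.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BomSkippingReader {
    // U+FEFF is the character that the UTF-8 byte sequence EF BB BF decodes to.
    private static final char BOM = '\uFEFF';

    /** Removes a single leading BOM character if present; otherwise returns the line unchanged. */
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    /** Reads the article code (first comma-separated field) of every line in the UTF-8 data. */
    static List<Integer> readArticleCodes(byte[] utf8Bytes) {
        List<Integer> codes = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            String line;
            boolean firstLine = true;
            while ((line = in.readLine()) != null) {
                if (firstLine) {
                    // Only the very first line of a file can carry the BOM.
                    line = stripBom(line);
                    firstLine = false;
                }
                String[] article = line.split(",");
                codes.add(Integer.parseInt(article[0]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return codes;
    }

    public static void main(String[] args) {
        // Simulated file content: a BOM followed by two lines from the question.
        byte[] data = ("\uFEFF4661,SOMETHING,3858884120607,24,24.09\n"
                + "4659,SOMETHING1,3858884120621,24,15.95\n").getBytes(StandardCharsets.UTF_8);
        System.out.println(readArticleCodes(data)); // prints [4661, 4659]
    }
}
```

Because stripBom only removes the character when it is really a BOM, the same code should behave identically whether or not the file (or the platform's decoder) produces one.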
I was going to add this as a comment but decided to include an image as well. It seems the problem is not that the file isn't UTF-8 but in fact the opposite is true - it seems it IS UTF-8 but it isn't being read correctly.
The image is from a hex editor looking at a UTF-8 file I created containing the first line. Note the 3 characters preceding 4661...
If I save the file in ANSI format, those characters aren't there.
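For anyone who can't see the image, the same thing can be reproduced in a few lines. The sketch below is an illustration on a standard JVM, not something taken from the asker's project: the class name BomBytesDemo and its helpers are made up, and the byte array stands in for the start of the file. It shows that the three bytes decode to one extra character in front of 4661, which is enough to make Integer.parseInt fail.

```java
import java.nio.charset.StandardCharsets;

class BomBytesDemo {
    /** Formats bytes as space-separated upper-case hex pairs, like a hex editor row. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns true if the data starts with the UTF-8 BOM bytes EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // Stand-in for the first bytes of a UTF-8 file saved with a BOM.
        byte[] fileStart = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '4', '6', '6', '1'};
        System.out.println(toHex(fileStart)); // prints EF BB BF 34 36 36 31

        // The three BOM bytes decode to the single character U+FEFF.
        String decoded = new String(fileStart, StandardCharsets.UTF_8);
        System.out.println(decoded.length()); // prints 5
        System.out.println(Integer.toHexString(decoded.charAt(0))); // prints feff

        try {
            Integer.parseInt(decoded);
        } catch (NumberFormatException e) {
            System.out.println("parse failed because of the leading U+FEFF");
        }
    }
}
```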
You can use Notepad++: open your text file, choose Encoding --> "Encode in UTF-8 without BOM" from the menu, and save it with this option. The BOM bytes (EF BB BF) will be removed, so your code can parse the string to an integer without any problem.
Hope this helps.
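If you would rather not depend on an editor, the same cleanup can be done once in code before shipping the file. This is a rough sketch assuming a desktop JVM with java.nio.file (Java 7+); BomRemover and its methods are hypothetical names, and the file is read fully into memory, which is fine for a small text file like this one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class BomRemover {
    /** Returns the data without a leading UTF-8 BOM (EF BB BF); returns the same array if there is none. */
    static byte[] withoutUtf8Bom(byte[] bytes) {
        boolean hasBom = bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(bytes, 3, bytes.length) : bytes;
    }

    /** Rewrites the file in place without the BOM; leaves it untouched if no BOM is found. */
    static void rewriteWithoutBom(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] cleaned = withoutUtf8Bom(original);
        if (cleaned.length != original.length) {
            Files.write(file, cleaned);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java BomRemover someTextFile.txt
        if (args.length != 1) {
            System.err.println("usage: java BomRemover <file>");
            return;
        }
        rewriteWithoutBom(Paths.get(args[0]));
    }
}
```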