How do I convert this String, "the surveyÂ’s rules", to UTF-8 in Scala?
I tried these approaches, but they do not work:
scala> val text = "the surveyÂ’s rules"
text: String = the surveyÂ’s rules
scala> scala.io.Source.fromBytes(text.getBytes(), "UTF-8").mkString
res17: String = the surveyÂ’s rules
scala> new String(text.getBytes(),"UTF8")
res21: String = the surveyÂ’s rules
OK, I solved it this way. It is not a conversion, just a read of the file that ignores the characters that cannot be decoded:
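import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}
// Decode as US-ASCII and silently drop bytes that do not fit (used implicitly by Source.fromFile)
implicit val codec = Codec("US-ASCII").onMalformedInput(CodingErrorAction.IGNORE).onUnmappableCharacter(CodingErrorAction.IGNORE)
val src = Source.fromFile(new File(folderDestination + name + ".csv"))
val src2 = Source.fromFile(new File(folderDestination + name + ".csv"))
val reader = CSVReader.open(src.reader()) // CSVReader: presumably from the scala-csv library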
Valid UTF-8 has a specific binary format. A single-byte UTF-8 character always has the form '0xxxxxxx', where 'x' is any binary digit. A two-byte UTF-8 character always has the form '110xxxxx 10xxxxxx'.
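As a rough illustration of those bit patterns, the sketch below classifies a single byte by its high-order bits. The object and method names are made up for this example, and it only checks the pattern: it does not reject overlong or out-of-range sequences the way a full UTF-8 validator would.

```scala
object Utf8Bits {
  // Classify one byte by its leading bits, following the UTF-8 layout.
  def kind(b: Byte): String = {
    val u = b & 0xFF // JVM bytes are signed; work with 0..255 instead
    if ((u & 0x80) == 0x00) "single"            // 0xxxxxxx
    else if ((u & 0xC0) == 0x80) "continuation" // 10xxxxxx
    else if ((u & 0xE0) == 0xC0) "lead2"        // 110xxxxx
    else if ((u & 0xF0) == 0xE0) "lead3"        // 1110xxxx
    else if ((u & 0xF8) == 0xF0) "lead4"        // 11110xxx
    else "invalid"
  }
}
```

For instance, the two bytes of "\u00e9" (é) encoded as UTF-8 are 0xC3 and 0xA9, which this sketch reports as "lead2" followed by "continuation".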
To convert a String into UTF-8, use the getBytes() method in Java. getBytes(charsetName) encodes a String into a sequence of bytes and returns a byte array, where charsetName is the name of the charset used to encode the String into bytes.
UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”
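To make the two-way translation concrete, here is a small sketch that encodes strings to UTF-8, prints the bytes as hex, and decodes them back. The helper names are hypothetical; the only library calls are String.getBytes and the String constructor taking a Charset. Non-ASCII characters are written as escapes so the example does not depend on the source file's own encoding.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object Utf8RoundTrip {
  // Encode a String as UTF-8 and render the bytes as space-separated hex.
  def toHex(s: String): String =
    s.getBytes(UTF_8).map(b => f"${b & 0xFF}%02X").mkString(" ")

  // Decode UTF-8 bytes back into a String.
  def decode(bytes: Array[Byte]): String = new String(bytes, UTF_8)
}
```

With this, Utf8RoundTrip.toHex("A") gives "41", toHex("\u00e9") gives "C3 A9", and toHex("\u20ac") (the euro sign) gives "E2 82 AC". Decoding those bytes again returns the original string.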
There are a few options you can use: check the Content-Type header to see if it includes a charset parameter, which would indicate the encoding (e.g. Content-Type: text/plain; charset=utf-16); check if the uploaded data has a BOM (the first few bytes in the file, which would map to the Unicode character U+FEFF - 2 bytes for ...
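The BOM check could look roughly like the sketch below. The object name and the returned charset labels are choices made for this example. It only recognizes the UTF-8 and UTF-16 marks; a UTF-32LE mark also begins with FF FE, so it would be reported here as UTF-16LE.

```scala
object BomSniffer {
  // Return the charset suggested by a leading byte order mark, if there is one.
  def detect(bytes: Array[Byte]): Option[String] = {
    val head = bytes.take(3).map(_ & 0xFF).toList // unsigned view of the first bytes
    head match {
      case 0xEF :: 0xBB :: 0xBF :: _ => Some("UTF-8")    // U+FEFF encoded as UTF-8
      case 0xFE :: 0xFF :: _         => Some("UTF-16BE") // U+FEFF, big-endian
      case 0xFF :: 0xFE :: _         => Some("UTF-16LE") // U+FEFF, little-endian
      case _                         => None             // no BOM: fall back to other hints
    }
  }
}
```

A result of None does not mean the data is not UTF-8, since UTF-8 files are commonly written without a BOM.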
Note that when you call text.getBytes() without arguments, you're in fact getting an array of bytes representing the string in your platform's default encoding. On Windows, for example, it could be some single-byte encoding; on Linux it can be UTF-8 already.
To be correct, you need to specify the exact encoding in the getBytes() method call. For Java 7 and later, do this:
import java.nio.charset.StandardCharsets
val bytes = text.getBytes(StandardCharsets.UTF_8)
For Java 6 do this:
import java.nio.charset.Charset
val bytes = text.getBytes(Charset.forName("UTF-8"))
Then bytes will contain UTF-8-encoded text.
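To see why the charset has to match on both sides, here is a minimal sketch that encodes a string as UTF-8 and then decodes the bytes once with the wrong charset and once with the right one. The sample word and the value names are made up for the example, and it may or may not correspond to how the string in the question was damaged.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

object CharsetMismatch {
  // "café", written with an escape so the source file's encoding does not matter.
  val original: String = "caf\u00e9"

  // UTF-8 bytes: 63 61 66 C3 A9
  val utf8Bytes: Array[Byte] = original.getBytes(UTF_8)

  // Decoding with the wrong charset turns C3 A9 into two characters ("cafÃ©").
  val wrong: String = new String(utf8Bytes, ISO_8859_1)

  // Decoding with the matching charset restores the text.
  val right: String = new String(utf8Bytes, UTF_8)

  // ISO-8859-1 maps every byte to one character, so this particular mistake can be undone.
  val repaired: String = new String(wrong.getBytes(ISO_8859_1), UTF_8)
}
```

The repair step only works when the wrong decoding was lossless, as it is with ISO-8859-1. If bytes were already dropped or replaced along the way, the original text cannot be recovered like this.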
Just set the JVM's file.encoding parameter to UTF-8 as follows:
-Dfile.encoding=UTF-8
It makes sure that UTF-8 is the default encoding.
With the scala launcher, it could be scala -Dfile.encoding=UTF-8.
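To check whether the flag took effect, a small sketch like the following prints what the running JVM reports. The object and method names are made up; System.getProperty and Charset.defaultCharset are standard JDK calls.

```scala
import java.nio.charset.Charset

object DefaultCharsetCheck {
  // Summarize the encoding settings the running JVM reports.
  def describe(): String = {
    val property = System.getProperty("file.encoding") // may be null if the property is unset
    val charset = Charset.defaultCharset().name        // what getBytes() without arguments uses
    "file.encoding=" + property + ", default charset=" + charset
  }

  def main(args: Array[String]): Unit = println(describe())
}
```

Even with UTF-8 as the default, passing the charset explicitly to getBytes keeps the code independent of how the JVM was started.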