I'm trying to fetch some web pages using Scala's scala.io.Source object. Getting the iterator works fine, but I can't do anything with it without getting an exception:
scala> scala.io.Source.fromURL("http://google.com")
res0: scala.io.BufferedSource = non-empty iterator
scala> scala.io.Source.fromURL("http://google.com").length
java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:338)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.read(BufferedReader.java:175)
at scala.io.BufferedSource$$anonfun$iter$1$$anonfun$apply$mcI$sp$1.apply$mcI$sp(BufferedSource.scala:38)
at scala.io.Codec.wrap(Codec.scala:64)
at scala.io.BufferedSource$$anonfun$iter$1.apply$mcI$sp(BufferedSource.scala:38)
at scala.io.BufferedSource$$anonfun$iter$1.apply(BufferedSource.scala:38)
at scala.io.BufferedSource$$anonfun$iter$1.apply(BufferedSource.scala:38)
at scala.collection.Iterator$$anon$14.next(Iterator.scala:150)
at scala.collection.Iterator$$anon$25.hasNext(Iterator.scala:562)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)
at scala.io.Source.hasNext(Source.scala:238)
at scala.collection.Iterator$class.foreach(Iterator.scala:772)
at scala.io.Source.foreach(Source.scala:181)
at scala.collection.TraversableOnce$class.size(TraversableOnce.scala:104)
at scala.io.Source.size(Source.scala:181)
at scala.collection.Iterator$class.length(Iterator.scala:1071)
at scala.io.Source.length(Source.scala:181)
at .<init>(<console>:8)
at .<clinit>(<console>)
at .<init>(<console>:11)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
at java.lang.Thread.run(Thread.java:745)
So, as you can see, obtaining the buffer works, and I can do something with it:
scala> scala.io.Source.fromURL("http://google.com").next
res7: Char = <
But it seems I can't iterate over it.
I'm using Scala 2.9.2, but the problem recurs in 2.11.2 as well. I'm also running:
java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-2)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
Any help getting this to work would be greatly appreciated.
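You have an encoding issue here.
The response needs to be interpreted as Latin-1, also known as ISO-8859-1, but unless you tell it otherwise, Source decodes the bytes with the JVM's default charset (typically UTF-8). Latin-1 bytes above 0x7F are not valid UTF-8, hence the MalformedInputException.
Pass the encoding explicitly with Source.fromURL("url")("encoding") to solve your problem: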
scala> Source.fromURL("http://google.com")("ISO-8859-1").mkString
res4: String =
<!doctype html><html itemscop
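To reproduce the failure and the fix without touching the network, here is a minimal sketch that encodes a made-up page (the `page` string and the `EncodingDemo`/`decode` names are hypothetical) as ISO-8859-1 bytes and reads it back through `Source.fromInputStream`, once with `Codec.UTF8` (standing in for a UTF-8 JVM default, which is an assumption about your environment) and once with `Codec.ISO8859`.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.MalformedInputException
import scala.io.{Codec, Source}

object EncodingDemo {
  // Hypothetical page content containing one non-ASCII character ('é').
  val page = "<!doctype html><p>caf\u00e9</p>"

  // Encode it the way a Latin-1 server would send it: 'é' becomes the single byte 0xE9.
  val latin1Bytes: Array[Byte] = page.getBytes("ISO-8859-1")

  // Decode the bytes with an explicit codec and always close the source afterwards.
  def decode(bytes: Array[Byte], codec: Codec): String = {
    val src = Source.fromInputStream(new ByteArrayInputStream(bytes))(codec)
    try src.mkString finally src.close()
  }

  def main(args: Array[String]): Unit = {
    // A lone 0xE9 byte followed by '<' is not valid UTF-8, so this decode fails.
    val utf8Result =
      try decode(latin1Bytes, Codec.UTF8)
      catch { case e: MalformedInputException => "failed: " + e }
    println(utf8Result)

    // Decoding with the charset the bytes were actually written in succeeds.
    println(decode(latin1Bytes, Codec.ISO8859))
  }
}
```

The `try ... finally src.close()` is there because a `Source` built from a URL or stream holds an open connection until it is closed; the REPL one-liner above never closes it.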
A little background: when no encoding is given in an HTTP request, the standard behaviour is to return everything encoded in Latin-1.
For in-depth info, see http://www.ietf.org/rfc/rfc2045.txt
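Rather than hard-coding ISO-8859-1, you can read the charset the server declares in its Content-Type header and only fall back to Latin-1 when none is given. The sketch below is one way to do that with `java.net.URLConnection`; the `CharsetFromHeader`, `charsetOf` and `fetch` names are hypothetical, and it ignores any charset declared inside the HTML itself (e.g. a `<meta charset>` tag).

```scala
import java.net.URL
import scala.io.{Codec, Source}

object CharsetFromHeader {
  // Extract the charset parameter from a Content-Type value such as
  // "text/html; charset=UTF-8". Returns None if no charset is declared
  // or the header is missing (getContentType can return null).
  def charsetOf(contentType: String): Option[String] =
    Option(contentType).toList
      .flatMap(_.split(";").toList.drop(1)) // parameters after the media type
      .map(_.trim)
      .collectFirst {
        case p if p.toLowerCase.startsWith("charset=") =>
          p.substring("charset=".length).trim.stripPrefix("\"").stripSuffix("\"")
      }

  // Fetch a page, decoding it with the declared charset, or with
  // ISO-8859-1 (the default described above) when the server declares none.
  def fetch(url: String): String = {
    val conn = new URL(url).openConnection()
    val enc = charsetOf(conn.getContentType).getOrElse("ISO-8859-1")
    val src = Source.fromInputStream(conn.getInputStream)(Codec(enc))
    try src.mkString finally src.close()
  }
}
```

Usage would be `CharsetFromHeader.fetch("http://google.com")`; only `charsetOf` can be checked offline, since `fetch` needs network access.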