How do you load an HTML DOM document into Scala? The XML singleton had errors when trying to load the xmlns tags.
import java.net.URL
import scala.xml.{Elem, XML}

object NetParse {
  // Fetch the page and hand the raw stream to scala.xml's default SAX parser,
  // which expects well-formed XML (this XML.load call is where the errors appear)
  def netParse(sUrl: String): Elem = {
    val url = new URL(sUrl)
    val connect = url.openConnection
    XML.load(connect.getInputStream)
  }
}
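The exact error isn't quoted, but one common reason XML.load rejects real pages is that HTML need not be well-formed XML. Below is a minimal sketch, assuming the scala-xml module is on the classpath (a separate dependency since Scala 2.11), that shows the default parser rejecting an unclosed <br>. WellFormedCheck and tryStrictParse are hypothetical names.

```scala
import scala.xml.XML
import org.xml.sax.SAXParseException

object WellFormedCheck {
  // Typical HTML leaves void elements such as <br> unclosed, which XML forbids
  val sloppyHtml: String = "<html><body><p>Hello<br>world</p></body></html>"

  // Returns the SAX parser's error message, or None if the input parsed as XML
  def tryStrictParse(markup: String): Option[String] =
    try { XML.loadString(markup); None }
    catch { case e: SAXParseException => Some(e.getMessage) }
}
```

The same markup with <br/> parses fine, so the failure comes from well-formedness, not from scala.xml itself.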
Finally I found a solution! It requires Scala 2.7.7 or higher (2.7.0 has a fatal bug): How to use TagSoup with Scala XML
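Below is a minimal sketch of that TagSoup approach applied to the question's netParse. It assumes TagSoup 1.2.x and the scala-xml module are on the classpath, and it swaps TagSoup's parser in through XML.withSAXParser. The linked article may wire things up differently. TagSoupNetParse and parseString are hypothetical names.

```scala
import java.net.URI
import scala.xml.{Elem, XML}
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl

object TagSoupNetParse {
  // Build a fresh TagSoup parser per call: JAXP factories and parsers
  // are not guaranteed to be thread-safe. XML.withSAXParser returns a
  // loader that uses this parser instead of the default XML one.
  private def loader = XML.withSAXParser(new SAXFactoryImpl().newSAXParser())

  // Drop-in replacement for the question's netParse
  def netParse(sUrl: String): Elem = loader.load(URI.create(sUrl).toURL)

  // Convenience for markup already held in memory
  def parseString(html: String): Elem = loader.loadString(html)
}
```

Usage: TagSoupNetParse.parseString("<p>Hello<br>world") returns an html element whose p child contains the repaired br, where XML.loadString would throw.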
This may help you: Processing real-world HTML as if it were XML in Scala
Try using scala.xml.parsing.XhtmlParser instead.
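Below is a minimal sketch of the XhtmlParser route, assuming the scala-xml module is on the classpath. XhtmlParser is scala-xml's own markup parser with the XHTML character entities (such as &nbsp;) predefined. It still expects well-formed markup, so it suits XHTML pages rather than tag soup. XhtmlExample and the sample page are hypothetical.

```scala
import scala.io.Source
import scala.xml.NodeSeq
import scala.xml.parsing.XhtmlParser

object XhtmlExample {
  // Hypothetical well-formed XHTML with a namespace declaration and an XHTML entity
  val page: String =
    """<html xmlns="http://www.w3.org/1999/xhtml"><body><p id="greeting">Hello&nbsp;world</p></body></html>"""

  // XhtmlParser.apply reads a scala.io.Source and returns the parsed document;
  // the plain XML loader would reject &nbsp; as an undeclared entity
  def parse(xhtml: String): NodeSeq = XhtmlParser(Source.fromString(xhtml))
}
```

Usage: (XhtmlExample.parse(XhtmlExample.page) \\ "p" \ "@id").text reads the paragraph's id attribute.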
I just tried this answer with Scala 2.8.1 and ended up using the approach from:
http://www.hars.de/2009/01/html-as-xml-in-scala.html
The interesting bit that I needed was:
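import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl
import org.xml.sax.InputSource
import scala.xml.parsing.NoBindingFactoryAdapter
// TagSoup's JAXP factory produces a SAX parser that tolerates malformed HTML
val parserFactory = new SAXFactoryImpl
val parser = parserFactory.newSAXParser()
val source = new InputSource("http://www.scala-lang.org")
// The adapter turns TagSoup's SAX events into a scala.xml Node tree
val adapter = new NoBindingFactoryAdapter
adapter.loadXML(source, parser)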
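The snippet above reads straight from a URL. Below is a minimal sketch of the same TagSoup plus NoBindingFactoryAdapter pipeline fed from an in-memory string and then queried for link targets. It assumes TagSoup 1.2.x and scala-xml are on the classpath. TagSoupLinks, parseHtml and links are hypothetical names.

```scala
import java.io.StringReader
import scala.xml.Node
import scala.xml.parsing.NoBindingFactoryAdapter
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl
import org.xml.sax.InputSource

object TagSoupLinks {
  // Same steps as the URL version, but the InputSource wraps a String
  def parseHtml(html: String): Node = {
    val parser  = new SAXFactoryImpl().newSAXParser()
    val source  = new InputSource(new StringReader(html))
    val adapter = new NoBindingFactoryAdapter
    adapter.loadXML(source, parser)
  }

  // Collect the href of every <a> element, in document order
  def links(html: String): List[String] =
    (parseHtml(html) \\ "a").map(a => (a \ "@href").text).toList
}
```

TagSoup may add default attributes (for example shape on <a>), so queries should select the attributes they need rather than compare whole elements.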