
How to read from zipped xml files in Scala code?

How do I access XML data files directly from a zipped file in my Scala program? Are there any direct ways to programmatically unzip and read contents in my Scala code?

Kamesh Rao Yeduvakula asked Mar 01 '11 10:03





1 Answer

Here are a couple of ways of doing it in Scala 2.8.1. First, create a small test archive from the shell:

cat > root.xml << EOF
<ROOT>
<id>123</id>
</ROOT>
EOF
zip root root.xml

and then in the REPL:

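val rootzip = new java.util.zip.ZipFile("root.zip")
import collection.JavaConverters._
// entries returns a java.util.Enumeration; asScala wraps it as a Scala Iterator
val entries = rootzip.entries.asScala
entries foreach { e =>
    // getInputStream reads the entry directly, without extracting it to disk
    val x = scala.xml.XML.load(rootzip.getInputStream(e))
    println(x)
}
rootzip.close()  // release the file handle once every entry has been read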

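or, to load only the .xml entries, something like:

val rootzip = new java.util.zip.ZipFile("root.zip")
// JavaConversions converts implicitly; it was deprecated in Scala 2.12 and
// removed in 2.13, so on newer versions use the explicit asScala form above.
import scala.collection.JavaConversions._
rootzip.entries.
  filter (_.getName.endsWith(".xml")).
  foreach { e => println(scala.xml.XML.load(rootzip.getInputStream(e))) }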
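
On Scala 2.13 or later, the same idea can be sketched with resource handling, so that the archive and each entry stream are closed even if parsing fails. This is only a sketch under a few assumptions: the scala-xml module is on the classpath (it has been a separate library since Scala 2.11), and ZipXml and readXmlEntries are hypothetical names introduced here for illustration, not part of any library.

```scala
import java.util.zip.ZipFile
import scala.jdk.CollectionConverters._
import scala.util.Using
import scala.xml.{Elem, XML}

// Hypothetical helper object, introduced only for this example.
object ZipXml {
  // Parses every .xml entry of the archive and returns entry name -> root element.
  def readXmlEntries(zipPath: String): Map[String, Elem] =
    Using.resource(new ZipFile(zipPath)) { zip =>
      zip.entries().asScala
        .filter(e => !e.isDirectory && e.getName.endsWith(".xml"))
        .map { e =>
          // Each entry gets its own stream, closed as soon as it has been parsed.
          val doc = Using.resource(zip.getInputStream(e))(in => XML.load(in))
          e.getName -> doc
        }
        .toMap // forces the lazy iterator before the ZipFile is closed
    }
}
```

With the root.zip created above, this could be used as:

```scala
ZipXml.readXmlEntries("root.zip").foreach { case (name, doc) =>
  println(name + ": " + (doc \ "id").text)
}
```

The call to toMap matters: the iterator returned by asScala is lazy, so without it the entries would only be read after the archive had already been closed.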
Terje Sten Bjerkseth answered Nov 02 '22 22:11
