Uncompress and read a gzip file in Scala

In Scala, how does one uncompress the text contained in file.gz so that it can be processed? I would be happy with either having the contents of the file stored in a variable, or saving it as a local file so that the program can read it in afterwards.

Specifically, I am using Scalding to process compressed log data, but Scalding's FileSource.scala does not define a way to read compressed files.

EthanP asked Jul 02 '13 22:07


1 Answer

Here's my version:

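import java.io.{BufferedReader, File, FileInputStream, InputStreamReader}
import java.util.zip.GZIPInputStream

// Wraps a BufferedReader as an Iterator over its lines.
// It reads one line ahead so that hasNext detects end of stream via
// readLine() == null; reader.ready only reports whether a read would
// block, not whether more data remains.
class BufferedReaderIterator(reader: BufferedReader) extends Iterator[String] {
  private var nextLine: String = readAhead()

  // Fetches the next line and closes the reader once the stream is exhausted.
  private def readAhead(): String = {
    val line = reader.readLine()
    if (line == null) reader.close()
    line
  }

  override def hasNext: Boolean = nextLine != null

  override def next(): String = {
    if (nextLine == null) throw new NoSuchElementException("end of stream")
    val line = nextLine
    nextLine = readAhead()
    line
  }
}

object GzFileIterator {
  // Opens a gzip-compressed text file and iterates over its decompressed lines.
  def apply(file: File, encoding: String): Iterator[String] =
    new BufferedReaderIterator(
      new BufferedReader(
        new InputStreamReader(
          new GZIPInputStream(
            new FileInputStream(file)), encoding)))
}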

Then do:

val iterator = GzFileIterator(new java.io.File("test.txt.gz"), "UTF-8")
iterator.foreach(println)
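
If you would rather have one of the two outcomes the question asks about (the contents in a variable, or a decompressed local file), a minimal sketch using only the JDK and scala.io.Source might look like the following. The object name GzipUtil and the methods readAll and decompressTo are made up for illustration, and reading everything into a String assumes the decompressed text fits in memory.

```scala
import java.io.{BufferedInputStream, FileInputStream}
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import java.util.zip.GZIPInputStream
import scala.io.Source

// Hypothetical helper object; the names are illustrative, not from any library.
object GzipUtil {
  // Decompress the file and return its whole text as a String.
  // Only sensible when the decompressed content fits in memory.
  def readAll(path: String, encoding: String = "UTF-8"): String = {
    val in = new GZIPInputStream(new BufferedInputStream(new FileInputStream(path)))
    try Source.fromInputStream(in, encoding).mkString
    finally in.close()
  }

  // Decompress src into a plain local file at dest, overwriting dest if it exists.
  // The data is streamed, so memory use stays bounded.
  def decompressTo(src: String, dest: String): Path = {
    val in = new GZIPInputStream(new BufferedInputStream(new FileInputStream(src)))
    try {
      val target = Paths.get(dest)
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
      target
    } finally in.close()
  }
}
```

With that in place, GzipUtil.readAll("test.txt.gz") gives you the text in a variable, and GzipUtil.decompressTo("test.txt.gz", "test.txt") writes a plain file that the rest of the program can read normally. For large logs the line iterator above is probably the better fit, since it never holds the whole file in memory.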
dhg answered Sep 20 '22 07:09
