
How to read gzip'd file in Scala

Tags:

scala

In Java, I'd wrap a GZIPInputStream over a FileInputStream and be done. How is the equivalent done in Scala?

Source.fromFile("a.csv.gz").... 

fromFile returns a BufferedSource, which really wants to view the world as a collection of lines.

Is there no more elegant way than this?

Source.fromInputStream(new GZIPInputStream(new BufferedInputStream(new FileInputStream("a.csv.gz")))) 

François Beausoleil asked Feb 18 '13 15:02



1 Answer

If you want to use Source and not do everything the Java way, then yes, you'll have to add one more layer of wrapping to what you were doing in Java. Source takes InputStreams but can give you Readers, which prevents you from using Source twice.

Scala rarely makes you do more work than Java would, but with I/O in particular you often have to fall back on Java classes. (You can always define your own shortcuts, of course:

def gis(s: String) = new GZIPInputStream(new BufferedInputStream(new FileInputStream(s))) 

is barely longer than what you've typed already, and now you can reuse it.)
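
As a minimal sketch of that shortcut idea, the helper can be paired with Source so the lines of a gzip'd file come back as a List. The names GzipLines, gzipStream and readGzipLines are hypothetical, UTF-8 is an assumed encoding, and the whole file is read eagerly, which may not suit very large inputs:

```scala
import java.io.{BufferedInputStream, FileInputStream}
import java.util.zip.GZIPInputStream
import scala.io.Source

object GzipLines {
  // Hypothetical helper: the same wrapping as gis above, with an explicit return type.
  def gzipStream(path: String): GZIPInputStream =
    new GZIPInputStream(new BufferedInputStream(new FileInputStream(path)))

  // Hypothetical helper: reads every line eagerly, then closes the Source.
  // Closing the Source also closes the underlying stream.
  // Assumption: the decompressed content is UTF-8 text.
  def readGzipLines(path: String): List[String] = {
    val source = Source.fromInputStream(gzipStream(path), "UTF-8")
    try source.getLines().toList
    finally source.close()
  }
}
```

With this in scope, GzipLines.readGzipLines("a.csv.gz") would return the decompressed lines. The try/finally is there because a Source built from an InputStream holds the file open until it is closed.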

Rex Kerr answered Sep 19 '22 13:09