
Read a file from HDFS and assign the contents to string

Tags: scala, hadoop, hdfs

In Scala, how do I read a file from HDFS and assign its contents to a variable? I know how to read the file and I am able to print it, but if I try to assign the contents to a String, the result is Unit (()). Below is the code I tried.

val dfs = org.apache.hadoop.fs.FileSystem.get(config)
val snapshot_file = "/path/to/file/test.txt"
val stream = dfs.open(new org.apache.hadoop.fs.Path(snapshot_file))
def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
readLines.takeWhile(_ != null).foreach(line => println(line))

The above code prints the file contents correctly. But when I assign the result to a variable, I don't get the contents as a String; I get Unit instead:

val snapshot_id = readLines.takeWhile(_ != null).foreach(line => println(line))
snapshot_id: Unit = ()

What is the correct way to assign the contents to a variable?

asked Feb 05 '23 by user2731629

1 Answer

You need to use mkString. Calling foreach(println) only prints each line and returns Unit, which is exactly what ends up stored in your variable. Instead, read the lines from the stream and join them into a single String:

// Connect to HDFS and open the file
val hdfs = org.apache.hadoop.fs.FileSystem.get(
  new java.net.URI("hdfs://namenode:port/"),
  new org.apache.hadoop.conf.Configuration())
val path = new org.apache.hadoop.fs.Path("/user/cloudera/file.txt")
val stream = hdfs.open(path)
// Read the lines and join them with newlines into a single String
val readLines = scala.io.Source.fromInputStream(stream).getLines()
val snapshot_id: String = readLines.mkString("\n")
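
For completeness, it is good practice to close the HDFS stream once the contents have been read. A minimal sketch, assuming a placeholder namenode URI (the host and port 8020 below are illustrative, as is the file path):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder namenode URI and file path; substitute your own values
val hdfs = FileSystem.get(new URI("hdfs://namenode:8020/"), new Configuration())
val stream = hdfs.open(new Path("/user/cloudera/file.txt"))
// try/finally is an expression, so snapshot_id still receives the joined lines
val snapshot_id: String =
  try scala.io.Source.fromInputStream(stream).getLines().mkString("\n")
  finally stream.close()

println(snapshot_id) // the whole file as one String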
answered Feb 12 '23 by philantrovert