
In Scala, how to read a simple CSV file having a header in its first line?

The task is to look up the value of a specific field (identified by its position in the line) in the row whose key field matches a given value. The input is a simple CSV file (just commas as separators, no field-enclosing quotes, never a comma inside a field) with a header in its first line.

User uynhjl has given an example (but with a different character as a separator):


import scala.io.Source

val src = Source.fromFile("/etc/passwd")
val iter = src.getLines().map(_.split(":"))
// print the uid for Guest
iter.find(_(0) == "Guest") foreach (a => println(a(2)))
// the rest of iter is not processed
src.close()

The question in this case is: how do I skip the header line when parsing?

Ivan asked Aug 31 '10 23:08




3 Answers

You can just use drop:

val iter = src.getLines().drop(1).map(_.split(":"))

From the documentation:

def drop (n: Int) : Iterator[A]: Advances this iterator past the first n elements, or the length of the iterator, whichever is smaller.
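
Putting drop(1) together with the lookup from the question, a minimal sketch might look like the following. The names CsvLookup, lookup, and lookupInFile and their parameters are made up for illustration. It assumes the simple format described in the question (plain commas, no quoting).

```scala
import scala.io.Source

object CsvLookup {
  // Hypothetical helper: returns the field at `valueIdx` from the first data row
  // whose field at `keyIdx` equals `key`. Assumes no quoted fields.
  def lookup(lines: Iterator[String], keyIdx: Int, key: String, valueIdx: Int): Option[String] =
    lines
      .drop(1)                 // skip the header line
      .map(_.split(",", -1))   // limit -1 keeps trailing empty fields
      .find(f => f.length > math.max(keyIdx, valueIdx) && f(keyIdx) == key)
      .map(_(valueIdx))

  // Opens the file, runs the lookup, and always closes the source.
  def lookupInFile(path: String, keyIdx: Int, key: String, valueIdx: Int): Option[String] = {
    val src = Source.fromFile(path)
    try lookup(src.getLines(), keyIdx, key, valueIdx)
    finally src.close()
  }
}
```

Because find stops at the first match, the rest of the file is never read. Rows that are too short to contain both indices are skipped rather than causing an exception.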

Travis Brown answered Oct 18 '22 05:10


Here's a CSV reader in Scala. Yikes.

Alternatively, you can look for a CSV reader in Java, and call that from Scala.

Parsing CSV files properly is not a trivial matter. Escaping quotes, for starters.
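
To illustrate the quoting problem, here is a small sketch (the object name and the sample line are invented for illustration) of how a plain split on commas breaks a quoted field that itself contains a comma.

```scala
object NaiveSplitDemo {
  // A record with three logical fields; the second is quoted and contains a comma.
  val line = "1,\"Smith, John\",admin"

  // Naive approach: split on every comma, ignoring quotes.
  val fields: Array[String] = line.split(",")

  def main(args: Array[String]): Unit = {
    // The quoted field is cut in two, so four pieces come back instead of three.
    fields.foreach(println)
  }
}
```

For the file described in the question (no quotes, no embedded commas) this is not an issue, which is why a plain split is acceptable there.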

Robert Harvey answered Oct 18 '22 03:10


First I read the header line using take(1); the remaining lines are then still available in the src iterator. This works fine for me.

import scala.io.Source

val src = Source.fromFile(f).getLines()

// assuming first line is a header
val headerLine = src.take(1).next()

// processing remaining lines
for (l <- src) {
  // split line by comma and process each field
  l.split(",").map { c =>
    // your logic here
  }
}
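
A variant of the same idea, offered as a sketch rather than a drop-in replacement: the Scala Iterator documentation advises against reusing an iterator after calling take on it, so this version reads the header with next() instead. It also uses the header to look fields up by column name. The names HeaderCsv and parse and the Map-based row type are assumptions for illustration, and simple comma-separated input without quoting is assumed.

```scala
object HeaderCsv {
  // Hypothetical helper: turns raw lines into rows keyed by header name.
  def parse(lines: Iterator[String]): Iterator[Map[String, String]] =
    if (!lines.hasNext) Iterator.empty
    else {
      // next() consumes only the header; the iterator stays valid afterwards
      val header = lines.next().split(",", -1)
      // pair each header name with the field in the same position
      lines.map(l => header.zip(l.split(",", -1)).toMap)
    }
}
```

With a file, this could be called as HeaderCsv.parse(Source.fromFile(f).getLines()). The result is lazy, so the source has to stay open until the rows have been consumed.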

tuxdna answered Oct 18 '22 03:10