The task is to look up the value of a specific field (by its position in the line) using the value of a key field in a simple CSV file (just commas as separators, no field-enclosing quotes, never a comma inside a field) that has a header in its first line.
User uynhjl has given an example (but with a different character as a separator):
import scala.io.Source
val src = Source.fromFile("/etc/passwd")
val iter = src.getLines().map(_.split(":"))
// print the uid for Guest
iter.find(_(0) == "Guest") foreach (a => println(a(2)))
// the rest of iter is not processed
src.close()
The question in this case is: how do I skip the header line when parsing?
In the first line of the file, include a header with a list of the column names in the file. This is optional, but strongly recommended; it allows the file to be self-documenting.
A header of the CSV file is an array of values assigned to each of the columns. It acts as a row header for the data. Initially, the CSV file is converted to a data frame and then a header is added to the data frame. The contents of the data frame are again stored back into the CSV file.
You can just use drop:
val iter = src.getLines().drop(1).map(_.split(":"))
From the documentation:
def drop (n: Int) : Iterator[A]
Advances this iterator past the first n elements, or the length of the iterator, whichever is smaller.
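Combining drop(1) with the lookup from the question, a minimal sketch for the comma-separated case might look like this. The helper name lookup, its parameters, and the sample data are made up for illustration; an in-memory iterator stands in for src.getLines(), which could be passed in the same way.

```scala
// Hypothetical helper: returns the field at valueIdx from the first
// data row whose field at keyIdx equals key. The header line is skipped.
def lookup(lines: Iterator[String], key: String, keyIdx: Int, valueIdx: Int): Option[String] =
  lines
    .drop(1)                       // skip the header line
    .map(_.split(",", -1))         // limit -1 keeps trailing empty fields
    .find(f => f.length > math.max(keyIdx, valueIdx) && f(keyIdx) == key)
    .map(_(valueIdx))

// Sample data standing in for src.getLines()
val sample = Iterator("name,shell,uid", "root,/bin/sh,0", "Guest,/bin/false,405")
lookup(sample, "Guest", 0, 2).foreach(println)
```

Because find stops at the first match, the rest of the iterator is not processed, just as in the original example. Rows that are too short to contain both fields are skipped rather than causing an exception.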
Here's a CSV reader in Scala. Yikes.
Alternatively, you can look for a CSV reader in Java, and call that from Scala.
Parsing CSV files properly is not a trivial matter. Escaping quotes, for starters.
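A tiny illustration of the quoting problem (the sample line is made up): a plain split on commas cuts a quoted field in two as soon as it contains a comma. For the simple files described in the question this does not matter, but it is the reason a real CSV reader is needed for general input.

```scala
// A line with three logical fields; the second is quoted and contains a comma
val line = "1,\"Smith, John\",42"
val fields = line.split(",")
println(fields.length)   // 4, not 3: the quoted field was split at its comma
fields.foreach(println)
```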
First I read the header line using take(1), and then the remaining lines are already in the src iterator. This works fine for me.
import scala.io.Source
val src = Source.fromFile(f).getLines()
// assuming the first line is a header
val headerLine = src.take(1).next()
// process the remaining lines
for (l <- src) {
  // split each line on commas and process the fields
  l.split(",").foreach { c =>
    // your logic here
  }
}
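One caveat: the Scala Iterator documentation advises discarding an iterator after calling a method such as take on it, with next and hasNext as the explicit exceptions. A variant that pulls the header with next() directly avoids relying on that behaviour. This is only a sketch: splitHeader and the sample lines are hypothetical, and an in-memory iterator stands in for Source.fromFile(f).getLines().

```scala
// Hypothetical helper: separates the header from the data rows.
// Returns None for an empty input instead of throwing.
def splitHeader(lines: Iterator[String]): Option[(Array[String], Iterator[Array[String]])] =
  if (lines.hasNext) {
    val header = lines.next().split(",", -1)     // first line is the header
    Some((header, lines.map(_.split(",", -1))))  // remaining lines, split lazily
  } else None

// Sample data standing in for Source.fromFile(f).getLines()
val lines = Iterator("id,name", "1,alice", "2,bob")
splitHeader(lines).foreach { case (header, rows) =>
  println(header.mkString(" | "))
  rows.foreach(r => println(r.mkString(" | ")))
}
```

The hasNext check also covers the empty-file case, where calling next() unguarded would throw.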