The raw data looks like the following:
YAPM1,20100901,23:36:01.563,Quote,,,,,,,4563,,,,,,
YAPM1,20100901,23:36:03.745,Quote,,,,,4537,,,,,,,,
The first row has extra empty columns before the value. I parse the data as follows:
val tokens = List.fromString(line, ',')
The result:
List(YAPM1, 20100901, 23:36:01.563, Quote, 4563)
List(YAPM1, 20100901, 23:36:03.745, Quote, 4537)
The empty fields are dropped, so the resulting Lists give me no way to deduce which rows had the extra columns. How can I keep that information?
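Use String.split and pass -1 as the second argument (the limit). A negative limit tells split to keep trailing empty strings instead of discarding them: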
scala> "a,b,c,d,,,,".split(",")
res1: Array[java.lang.String] = Array(a, b, c, d)
scala> "a,b,c,d,,,,".split(",", -1)
res2: Array[java.lang.String] = Array(a, b, c, d, "", "", "", "")
FYI, List.fromString is deprecated in favor of String.split.
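Applied to the rows from the question, this could look roughly like the sketch below. The object name SplitDemo and the helpers fields and valueColumn are made up for illustration, and the sketch assumes the first four columns (symbol, date, time, type) are always present, so the search for the value starts at index 4.

```scala
object SplitDemo {
  // Split a CSV line on commas; the -1 limit keeps every empty field,
  // including the trailing ones that split(",") would drop.
  def fields(line: String): Array[String] = line.split(",", -1)

  // Index of the first non-empty field after the four leading columns
  // (assumption: symbol, date, time and type are always filled in).
  // Returns -1 if the row carries no value at all.
  def valueColumn(line: String): Int = fields(line).indexWhere(_.nonEmpty, 4)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      "YAPM1,20100901,23:36:01.563,Quote,,,,,,,4563,,,,,,",
      "YAPM1,20100901,23:36:03.745,Quote,,,,,4537,,,,,,,,"
    )
    // Print how many fields each row has and where its value sits.
    rows.foreach { r =>
      println(s"${fields(r).length} fields, value at index ${valueColumn(r)}")
    }
  }
}
```

Because the empty strings are kept, the position of the value tells you which column it came from, which is exactly what gets lost when the empties are dropped.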