How do I use Scala to parse CSV data with empty columns?

Tags:

csv

scala

The raw data looks like the following:

YAPM1,20100901,23:36:01.563,Quote,,,,,,,4563,,,,,,
YAPM1,20100901,23:36:03.745,Quote,,,,,4537,,,,,,,,

The first row has extra empty columns before the value. I parse the data as follows:

val tokens = List.fromString(line, ',')

The result:

List(YAPM1, 20100901, 23:36:01.563, Quote, 4563)
List(YAPM1, 20100901, 23:36:03.745, Quote, 4537)

The resulting Lists drop the empty fields, so there is no way to deduce which rows had the extra columns. How do I preserve them?

asked Jul 11 '11 06:07

deltanovember

1 Answer

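Use String.split and pass -1 as the second argument. By default, split discards trailing empty strings; a negative limit keeps every field, including the empty ones at the end.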

scala> "a,b,c,d,,,,".split(",")
res1: Array[java.lang.String] = Array(a, b, c, d)

scala> "a,b,c,d,,,,".split(",", -1)
res2: Array[java.lang.String] = Array(a, b, c, d, "", "", "", "")

FYI, List.fromString is deprecated in favor of String.split.
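
To connect this back to the question, here is a minimal sketch of how the split call might be used to find out which column a value sits in. The names CsvFields, fields, and nonEmptyWithIndex are hypothetical helpers made up for illustration, and the sketch assumes plain comma-separated input with no quoted fields.

```scala
object CsvFields {
  // Split on commas and keep every field, including empty ones.
  // The negative limit stops String.split from discarding trailing empty strings.
  def fields(line: String): Vector[String] =
    line.split(",", -1).toVector

  // Pair each non-empty field with its 0-based column index,
  // so rows can be compared by where their values appear.
  def nonEmptyWithIndex(line: String): Vector[(Int, String)] =
    fields(line).zipWithIndex.collect {
      case (value, index) if value.nonEmpty => (index, value)
    }
}
```

Calling CsvFields.nonEmptyWithIndex on each row gives the column position of every value, which is what distinguishes rows whose values sit in different columns. If the data can contain quoted fields with embedded commas, a plain split is not enough and a proper CSV parser would be the safer choice.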

answered Nov 03 '22 01:11

Ray Toal
