scala.MatchError: [Ljava.lang.String; (of class [Ljava.lang.String;)

I am trying to read a file with the following format: 12334 this:23,word:21,teacher:23

val fp = "/user/user_id.txt"
sc.textFile(fp).map { s =>
  val Array(did, info_s) = s.split("\t")
  val info = info_s.split(",").map { kv =>
    val Array(k, v) = kv.split(":")
    (k, v.toDouble)
  }.toSeq
  (did, info)
}

Running it produces the following Scala error. How did this happen?

scala.MatchError: [Ljava.lang.String;@51443799 (of class [Ljava.lang.String;)
at com.test.news.IO$$anonfun$1.apply(App.scala:58)
at com.test.news.IO$$anonfun$1.apply(App.scala:57)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Line 58 is val Array(did, info_s) = s.split("\t"). Can't I destructure an array this way? I'm confused, please help!

Hatter Bush asked Dec 12 '16 02:12


2 Answers

Your syntax val Array(k, v) = ... (and likewise val Array(did, info_s) = ..., which is line 58 in your stack trace) uses Array's unapplySeq extractor. You get a MatchError if the number of names you bind in the pattern is not equal to the length of the array:

scala> val Array(k, v) = "1,2".split(",")
k: String = 1
v: String = 2

scala> val Array(k, v) = "1,2,3".split(",")
scala.MatchError: [Ljava.lang.String;@508dec2b (of class [Ljava.lang.String;)
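To connect this to line 58 of the question, here is a small sketch (plain Scala, no Spark) of how String.split can return an array of the "wrong" length. The object and value names are made up for illustration, and the space-separated record is a hypothetical malformed line, not one taken from the asker's file.

```scala
object SplitDemo {
  // Well-formed record: ID and key/value list separated by a single tab -> 2 elements.
  val tabbed: Array[String] = "12334\tthis:23,word:21,teacher:23".split("\t")

  // Hypothetical malformed record: a space instead of a tab -> 1 element,
  // so `val Array(did, info_s) = ...` would throw scala.MatchError.
  val spaced: Array[String] = "12334 this:23,word:21,teacher:23".split("\t")

  // String.split drops trailing empty strings, so "teacher:" yields 1 element...
  val trailingDropped: Array[String] = "teacher:".split(":")
  // ...unless a negative limit is passed, which keeps the trailing empty string.
  val trailingKept: Array[String] = "teacher:".split(":", -1)

  // An extra colon yields 3 elements, which also breaks `val Array(k, v) = ...`.
  val extraColon: Array[String] = "a:b:1".split(":")

  def main(args: Array[String]): Unit =
    Seq(tabbed, spaced, trailingDropped, trailingKept, extraColon)
      .foreach(a => println(a.mkString("[", "|", "]") + s" length=${a.length}"))
}
```

Note the trailing-colon case: because split discards trailing empty strings, a pair with a missing value also fails the two-element pattern even though it contains exactly one colon.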

In your case, this is most likely caused by malformed input: a key/value pair with more than one : or none at all, or a line that does not contain exactly one tab. Extractors are useful and concise, but their error messages are cryptic, so it's good practice to use a slightly more verbose syntax whenever you aren't certain the match will succeed (for example, when reading arbitrary text files):

val (k, v) = kv.split(":") match {
  case Array(f1, f2) => (f1, f2)
  // Any other length: fail with a readable message instead of a bare MatchError
  case elems => sys.error(s"found invalid K/V pair: expected 2 elements, found ${elems.length}")
}
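Carrying the same idea through the whole record, here is a minimal sketch that reports malformed input instead of throwing. It is plain Scala on individual strings rather than a Spark RDD, it assumes the tab separator implied by the question's split("\t"), and the names SafeParse, parseLine and parsePair are hypothetical. Returning Either instead of calling sys.error is a design choice, so that a single bad line does not fail the whole job.

```scala
import scala.util.Try

object SafeParse {
  // (id, list of key/value pairs): the same shape the question builds.
  type Record = (String, Seq[(String, Double)])

  // Parse one "key:value" pair; Left carries a readable message instead of a MatchError.
  def parsePair(kv: String): Either[String, (String, Double)] =
    kv.split(":") match {
      case Array(k, v) =>
        Try(v.toDouble).toOption
          .map(d => (k, d))
          .toRight(s"value is not a number in '$kv'")
      case elems =>
        Left(s"invalid K/V pair '$kv': expected 2 elements, found ${elems.length}")
    }

  // Parse one "id<TAB>k:v,k:v,..." line; the first bad pair makes the whole line a Left.
  def parseLine(line: String): Either[String, Record] =
    line.split("\t") match {
      case Array(id, infoS) =>
        val pairs = infoS.split(",").toSeq.map(parsePair)
        pairs.collectFirst { case Left(err) => err } match {
          case Some(err) => Left(err)
          case None      => Right((id, pairs.collect { case Right(p) => p }))
        }
      case elems =>
        Left(s"invalid line '$line': expected 2 tab-separated fields, found ${elems.length}")
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "12334\tthis:23,word:21,teacher:23",
      "12334 this:23,word:21,teacher:23", // hypothetical: space instead of tab
      "555\tfoo:1:2"                      // hypothetical: extra colon in a pair
    )
    lines.map(parseLine).foreach(println)
  }
}
```

In Spark, one option would be to map each line of sc.textFile(fp) with parseLine, then keep the Right values and log or count the Left ones.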
Tim answered Nov 16 '22 20:11


Scala has syntactic sugar for unpacking a tuple:

val (id, info) = ("123", "word:123")

But this kind of unpacking won't work on the Array returned by split() when the number of elements doesn't match the number of names you bind. Instead, capture the result in a variable and access the elements by index (this tolerates extra fields, though a line with too few fields still throws an ArrayIndexOutOfBoundsException):

sc.textFile("user_id.txt").map{ line =>
    val fields = line.split("\t")
    val info = fields(1).split(",").map { kv =>
        val pairs = kv.split(":")
        (pairs(0), pairs(1).toDouble)
    }.toSeq
    (fields(0), info)
}.collect()

# Array[(String, Seq[(String, Double)])] = Array((12334,WrappedArray((this,23.0), (word,21.0), (teacher,23.0))))
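Index access is more forgiving than an exact Array(a, b) pattern, but only in one direction. A minimal sketch of how the index-based version above behaves on hypothetical malformed lines; it runs on a single string rather than an RDD, and the object name IndexAccess is made up.

```scala
import scala.util.Try

object IndexAccess {
  // Same index-based logic as the answer, applied to one String so it runs without Spark.
  def parse(line: String): (String, Seq[(String, Double)]) = {
    val fields = line.split("\t")
    val info = fields(1).split(",").map { kv =>
      val pairs = kv.split(":")
      (pairs(0), pairs(1).toDouble)
    }.toSeq
    (fields(0), info)
  }

  def main(args: Array[String]): Unit = {
    // Extra tab-separated fields are silently ignored.
    println(Try(parse("12334\tthis:23\tignored")))
    // A missing tab still fails, now with ArrayIndexOutOfBoundsException.
    println(Try(parse("12334 this:23")))
  }
}
```

So the index version trades the MatchError for silently dropping extra data, while missing fields still fail at runtime.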

I wasn't aware of Array's extractor when I wrote the index-based version above, but unpacking is appealing, so here is an alternative in the same spirit: it still unpacks the array, but only takes the first two fields of each line.

The trick is to use _* to absorb any remaining, unwanted elements of the Array:

val Array(k, v, _*) = Array(1, 2, 3, 4, 5)
#k: Int = 1
#v: Int = 2

The index-based method can then be rewritten as:

sc.textFile("user_id.txt").map{ line =>
    val Array(id, info_s, _*) = line.split("\t")
    val info = info_s.split(",").map { kv =>
        val Array(key, value, _*) = kv.split(":")
        (key, value.toDouble)
    }.toSeq
    (id, info)
}.collect()

# Array[(String, Seq[(String, Double)])] = Array((12334,WrappedArray((this,23.0), (word,21.0), (teacher,23.0))))
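The _* version behaves similarly to index access: extra elements are absorbed, but a line or pair with too few elements still throws scala.MatchError. A small sketch to check this, again on a single string rather than an RDD; WildcardPattern is a hypothetical name and the inputs are made-up malformed lines.

```scala
import scala.util.Try

object WildcardPattern {
  // Same `_*` patterns as the answer, applied to one String so it runs without Spark.
  def parse(line: String): (String, Seq[(String, Double)]) = {
    val Array(id, infoS, _*) = line.split("\t")
    val info = infoS.split(",").map { kv =>
      val Array(key, value, _*) = kv.split(":")
      (key, value.toDouble)
    }.toSeq
    (id, info)
  }

  def main(args: Array[String]): Unit = {
    // Extra fields and an extra colon are absorbed by `_*`.
    println(Try(parse("12334\tthis:23:extra\tignored")))
    // Too few fields: the pattern cannot match, so scala.MatchError is thrown.
    println(Try(parse("12334 this:23")))
  }
}
```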
Psidom answered Nov 16 '22 19:11