I want to read a file whose lines look like this:
12334 this:23,word:21,teacher:23
val fp = "/user/user_id.txt"
sc.textFile(fp).map { s =>
  val Array(did, info_s) = s.split("\t")
  val info = info_s.split(",").map { kv =>
    val Array(k, v) = kv.split(":")
    (k, v.toDouble)
  }.toSeq
  (did, info)
}
This Scala error appeared. Why did it happen?
scala.MatchError: [Ljava.lang.String;@51443799 (of class [Ljava.lang.String;)
at com.test.news.IO$$anonfun$1.apply(App.scala:58)
at com.test.news.IO$$anonfun$1.apply(App.scala:57)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Line 58 is val Array(did, info_s) = s.split("\t"). Can't I use this syntax? I am confused, please help!
Your syntax on line 5 (val Array(k, v) = ...) uses Array's extractor, its unapplySeq method, and so does val Array(did, info_s) = ... on line 58. You get a MatchError whenever the number of names you put in the pattern is not equal to the length of the array:
scala> val Array(k, v) = "1,2".split(",")
k: String = 1
v: String = 2
scala> val Array(k, v) = "1,2,3".split(",")
scala.MatchError: [Ljava.lang.String;@508dec2b (of class [Ljava.lang.String;)
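The same rule applies to the outer pattern from line 58 of the question: val Array(did, info_s) = s.split("\t") only succeeds when a line contains exactly one tab. Below is a minimal sketch in plain Scala (no Spark; the object SplitMatchDemo and its method are hypothetical names) that runs this pattern on a few sample lines. A space-separated line, an empty line, and a line with a second tab all end in a MatchError, which the sketch catches and returns as a Left.

```scala
object SplitMatchDemo {
  // Applies the same pattern as line 58 of the question and reports
  // a MatchError as Left instead of letting it escape.
  def tryDestructure(line: String): Either[String, (String, String)] =
    try {
      val Array(did, info) = line.split("\t")
      Right((did, info))
    } catch {
      case e: MatchError => Left(e.getClass.getSimpleName)
    }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "12334\tthis:23,word:21,teacher:23", // one tab: 2 fields, matches
      "12334 this:23,word:21,teacher:23",  // spaces only: 1 field
      "",                                  // empty line: split gives Array("")
      "12334\tthis:23\textra"              // two tabs: 3 fields
    )
    samples.foreach { s =>
      val n = s.split("\t").length
      println(s"$n field(s): ${tryDestructure(s)}")
    }
  }
}
```

Empty lines are worth checking for: on the JVM, "".split("\t") returns a one-element array, so a blank line in the input file is enough to trigger the error.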
In your case, this is probably caused by malformed input: a line that does not contain exactly one tab (the pattern on line 58), or a pair with more than one : or none at all. Extractors are useful and concise, but their error messages are cryptic, so it's good practice to use a slightly more verbose syntax when you aren't sure the match will succeed (for example, when reading arbitrary text files):
val (k, v) = kv.split(":") match {
  case Array(f1, f2) => (f1, f2)
  case other => throw new IllegalArgumentException(
    s"found invalid K/V pair: expected 2 elements, found ${other.length}")
}
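Taking the match-based approach one step further, a parser can return an error value instead of throwing, so that one bad line does not fail the whole Spark task. The following is a minimal sketch, assuming it is acceptable to report bad input as a Left. The names SafeParse, parsePair and parseLine are hypothetical, and scala.util.Try is used to also catch values that are not numbers.

```scala
import scala.util.Try

object SafeParse {
  // Parses "key:value" into (key, value.toDouble), or explains why it could not.
  def parsePair(kv: String): Either[String, (String, Double)] =
    kv.split(":") match {
      case Array(k, v) =>
        Try(v.toDouble).toOption
          .map(d => (k, d))
          .toRight(s"value is not a number in '$kv'")
      case other =>
        Left(s"invalid K/V pair '$kv': expected 2 elements, found ${other.length}")
    }

  // Parses "id<TAB>k1:v1,k2:v2,..." and returns the first error it finds, if any.
  def parseLine(line: String): Either[String, (String, Seq[(String, Double)])] =
    line.split("\t") match {
      case Array(did, infoS) =>
        val parsed = infoS.split(",").toSeq.map(parsePair)
        parsed.collectFirst { case Left(err) => err } match {
          case Some(err) => Left(err)
          case None      => Right((did, parsed.collect { case Right(p) => p }))
        }
      case other =>
        Left(s"invalid line: expected 2 tab-separated fields, found ${other.length}")
    }
}
```

In Spark this might be applied as sc.textFile(fp).map(SafeParse.parseLine), after which the Left values can be inspected or filtered out separately from the good records.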
Scala has syntactic sugar for unpacking a tuple:
val (id, info) = ("123", "word:123")
But this won't work for the Array returned by split(), because an Array is not a tuple.
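If the number of elements in the array might not match the number of names in the pattern, just assign the result to a variable and access the values by index. Extra fields are then simply ignored, although a line with too few fields will still throw an ArrayIndexOutOfBoundsException: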
sc.textFile("user_id.txt").map { line =>
  val fields = line.split("\t")
  val info = fields(1).split(",").map { kv =>
    val pairs = kv.split(":")
    (pairs(0), pairs(1).toDouble)
  }.toSeq
  (fields(0), info)
}.collect()
// Array[(String, Seq[(String, Double)])] = Array((12334,WrappedArray((this,23.0), (word,21.0), (teacher,23.0))))
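Here is a variant of the index-based version that checks lengths before indexing, so that short lines or pairs yield None instead of an exception. This is a plain-Scala sketch with hypothetical names (IndexParse, parseLine), assuming that silently skipping malformed lines is acceptable. Non-numeric values still throw from toDouble.

```scala
object IndexParse {
  // Index-based parsing of one line: extra fields are ignored, and lines or
  // pairs with fewer than two parts give None. toDouble can still throw
  // NumberFormatException for non-numeric values.
  def parseLine(line: String): Option[(String, Seq[(String, Double)])] = {
    val fields = line.split("\t")
    if (fields.length < 2) None
    else {
      val pairs = fields(1).split(",").toSeq.map(_.split(":"))
      if (pairs.exists(_.length < 2)) None
      else Some((fields(0), pairs.map(p => (p(0), p(1).toDouble))))
    }
  }
}
```

With Spark, a flatMap over the lines (for example sc.textFile(fp).flatMap(IndexParse.parseLine(_))) would keep only the lines that parsed.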
I wasn't aware of Array's extractor (its unapplySeq method) when I wrote the code above, but I find it appealing. So here is an alternative that keeps the idea of unpacking the array while only taking the first two fields of each line. It uses _* to absorb any unwanted trailing elements of the Array:
val Array(k, v, _*) = Array(1, 2, 3, 4, 5)
// k: Int = 1
// v: Int = 2
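The behavior of the _* pattern can be checked with a match expression. The pattern needs at least two elements and ignores everything after the second. With fewer than two elements, the val form above still throws a MatchError. A small sketch, where WildcardDemo and firstTwo are hypothetical names:

```scala
object WildcardDemo {
  // Same pattern as `val Array(k, v, _*) = ...`, written as a match:
  // at least two elements are required, and anything after the second is ignored.
  def firstTwo(parts: Array[String]): Option[(String, String)] = parts match {
    case Array(k, v, _*) => Some((k, v))
    case _               => None // 0 or 1 elements: the val form would throw MatchError
  }

  def main(args: Array[String]): Unit = {
    println(firstTwo("this:23".split(":")))   // Some((this,23))
    println(firstTwo("this:23:9".split(":"))) // Some((this,23))
    println(firstTwo("this".split(":")))      // None
  }
}
```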
The code above can then be rewritten as follows (a line or pair with fewer than two parts still raises a MatchError):
sc.textFile("user_id.txt").map { line =>
  val Array(id, info_s, _*) = line.split("\t")
  val info = info_s.split(",").map { kv =>
    val Array(key, value, _*) = kv.split(":")
    (key, value.toDouble)
  }.toSeq
  (id, info)
}.collect()
// Array[(String, Seq[(String, Double)])] = Array((12334,WrappedArray((this,23.0), (word,21.0), (teacher,23.0))))
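Because the body of the map is ordinary Scala, one option is to move it into a plain function that can be tried on sample lines without a SparkContext. In Spark it would then be used as sc.textFile("user_id.txt").map(UserInfoParser.parseLine). The following sketch keeps the same _* patterns; the object and method names are hypothetical, and the sample line assumes a tab after the id.

```scala
object UserInfoParser {
  // Same logic as the Spark snippet: keeps the first two tab-separated fields
  // and the first two ':'-separated parts of each pair, ignoring any extras.
  // Still throws MatchError if a line or pair has fewer than two parts.
  def parseLine(line: String): (String, Seq[(String, Double)]) = {
    val Array(id, infoS, _*) = line.split("\t")
    val info = infoS.split(",").toSeq.map { kv =>
      val Array(key, value, _*) = kv.split(":")
      (key, value.toDouble)
    }
    (id, info)
  }

  def main(args: Array[String]): Unit = {
    // Sample line from the question, assuming a tab separates id and pairs.
    val sample = "12334\tthis:23,word:21,teacher:23"
    println(parseLine(sample))
  }
}
```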