
Select query fails on large dataset in sqlContext

My code reads data through sqlContext. The table has 20 million records, and I want to calculate totalCount for the table.

val finalresult = sqlContext.sql("SELECT movieid, tagname, occurrence AS eachTagCount, count AS totalCount FROM result ORDER BY movieid")

I want to calculate the total count of one column without using groupBy and save it in a text file. I changed the way I save the file so that the output has no additional ].

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import sqlContext._
case class DataClass(UserId: Int, MovieId:Int, Tag: String)
// Create an RDD of DataClass objects and register it as a table.
val Data = sc.textFile("file:///usr/local/spark/dataset/tagupdate").map(_.split(",")).map(p => DataClass(p(0).trim.toInt, p(1).trim.toInt, p(2).trim)).toDF()
Data.registerTempTable("tag")

val orderedId = sqlContext.sql("SELECT MovieId AS Id,Tag FROM tag ORDER BY MovieId")
orderedId.rdd
  .map(_.toSeq.map(_+"").reduce(_+";"+_))
  .saveAsTextFile("/usr/local/spark/dataset/algorithm3/output")
// orderedId.write.parquet("ordered.parquet")
val eachTagCount = orderedId.groupBy("Tag").count()
//eachTagCount.show()
eachTagCount.rdd
 .map(_.toSeq.map(_+"").reduce(_+";"+_))
 .saveAsTextFile("/usr/local/spark/dataset/algorithm3/output2")

ERROR Executor: Exception in task 0.0 in stage 7.0 (TID 604)
java.lang.ArrayIndexOutOfBoundsException: 1
    at tags$$anonfun$6.apply(tags.scala:46)
    at tags$$anonfun$6.apply(tags.scala:46)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)

Salm asked Mar 15 '26 02:03



1 Answer

A NumberFormatException is probably thrown at this expression:

p(1).trim.toInt

It is thrown because you're trying to parse 10], which is not a valid number.

  • You could find the problematic place in your file and remove the additional ].

  • You could also catch the error and provide a default value if parsing fails:

    import scala.util.Try
    
    Try(p(1).trim.toInt).getOrElse(0) // returns 0 if parsing fails
    
  • Another option is to remove every character that is not a digit from the string before parsing it:

    // filter out everything that is not a digit
    p(1).filter(_.isDigit).toInt
    

This can still fail if everything is filtered out and only an empty string is left, so it is a good idea to wrap it in Try as well.
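As a minimal sketch of combining the last two suggestions, the digit filter can be wrapped in Try so that an empty result does not throw. The helper name parseDigits is hypothetical (it is not part of Spark or of the code in the question), and the sample inputs are made up for illustration:

```scala
import scala.util.Try

// Hypothetical helper (not from the question): keep only the digits of a raw
// field, then parse what is left. Returns None when nothing parseable remains.
def parseDigits(raw: String): Option[Int] =
  Try(raw.filter(_.isDigit).toInt).toOption

// "10]" loses its trailing bracket; a field with no digits yields None.
val cleaned = parseDigits("10]")   // Some(10)
val missing = parseDigits("]")     // None

// Fall back to a default value, in the same spirit as Try(...).getOrElse(0).
val withDefault = parseDigits("]").getOrElse(0)   // 0
```

In the question's map this would presumably be used as parseDigits(p(1)).getOrElse(0). Two caveats: filtering on isDigit also drops a minus sign, so it only suits non-negative IDs, and it does not help when a row has fewer fields than expected, because p(1) itself fails before the helper is called.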

Krzysztof Atłasik answered Mar 18 '26 05:03



