I have:
val DF1 = sparkSession.sql("select col1,col2,col3 from table");
val tupleList = DF1.select("col1","col2").rdd.map(r => (r(0),r(1))).collect()
tupleList.foreach(x=> x.productIterator.foreach(println))
But I do not get all the tuples in the output. Where is the issue? The data is:
col1 col2
AA CCC
AA BBB
DD CCC
AB BBB
Others BBB
GG ALL
EE ALL
Others ALL
ALL BBB
NU FFF
NU Others
Others Others
C FFF
The output I get is:
CCC AA BBB AA Others AA Others DD ALL Others ALL GG ALL ALL
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val df1 = hiveContext.sql("select id, name from class_db.students")
scala> df1.show()
+----+-------+
| id| name|
+----+-------+
|1001| John|
|1002|Michael|
+----+-------+
scala> df1.select("id", "name").rdd.map(x => (x.get(0), x.get(1))).collect()
res3: Array[(Any, Any)] = Array((1001,John), (1002,Michael))
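One thing worth checking in the question's code is how the tuples are printed. productIterator walks the fields of a tuple, so println runs once per field and every pair is spread over two lines, which makes it hard to tell which rows are actually present. Below is a minimal sketch in plain Scala, with a hard-coded array standing in for the result of collect(). The object name TupleDemo and the choice of three sample rows (taken from the question's table) are assumptions for illustration only.

```scala
object TupleDemo {
  // Stand-in for what collect() returns; the rows are copied from the question's table.
  val tupleList: Array[(Any, Any)] = Array(("AA", "CCC"), ("AA", "BBB"), ("DD", "CCC"))

  // What productIterator yields: one item per field, so two items per tuple.
  val perField: List[String] =
    tupleList.toList.flatMap(t => t.productIterator.map(_.toString))

  // Formatting the whole tuple at once keeps each pair on a single line.
  val perTuple: List[String] =
    tupleList.toList.map { case (c1, c2) => s"$c1 $c2" }

  def main(args: Array[String]): Unit = {
    perTuple.foreach(println)
    // AA CCC
    // AA BBB
    // DD CCC
  }
}
```

Printing one tuple per line (or simply tupleList.foreach(println)) makes it easier to compare the collected result against df.show() row by row.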