I just built Spark 2 with Hive support and deployed it to a cluster running Hortonworks (HDP) 2.3.4. However, I find that this Spark 2.0.3 is slower than the standard Spark 1.5.3 that comes with HDP 2.3.
When I check the output of explain, it seems that my Spark 2.0.3 is not using Tungsten. Do I need to create a special build to enable Tungsten?
Spark 1.5.3 explain:
== Physical Plan ==
TungstenAggregate(key=[id#2], functions=[], output=[id#2])
 TungstenExchange hashpartitioning(id#2)
  TungstenAggregate(key=[id#2], functions=[], output=[id#2])
   HiveTableScan [id#2], (MetastoreRelation default, testing, None)
Spark 2.0.3 explain:
== Physical Plan ==
*HashAggregate(keys=[id#2481], functions=[])
+- Exchange hashpartitioning(id#2481, 72)
   +- *HashAggregate(keys=[id#2481], functions=[])
      +- HiveTableScan [id#2481], MetastoreRelation default, testing
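It still uses Tungsten; the operator classes were renamed (for example, TungstenAggregate is now HashAggregate): https://github.com/apache/spark/commit/8900c8d8ff1614b5ec5a2ce213832fa13462b4d4
The asterisk in front of HashAggregate in the Spark 2.0 plan means that the operator runs inside whole-stage code generation.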
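Because the operators were renamed rather than removed, one way to compare the two plans above is to map the Spark 1.5 operator names onto the Spark 2.0 ones. The following is a minimal Python sketch, assuming only the mapping that can be read off the two plans in the question (TungstenAggregate to HashAggregate, TungstenExchange to Exchange); it is not an exhaustive list. The names summarize_plan, normalize and RENAMED_OPERATORS are hypothetical helpers, not part of Spark. The sketch also flags which operators carry the * prefix (whole-stage code generation) in the Spark 2.0 output.

```python
from __future__ import annotations

import re

# Hypothetical mapping, read off the two plans in the question only:
# Spark 1.5 operator name -> name of the same operator in Spark 2.0 plans.
RENAMED_OPERATORS = {
    "TungstenAggregate": "HashAggregate",
    "TungstenExchange": "Exchange",
}

# A plan line: optional tree drawing (spaces, "+-", ":-", "|"), an optional
# "*" whole-stage codegen marker, then the operator name.
PLAN_LINE = re.compile(r"^[\s:+\-|]*(\*)?([A-Za-z]+)")


def summarize_plan(plan: str) -> list[tuple[str, bool]]:
    """Return (operator name, has '*' codegen marker) for each plan line."""
    result = []
    for line in plan.splitlines():
        # Skip blank lines and headers such as "== Physical Plan ==".
        if not line.strip() or line.startswith("=="):
            continue
        match = PLAN_LINE.match(line)
        if match:
            result.append((match.group(2), match.group(1) is not None))
    return result


def normalize(operator: str) -> str:
    """Translate a Spark 1.5 operator name to its Spark 2.0 name, if renamed."""
    return RENAMED_OPERATORS.get(operator, operator)


SPARK_15_PLAN = """== Physical Plan ==
TungstenAggregate(key=[id#2], functions=[], output=[id#2])
 TungstenExchange hashpartitioning(id#2)
  TungstenAggregate(key=[id#2], functions=[], output=[id#2])
   HiveTableScan [id#2], (MetastoreRelation default, testing, None)"""

SPARK_20_PLAN = """== Physical Plan ==
*HashAggregate(keys=[id#2481], functions=[])
+- Exchange hashpartitioning(id#2481, 72)
   +- *HashAggregate(keys=[id#2481], functions=[])
      +- HiveTableScan [id#2481], MetastoreRelation default, testing"""

if __name__ == "__main__":
    old = [normalize(op) for op, _ in summarize_plan(SPARK_15_PLAN)]
    new = [op for op, _ in summarize_plan(SPARK_20_PLAN)]
    print(old == new)  # True: same operator tree, only the names changed
    print(summarize_plan(SPARK_20_PLAN))
```

After normalization the two operator sequences match, which is consistent with the answer: the Spark 2.0 plan has the same shape, and both HashAggregate steps additionally run under whole-stage code generation.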