I downloaded the pre-built Spark 1.5.0 and ran this simple code in the pyspark shell:
from pyspark.sql import Row
l = [('Alice', 1)]
sqlContext.createDataFrame(l).collect()
It yields this error:
15/09/30 06:48:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\bigdata\spark-1.5\spark-1.5.0\python\pyspark\sql\context.py", line 408, in createDataFrame
jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
File "c:\bigdata\spark-1.5\spark-1.5.0\python\pyspark\sql\context.py", line 660, in _ssql_ctx
"build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o28))
So I tried to compile it myself:
c:\bigdata\spark-1.5\spark-1.5.0>.\build\apache-maven-3.3.3\bin\mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests -Phive -Phive-thriftserver clean package
But I still get the same error with the compiled version.
Any suggestions?
Add these lines after importing Row:
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext( 'local', 'pyspark')
sqlContext = SQLContext(sc)
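Put together with the code from the question, the suggestion might look like the sketch below. It assumes a standalone script in which no SparkContext exists yet; inside the pyspark shell an `sc` is already defined, so there you would most likely reuse it and only run `sqlContext = SQLContext(sc)`. The helper name `collect_rows` is made up for this illustration, and the `sc.stop()` cleanup is an addition that is not part of the answer.

```python
# Sample data from the question: a single (name, age) tuple.
l = [('Alice', 1)]


def collect_rows(data):
    """Build a DataFrame from `data` with a plain SQLContext and collect it.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Imported inside the function so this file can be loaded without Spark.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Assumption: standalone script, so no SparkContext has been created yet.
    sc = SparkContext('local', 'pyspark')
    try:
        # A plain SQLContext is used here instead of the HiveContext
        # that appears in the traceback.
        sqlContext = SQLContext(sc)
        return sqlContext.createDataFrame(data).collect()
    finally:
        # Release the local Spark context when done (added for tidiness).
        sc.stop()
```

Calling `collect_rows(l)` from a script launched with spark-submit should return the collected rows without going through HiveContext.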