
How to create an empty DataFrame? Why "ValueError: RDD is empty"?

I am trying to create an empty DataFrame in Spark (PySpark).

I am using an approach similar to the one discussed here, but it is not working.

This is my code:

df = sqlContext.createDataFrame(sc.emptyRDD(), schema) 

This is the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 404, in createDataFrame
    rdd, schema = self._createFromRDD(data, schema, samplingRatio)
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 285, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 229, in _inferSchema
    first = rdd.first()
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1320, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
user3276768 asked Jan 06 '16 02:01
People also ask

How do you make an empty RDD in PySpark?

Create an empty RDD by calling emptyRDD() on the SparkContext, for example spark.sparkContext.emptyRDD(). Alternatively you can also get empty RDD by using spark.

How do I know if my RDD is empty?

isEmpty. Returns true if and only if the RDD contains no elements at all. An RDD may be empty even when it has at least 1 partition.


1 Answer

Extending Joe Widen's answer, you can create a schema with no fields like so:

from pyspark.sql.types import StructType
schema = StructType([])

so when you create the DataFrame using that as your schema, you'll end up with a DataFrame[].

>>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> empty
DataFrame[]
>>> empty.schema
StructType(List())

In Scala, if you choose to use sqlContext.emptyDataFrame and check out the schema, it will return StructType().

scala> val empty = sqlContext.emptyDataFrame
empty: org.apache.spark.sql.DataFrame = []

scala> empty.schema
res2: org.apache.spark.sql.types.StructType = StructType()
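
As for the "why": the traceback in the question ends in _inferSchema calling rdd.first(), which suggests the schema that was passed in was not treated as a StructType, so Spark presumably fell back to inferring one from the first row. The toy model below is a sketch of that control flow in plain Python. It is not PySpark's source, and ToyRDD, ToyStructType and create_dataframe are hypothetical names made up for illustration.

```python
# Simplified, hypothetical model of the control flow visible in the traceback.
# This is NOT PySpark's code; every name here is invented for illustration.


class ToyRDD:
    """Stand-in for an RDD, backed by a plain Python list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def first(self):
        # Mirrors the last traceback frame: an empty RDD has no first row.
        if not self.rows:
            raise ValueError("RDD is empty")
        return self.rows[0]


class ToyStructType:
    """Stand-in for StructType: just an ordered list of field names."""

    def __init__(self, fields):
        self.fields = list(fields)


def create_dataframe(rdd, schema=None):
    """Return (schema, rows); infer the schema only when none is supplied."""
    if not isinstance(schema, ToyStructType):
        # Inference path: it needs a sample row, so it fails on empty input.
        first = rdd.first()
        schema = ToyStructType(sorted(first.keys()))
    return schema, rdd.rows


empty = ToyRDD([])

# An explicit schema, even one with no fields, skips inference entirely.
schema, rows = create_dataframe(empty, ToyStructType([]))
print(schema.fields, rows)

# Without a usable schema the inference path runs and raises.
try:
    create_dataframe(empty, None)
except ValueError as exc:
    print("ValueError:", exc)
```

If that reading is right, it is worth checking that the schema variable in the question really is a StructType instance (and not None or a list of column names) before calling createDataFrame.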
Ton Torres answered Oct 07 '22 01:10